Mathematical aspects of division property

This work surveys mathematical aspects of division property, which is a state-of-the-art technique in cryptanalysis of symmetric-key algorithms, such as authenticated encryption, block ciphers and stream ciphers. It aims to find integral distinguishers and cube attacks, which exploit weaknesses in the algebraic normal forms of the output coordinates of the involved vectorial Boolean functions. Division property can also be used to provide arguments for security of primitives against these attacks. The focus of this work is a formal presentation of the theory behind the division property, including rigorous proofs, which were often omitted in the existing literature. This survey covers the two major variants of division property, namely conventional and perfect division property. In addition, we explore relationships of the technique with classic degree bounds.


Introduction
In this paper we discuss the mathematical aspects of a modern technique in symmetric cryptography. This method allows one both to find better attacks and to give stronger arguments for the security of given schemes. A large part of the research in the area of division property is devoted to making the actual computation more efficient and applicable to a larger set of primitives. In a nutshell, this part involves setting up a suitable set of equations and inequalities and asking a modern SAT or MILP solver to find solutions. While very important for the field, this aspect is not in the scope of this survey. Instead, we focus on the mathematical aspects of division property. But before doing so, we give a bit of context and motivation.

Symmetric cryptography
Symmetric primitives play an important role in our daily communication and protect almost all sensitive communication. The security of a symmetric primitive, be it a block cipher, a stream cipher or a permutation-based construction, cannot be proven as such by current techniques. Instead, arguments for its security are always arguments why a specific attack or a class of attacks is not applicable to the given scheme. One important class of attacks consists of those that exploit properties of the algebraic normal form of the cipher in question. In its general form, a symmetric primitive can always be thought of as a function $F\colon \mathbb{F}_2^n \times \mathbb{F}_2^\ell \to \mathbb{F}_2^m$ that takes two bit-strings x and k of lengths n and $\ell$ as inputs and produces an m-bit output.
The input x can be thought of as public information. It could be the message in the case of a block cipher, a message and a tweak in the case of a tweakable block cipher, or the initial value (IV) in the case of a stream cipher. The input k is the secret key, which is, as well as (an implementation of) the function F, shared between sender and receiver. The output F(x, k) can again be thought of as public and would correspond to the cipher-text in the case of a block cipher, or to the key-stream in the case of a stream cipher. It could also be an authentication tag in the case of a message authentication code (MAC), or serve as both the cipher-text and an authentication tag in the case of authenticated encryption. Treating both the input x and the output F(x, k) as public information might seem unnatural, as making both the message and the cipher-text public seems to nullify the whole point of encryption. However, it is common to consider a very powerful attacker that might actually have access to many, even chosen, plain-text and cipher-text pairs.

Attacks based on the ANF
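As any Boolean function, each coordinate $F_t\colon \mathbb{F}_2^n \times \mathbb{F}_2^\ell \to \mathbb{F}_2$ of $F$, $t \in [1, m]$, can be expressed by its algebraic normal form (ANF). That is, we can represent $F_t$ uniquely as a multivariate polynomial $F_t(x, k) = \bigoplus_{u \in \mathbb{F}_2^n,\, w \in \mathbb{F}_2^\ell} \lambda_{u,w}\, x^u k^w$ with coefficients $\lambda_{u,w} \in \mathbb{F}_2$.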
There are many attacks based on specific properties of the ANF of $F_t$ (for any coordinate t); they differ in details or targets, but are all based on exploitable knowledge about this ANF. One of the first attacks is based on $F_t$ having small degree in the variables x. Indeed, if $F_t$ has degree d, any of its (d + 1)-th order derivatives is the constant zero function. Those attacks were coined higher-order differential attacks in [1]. Integral attacks [2] make use of a similar but more fine-grained property. In their initial form, integral attacks first fix part of the input x and then exploit a small degree of the resulting function. In terms of higher-order differentials, those can be seen as a special case where only certain (d + 1)-th order derivatives vanish. However, in their most general form, integral attacks no longer correspond to derivatives but to a key-independent sum of cipher-texts for a well-chosen subset of all plain-texts.
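The vanishing of higher-order derivatives can be checked by brute force on a toy example. The following Python sketch uses a hypothetical degree-2 function f on $\mathbb{F}_2^4$ (a vector is encoded as an integer whose bit i holds $x_{i+1}$) and evaluates the derivative in directions $a_1, \dots, a_k$ as the XOR of f over x plus every subset sum of the directions. It is an illustration of the principle only, not an attack.

```python
from itertools import product

N = 4  # toy input size; x in F_2^4 is an int whose bit i holds x_{i+1}

def f(x: int) -> int:
    """Hypothetical degree-2 example: f(x) = x1*x2 + x3*x4 + x2."""
    b = [(x >> i) & 1 for i in range(N)]
    return (b[0] & b[1]) ^ (b[2] & b[3]) ^ b[1]

TABLE = [f(x) for x in range(1 << N)]

def derivative(table, directions, x: int) -> int:
    """k-th order derivative at x in the directions a_1..a_k: the XOR of the
    function over x + (sum of any subset of the directions)."""
    acc = 0
    for choice in product((0, 1), repeat=len(directions)):
        shift = 0
        for bit, a in zip(choice, directions):
            if bit:
                shift ^= a
        acc ^= table[x ^ shift]
    return acc

points = range(1 << N)
# deg f = 2, so every third-order derivative is the constant zero function ...
third_order_vanishes = all(
    derivative(TABLE, (a, b, c), x) == 0
    for a in points for b in points for c in points for x in points
)
# ... while second-order derivatives need not vanish: D_{e1,e2} f = 1.
second_order_value = derivative(TABLE, (0b0001, 0b0010), 0)
```

Derivatives in linearly dependent directions vanish trivially; the interesting point is that no choice of three directions yields a non-zero value.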
Cube attacks [3], instead, focus on derivatives that are simple but have not vanished yet. Here, simplicity means low algebraic degree and/or dependence on only a few secret variables. Such functions can be used to recover partial information about the secret key. Cube attacks are thus always key recovery attacks, while integral properties by themselves only define a distinguisher of the primitive from a random one. While integral distinguishers can be used to mount key recovery attacks for some primitives such as block ciphers, this is not possible, for example, for stream ciphers.

Division property
One major obstacle for the attacks above is to actually determine that, e.g., the degree of a cipher is small, or, more generally, to find derivatives that vanish or have a sparse ANF. The function F is naturally very complex as a whole. However, almost all symmetric ciphers are composed of simple and almost identical functions $F^{(i)}$. Those functions $F^{(i)}$, called round functions, are in particular very efficient to implement and often have very simple ANFs and very low degree. Division property, in all its variants described below, is a tool that allows one to deduce information about the ANF of F by analyzing its structure as a composition of those simple functions $F^{(i)}$. In its initial form, it allowed one to derive better upper bounds on the degree of F (defined as the maximum degree of all its coordinates), i.e., to potentially find better attacks. Later, variants were developed that allow the efficient computation of single entries in the ANF. We will discuss the mathematical aspects of the most important variants in this paper.

Brief overview of the division property variants
Yosuke Todo [4] introduced the division property as a new technique to find better integral attacks. This initial version is referred to as the conventional division property. Here, the state of a cipher is grouped into words (e.g. bytes) and the division property allows one to make certain statements about the ANF of the cipher. More precisely, the conventional division property allows one to determine upper bounds on the degree of the cipher by keeping track of bounds on the degree of a certain set indicator in the separate words and of how those bounds change under the basic operations that are applied in the round functions. This led to new attacks, most prominently against the block cipher MISTY1 [5,6].
While the grouping into words allows rather efficient calculations, the information it provides is limited, and therefore Todo and Morii [7] later introduced the bit-based division property as a refinement.
Both variants allow one to compute a set of certain monomials (and their multiples) that do not occur in the ANF of the function $F_t(\cdot, k)$ for any key k. However, for monomials that are not contained in the set, no information is known. One of the main advantages of such a representation is that it allows one to cope with the frequently used addition of an unknown round key to the state.
In this sense the bit-based division property splits the space of monomials into two distinct parts, one for which the coefficients are zero, and one for which the coefficients remain unknown.
In a next step, in the same paper, Todo and Morii extended the framework to make statements about monomials of $F_t(\cdot, k)$ which are always present independently of k (i.e., the respective coefficient is the constant 1). This is referred to as the three-subset division property to highlight that now all monomials are split into three sets: one for which the coefficients in the ANF of $F_t(\cdot, k)$ are known to be zero, one for which they are known to be one, and the rest for which nothing can be concluded.
In order to allow the computation of even more elements of the ANF, Hao et al. [8,9] introduced the three-subset division property without unknown subset. This paper shifted the view from $F_t(\cdot, k)$ to $F_t(\cdot, \cdot)$, i.e., the key is treated as a usual variable. They give algorithms, based on division trails introduced in [10], that allow one to compute (a limited number of) ANF coefficients of $F_t$ exactly. At least theoretically, this removes the set of monomials for which no information can be computed, hence the name.
It should be noted that the three-subset division property without unknown subset actually separates all monomials only in two sets, more precisely in one set and its complement. This concept, without the important aspect of computational hardness, was already treated by Boura and Canteaut [11] using the term parity set.

Brief comparison of division property variants and classic degree bounds
Division property can be seen as an evolution and refinement of classic methods of bounding the algebraic degree of iterated functions. The most powerful of the classic ones are the naive bound [12] (extensively used in cryptanalysis, see e.g. [13]) and a more recent family of bounds by Carlet [14] based on the degrees of the involved graph indicators $1_{\Gamma_F}$, where $1_{\Gamma_F}\colon \mathbb{F}_2^n \times \mathbb{F}_2^m \to \mathbb{F}_2$ is such that $1_{\Gamma_F}(x, y) = 1$ if and only if $F(x) = y$. These bounds are not always tight and thus clearly cannot provide better distinguishers than the perfect division property (of course, the downside of the latter is that it is not always feasible to compute). The relationships of these bounds to the conventional division property are described in Section 5. Notably, the state-based conventional division property with ideal propagation can be viewed as a slight generalization of Carlet's graph-indicator-based method (requiring more information about the graph indicators and yielding stronger bounds). Table 1 summarizes the hierarchy of these classic methods and division property in terms of precision and required information.
In the table, it is assumed that the same decomposition $F = F^{(r)} \circ \cdots \circ F^{(1)}$ of the target function F is used, and the "required information" used by each technique refers precisely to those functions $F^{(i)}$. The table does not include all classic degree bounds (for example, bounds based on the divisibility of the Walsh spectra of the involved functions, or bounds requiring more specific information). The described classic degree bounds are not comparable (i.e., for each bound, there exist instances of functions where this bound is strictly better than the others) and are therefore grouped together. A relevant detailed study of degree bounds can be found in [14]. Furthermore, the relationship between division property variants and the Boura-Canteaut bound was studied in [15].
The argumentation for the provided hierarchy is as follows:
(A)⇒(B): Follows from the relation of the conventional division property to maximal monomials in the graph indicators of composed functions (proven in [16], see Section 5).
(B)⇒(C)⇒(D): Follows from the inclusion of the conventional division property variants by the partial weights computations (see Section 4).
(D)⇒(E)⇒(F): Follows from the inclusion of the bit-based division property variants: the three-subset division property is simply a mix of the conventional and the perfect division property (see [15,17]).
In the top-to-bottom order, the precision increases (each next technique can find more distinguishers when feasible) at the cost of more information required about the involved functions.

Outline
The parity sets and the perfect division property are described in Section 3. They are foundational for a formal description of the original (conventional) division property, which we provide in Section 4. As mentioned above, historically, the conventional division property was described and applied earlier than the perfect division property. In Section 5.2, we provide a comparison of the (conventional) division property with known generic degree bounds. Section 6 describes the application of the perfect division property for providing security arguments for symmetric-key primitives. Finally, Section 7 concludes the work.

General notations
The integer segment $(i, i + 1, \dots, j)$ is denoted by $[i, j]$. The all-one and all-zero vectors are denoted by $\mathbf{1}$ and $\mathbf{0}$, respectively. The i-th unit vector, denoted by $e_i$, is the vector having 1 at the i-th position and 0 elsewhere.

Boolean vectors and functions
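An n-bit Boolean function is a function $f\colon \mathbb{F}_2^n \to \mathbb{F}_2$, where $\mathbb{F}_2 = \{0, 1\}$ is the binary field and $\mathbb{F}_2^n$ is the n-dimensional vector space over $\mathbb{F}_2$. The support of a Boolean function f is the set of all preimages of 1 under f. An $n \times m$ vectorial Boolean function is a function $F\colon \mathbb{F}_2^n \to \mathbb{F}_2^m$. The graph of F is denoted by $\Gamma_F$ and is given by $\Gamma_F = \{(x, F(x)) \mid x \in \mathbb{F}_2^n\} \subseteq \mathbb{F}_2^{n+m}$.
The bitwise logical AND and OR are denoted by $\wedge\colon \mathbb{F}_2^n \times \mathbb{F}_2^n \to \mathbb{F}_2^n$ and $\vee\colon \mathbb{F}_2^n \times \mathbb{F}_2^n \to \mathbb{F}_2^n$, respectively. The logical negation ¬ is defined by $\neg\colon \mathbb{F}_2^n \to \mathbb{F}_2^n$, $\neg x = x \oplus \mathbf{1}$. For $x, u \in \mathbb{F}_2^n$, the notation $x^u$ is a shorthand for $\prod_{i=1}^{n} x_i^{u_i} := \prod_{i \in [1, n],\, u_i = 1} x_i$.
The indicator of a set $X \subseteq \mathbb{F}_2^n$ is defined as $1_X\colon \mathbb{F}_2^n \to \mathbb{F}_2$ such that $1_X(x) = 1$ if and only if $x \in X$. By an abuse of notation, we will identify a set $X \subseteq \mathbb{F}_2^n$ with its indicator $1_X$. In particular, the degree of a set X is defined as the algebraic degree of its indicator, i.e., $\deg X := \deg 1_X$. Conversely, we will identify a Boolean function $f\colon \mathbb{F}_2^n \to \mathbb{F}_2$ with its support. The indicator $1_{\Gamma_F}$ of the graph of a vectorial Boolean function F will be called the graph indicator of F for short.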
The symmetric difference, denoted by $\triangle$, is a binary operation on sets equal to the set of elements each present in exactly one of the two input sets. It is commutative and associative, so that a symmetric difference of a collection of sets is well defined, and is equal to the set of elements each present in an odd number of the input sets.
We adapt the set-builder notation and the application of functions to sets by defining the resulting set to be constructed from elements having odd multiplicity (i.e., ignoring elements with even multiplicity), for example:
• for a function $F\colon \mathbb{F}_2^n \to \mathbb{F}_2^m$, the notation $\{F(x) \mid x \in \mathbb{F}_2^n\}$ defines the set of all vectors y in the image of F having an odd number of preimages;
• for $F\colon \mathbb{F}_2^n \to \mathbb{F}_2^m$ and $X \subseteq \mathbb{F}_2^n$, the notation $F(X)$ defines the set of all $y \in \mathbb{F}_2^m$ such that there exists an odd number of $x \in X$ such that $y = F(x)$.
An exception is the partial weights vector function $\mathrm{wt}_{k_1,\dots,k_r}$ (used in Section 4 and Section 5), which is explicitly defined to act on sets as the union of its applications to the individual elements.
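The odd-multiplicity convention for $F(X)$ coincides with the symmetric difference of the singletons $\{F(x)\}$, $x \in X$, which is easy to see in a small Python sketch (vectors of $\mathbb{F}_2^n$ are encoded as integers; the function names are hypothetical).

```python
from collections import Counter
from typing import Callable, Iterable

def odd_image(F: Callable[[int], int], X: Iterable[int]) -> set:
    """F(X) under the odd-multiplicity convention: y is kept iff an odd
    number of x in X satisfy F(x) = y."""
    counts = Counter(F(x) for x in X)
    return {y for y, c in counts.items() if c % 2 == 1}

def sym_diff(*sets: set) -> set:
    """Symmetric difference of a collection of sets: the elements that
    occur in an odd number of the input sets."""
    result = set()
    for s in sets:
        result ^= s
    return result

# Projection onto the lowest bit, F(x) = x_1, on F_2^2 (vectors as ints).
proj = lambda x: x & 1
image_full = odd_image(proj, range(4))     # 0 and 1 have two preimages each
image_part = odd_image(proj, [0, 1, 2])    # 0 has two preimages, 1 has one
# F(X) equals the symmetric difference of the singletons {F(x)}, x in X.
same_as_sym_diff = image_part == sym_diff(*({proj(x)} for x in [0, 1, 2]))
```

Note that the image of the whole space under a balanced function is empty in this convention, since every output has an even number of preimages.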
Every Boolean function $f\colon \mathbb{F}_2^n \to \mathbb{F}_2$ admits a unique representation as a multivariate polynomial over $\mathbb{F}_2$ in n variables:
$f(x) = \bigoplus_{u \in \mathbb{F}_2^n} \lambda_u x^u$,
where $\lambda_u \in \mathbb{F}_2$ are constant coefficients independent of x. This representation is called the algebraic normal form (ANF). We will say that a monomial $x^u$ belongs to the ANF of f (or simply "belongs to f(x)") if $\lambda_u = 1$. The ANF support (the set of all $u \in \mathbb{F}_2^n$ such that $x^u$ is present in the ANF) of a Boolean function $f\colon \mathbb{F}_2^n \to \mathbb{F}_2$ or of a subset X of $\mathbb{F}_2^n$ is denoted by $\mathcal{A}(f)$ and $\mathcal{A}(X)$, respectively.
Fact 1 (See, e.g., [18]). For any $f\colon \mathbb{F}_2^n \to \mathbb{F}_2$ and any $u \in \mathbb{F}_2^n$, the corresponding ANF coefficient $\lambda_u$ is given by
$\lambda_u = \bigoplus_{x \in \mathbb{F}_2^n,\, x \preceq u} f(x)$.
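Fact 1 is the binary Möbius transform. A minimal Python sketch (truth tables indexed by integers whose bit i holds $x_{i+1}$; names hypothetical) computes all ANF coefficients with the usual in-place butterfly in $O(n 2^n)$ operations and compares the result with the direct sum of Fact 1.

```python
from functools import reduce

def anf(truth_table):
    """ANF coefficients via the in-place binary Moebius transform:
    coeffs[u] = XOR of f(x) over all x with x <= u (bitwise)."""
    coeffs = list(truth_table)
    n = len(coeffs).bit_length() - 1
    if len(coeffs) != 1 << n:
        raise ValueError("truth table length must be a power of two")
    for i in range(n):
        step = 1 << i
        for x in range(len(coeffs)):
            if x & step:
                coeffs[x] ^= coeffs[x ^ step]
    return coeffs

def anf_naive(truth_table):
    """Direct evaluation of Fact 1: lambda_u = XOR_{x <= u} f(x)."""
    size = len(truth_table)
    return [reduce(lambda acc, x: acc ^ truth_table[x],
                   (x for x in range(size) if x & u == x), 0)
            for u in range(size)]

# f(x1, x2, x3) = x1*x2 + x3, so the ANF support is {(1,1,0), (0,0,1)}.
tt = [((x & 1) & ((x >> 1) & 1)) ^ ((x >> 2) & 1) for x in range(8)]
coeffs = anf(tt)  # nonzero exactly at u = 0b011 and u = 0b100
```

Applying the transform twice returns the truth table, reflecting that the transform is an involution over $\mathbb{F}_2$.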

Partial order
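We use the product order ($\preceq$) on $\mathbb{Z}^n$ (including $\mathbb{F}_2^n$ by identifying $\mathbb{F}_2 = \{0, 1\}$ with a subset of $\mathbb{Z}$): $x \preceq y$ if and only if $x_i \le y_i$ for all $i \in [1, n]$, and $x \prec y$ if and only if $x \preceq y$ and $x \neq y$.
The relations $x \succeq y$ and $x \succ y$ are defined analogously.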
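In the following, we assume that a universe set $\Omega \subseteq \mathbb{Z}^n$ is given by the context and that it is of the shape $\Omega = [0, k_1] \times \cdots \times [0, k_n]$, where $k_1, \dots, k_n \in \mathbb{Z}_{>0}$. The logical negation is naturally generalized to operate on $[0, k]$ (where $k \in \mathbb{Z}_{>0}$) as $\neg\colon [0, k] \to [0, k]$, $\neg x = k - x$, applied coordinate-wise on Ω.
Definition 1 (Lower/upper sets/closures). The lower closure of a set $X \subseteq \Omega$, denoted by $\downarrow X$, is the set of all $u \in \Omega$ with $u \preceq x$ for some $x \in X$: $\downarrow X = \{u \in \Omega \mid \exists x \in X\colon u \preceq x\}$. The upper closure of a set $X \subseteq \Omega$, denoted by $\uparrow X$, is the set of all $u \in \Omega$ with $x \preceq u$ for some $x \in X$: $\uparrow X = \{u \in \Omega \mid \exists x \in X\colon x \preceq u\}$. For a single vector u, we write $\downarrow u := \downarrow \{u\}$ and $\uparrow u := \uparrow \{u\}$. A set X is an upper set if its upper closure is X itself. A set X is a lower set if its lower closure is X itself.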
See Fig. 1 for an illustration.
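A direct, brute-force Python sketch of Definition 1 on a small box $[0, k_1] \times \cdots \times [0, k_n]$ follows; it also checks that negation maps a lower closure onto the upper closure of the negated set. The generalized negation is taken here as $u_i \mapsto k_i - u_i$, which is an assumption about the intended generalization, and all function names are hypothetical.

```python
from itertools import product

def preceq(a, b):
    """Product order on Z^n: a <= b iff a_i <= b_i for every coordinate."""
    return all(ai <= bi for ai, bi in zip(a, b))

def universe(ks):
    """The box [0, k_1] x ... x [0, k_n] as a list of tuples."""
    return list(product(*(range(k + 1) for k in ks)))

def lower_closure(X, ks):
    return {u for u in universe(ks) if any(preceq(u, x) for x in X)}

def upper_closure(X, ks):
    return {u for u in universe(ks) if any(preceq(x, u) for x in X)}

def negate(u, ks):
    """Assumed generalized negation on the box: u_i -> k_i - u_i."""
    return tuple(k - ui for k, ui in zip(ks, u))

ks = (2, 1)                    # universe [0, 2] x [0, 1]
X = {(1, 1)}
down = lower_closure(X, ks)    # {(0,0), (0,1), (1,0), (1,1)}
up = upper_closure(X, ks)      # {(1,1), (2,1)}
# Negation maps the lower closure of X onto the upper closure of the negated set.
duality = {negate(u, ks) for u in down} == upper_closure({negate(x, ks) for x in X}, ks)
```

The enumeration of the whole universe is exponential in n and is meant only for small illustrative boxes.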

Definition 2 (Convexity).
A subset $X \subseteq \mathbb{Z}^n$ is called convex if for any $a, b, c \in \mathbb{Z}^n$, $a \preceq b \preceq c$ and $a, c \in X$ imply $b \in X$.
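Definition 3 (Min-set/max-set). The min-set of a set $X \subseteq \mathbb{Z}^n$, denoted by Min(X), is the set of all minimal elements in X: $\mathrm{Min}(X) = \{u \in X \mid \nexists v \in X\colon v \prec u\}$. The max-set of X, denoted by Max(X), is defined analogously as the set of all maximal elements in X: $\mathrm{Max}(X) = \{u \in X \mid \nexists v \in X\colon u \prec v\}$.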

Definition 4 (Antichain).
A set X ⊆ Z n is called an antichain if all pairs of distinct elements from X are incomparable.

Fact 4. A set X ⊆ Z n is a max-set (or a min-set) of some set if and only if it is an antichain.
Proof. If X is a max-set (or a min-set), then it cannot contain two elements u, v such that $u \prec v$ (otherwise, u would not be maximal, or v would not be minimal, respectively), and so X is an antichain. Conversely, if X is an antichain, then it is the max-set of $\downarrow X$ and the min-set of $\uparrow X$.
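Fact 4 can be tested exhaustively on small examples. The Python sketch below (points of $\mathbb{F}_2^3$ as tuples; helper names hypothetical) checks on random subsets that min-sets and max-sets are antichains, and that every antichain is recovered as the max-set of its lower closure and as the min-set of its upper closure.

```python
import random
from itertools import product

POINTS = list(product((0, 1), repeat=3))  # F_2^3 as tuples

def prec(a, b):
    """Strict product order: a < b iff a <= b coordinate-wise and a != b."""
    return a != b and all(x <= y for x, y in zip(a, b))

def min_set(X):
    return {u for u in X if not any(prec(v, u) for v in X)}

def max_set(X):
    return {u for u in X if not any(prec(u, v) for v in X)}

def is_antichain(X):
    return not any(prec(a, b) for a in X for b in X)

def lower_closure(X):
    return {u for u in POINTS if any(u == x or prec(u, x) for x in X)}

def upper_closure(X):
    return {u for u in POINTS if any(u == x or prec(x, u) for x in X)}

def check_fact4(trials=300, seed=1):
    """Empirically check Fact 4 on random subsets of F_2^3."""
    rng = random.Random(seed)
    for _ in range(trials):
        X = {p for p in POINTS if rng.random() < 0.5}
        # Min-sets and max-sets are always antichains ...
        if not (is_antichain(min_set(X)) and is_antichain(max_set(X))):
            return False
        # ... and every antichain is the max-set of its lower closure
        # and the min-set of its upper closure.
        if is_antichain(X) and not (max_set(lower_closure(X)) == X
                                    and min_set(upper_closure(X)) == X):
            return False
    return True
```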

Symmetric-key primitives
Symmetric cryptography plays an important role in securing our daily sensitive data. Due to their great performance advantages compared to their public-key counterparts, symmetric-key primitives are responsible for the encryption and authentication of virtually all protected data.
The most important basic building blocks of symmetric cryptography are block ciphers, stream ciphers and cryptographic permutations.
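A block cipher is a mapping $E\colon \mathbb{F}_2^n \times \mathbb{F}_2^\ell \to \mathbb{F}_2^n$ that maps an n-bit message x and an $\ell$-bit key k to an n-bit cipher-text $E(x, k)$ such that, for each fixed key $k \in \mathbb{F}_2^\ell$, the restriction $E_k := E(\cdot, k)$ is a permutation on $\mathbb{F}_2^n$. Common sizes for n, called the block size, are 64 and 128, and $\ell$, the key size, ranges from 80 to 256.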
A stream cipher is most commonly based on a key-stream generator that takes as inputs an initial value IV and a key k, and produces a (very long) stream of bits $z_1, z_2, \dots$. The actual encryption of a message is then performed by simply adding (in $\mathbb{F}_2$) the message to the output of the key-stream generator. Those primitives are used in modes of operation to ensure secure encryption, authentication, or a combined mode for authenticated encryption. The security of those modes can usually be proven under some assumption on the underlying primitive. One standard assumption is that it should be practically impossible to distinguish the primitive using an unknown key from a random function or, in the case of a block cipher, from a random permutation. The security of the primitive itself cannot be proven but is rather checked against known classes of attacks. The best-known statistical attacks are differential and linear cryptanalysis and their derivatives. But other large classes exist and are applicable to (reduced) versions of many ciphers. Some of the most important attacks are those that investigate the ANF of a cipher. They are described in the following parts.
Virtually all primitives deployed in practice are iterative designs. For the case of a block cipher, this means that a function that is simple to analyze and easy and efficient to implement is iterated a fixed, well-chosen number of times. Often, the key, or rather a round-specific part of the key, is simply added to the current state between the rounds. This constitutes what is known as a key-alternating cipher and is depicted below.
Most of the symmetric primitives we are using today are not only iterative designs but also use what is known as an SP-network as round function. In the substitution part, a non-linear layer of parallel small-sized permutations (the so-called S-boxes) is used. The permutation part, on the other hand, consists of a (binary) linear layer applied to the full state. Using those ingredients when designing efficient and secure ciphers or cryptographic permutations has a long-standing history. It can be seen as having its roots already in Shannon's seminal ideas on confusion and diffusion [20]. While certainly many alternative design strategies exist, the use of S-boxes and linear layers arguably dominates today's designs, including AES, SHA-3, and many of the primitives in the final round of the NIST lightweight cryptography competition.
Another design strategy we want to briefly mention is the Feistel cipher, with DES being the most prominent but by now outdated example. Here the message is first split into two halves. The right half is input to a round function $f_i$ that also takes the key as an input. The output of $f_i$ is added to the left half, and finally both parts are swapped. The functions $f_i$ are often again based on linear layers and S-boxes, just as described above. Three rounds of a Feistel cipher are shown below.
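To illustrate why a Feistel cipher does not need an invertible round function, here is a minimal Python sketch with 8-bit halves and an arbitrary, hypothetical round function `round_fn`; decryption simply runs the rounds backwards with the round keys in reverse order.

```python
import random

MASK = 0xFF  # hypothetical 8-bit halves

def round_fn(r, k):
    """Arbitrary (not necessarily invertible) round function f_i."""
    t = (r ^ k) & MASK
    return ((t * t) ^ (t >> 3)) & MASK

def feistel_encrypt(left, right, keys):
    for k in keys:
        # add f_i(right, k_i) to the left half, then swap the halves
        left, right = right, left ^ round_fn(right, k)
    return left, right

def feistel_decrypt(left, right, keys):
    for k in reversed(keys):
        # undo one round: the previous right half is the current left half
        left, right = right ^ round_fn(left, k), left
    return left, right
```

For example, with three round keys, `feistel_decrypt(*feistel_encrypt(l, r, keys), keys)` returns `(l, r)` for any 8-bit halves l and r.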

Integral cryptanalysis
Integral attacks [2] and cube attacks [3] are two important attack vectors on symmetric primitives, which exploit certain properties of the involved ANF expressions. Integral attacks distinguishing the analyzed primitive from an ideal one can be applied both to block ciphers and stream ciphers. However, such distinguishers can be extended into key recovery attacks only for block ciphers. Cube attacks, on the other hand, can be used to mount key recovery in both block ciphers and stream ciphers.
To mount a distinguishing integral attack on a block cipher $E\colon \mathbb{F}_2^n \times \mathbb{F}_2^\ell \to \mathbb{F}_2^n$, an adversary chooses a proper non-empty subset $S \subset \mathbb{F}_2^n$ and a non-zero output mask $\beta \in \mathbb{F}_2^n$ so that the sum
$\bigoplus_{x \in S} \langle \beta, E_k(x) \rangle$
is constant zero or one, i.e., is independent of the key k. This property can be used as a distinguisher since, for a random permutation $F\colon \mathbb{F}_2^n \to \mathbb{F}_2^n$, the sum $\bigoplus_{x \in S} \langle \beta, F(x) \rangle$ is zero or one with probability 1/2 each (excluding the trivial cases $S = \emptyset$ and $S = \mathbb{F}_2^n$).
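The distinguisher can be illustrated on a toy example. The Python sketch below uses a hypothetical 8-bit key-alternating SPN: two 4-bit S-boxes (the PRESENT S-box is reused merely as some 4-bit permutation), a linear step adding the low nibble to the high nibble, and four round keys. For $S = \{(c, lo) \mid lo \in \mathbb{F}_2^4\}$ with a constant high nibble c, the low output nibble runs through every value exactly once, so $\bigoplus_{x \in S} \langle \beta, E_k(x) \rangle = 0$ for every key and every β supported on the low nibble. This property was derived by hand for the toy construction; it is not the output of a division property search.

```python
import random

# Hypothetical toy 8-bit SPN; SBOX is the PRESENT S-box, used only as a 4-bit permutation.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def s_layer(x):
    """Apply the S-box to both nibbles of the 8-bit state."""
    return (SBOX[x >> 4] << 4) | SBOX[x & 0xF]

def mix(x):
    """Linear step: the low nibble is added to the high nibble."""
    hi, lo = x >> 4, x & 0xF
    return ((hi ^ lo) << 4) | lo

def toy_encrypt(x, keys):
    """keys = (k0, k1, k2, k3), four 8-bit round keys."""
    y = x ^ keys[0]
    for k in keys[1:3]:
        y = mix(s_layer(y)) ^ k
    return s_layer(y) ^ keys[3]

def integral_sum(encrypt, beta, hi_const=0):
    """XOR over S = {(hi_const, lo) : lo in F_2^4} of <beta, encrypt(x)>."""
    acc = 0
    for lo in range(16):
        acc ^= bin(encrypt((hi_const << 4) | lo) & beta).count("1") & 1
    return acc

rng = random.Random(2024)
keys = tuple(rng.randrange(256) for _ in range(4))
low_nibble_sum = integral_sum(lambda x: toy_encrypt(x, keys), beta=0x0F)
```

For a masks β touching the high nibble, the sum is not guaranteed to be key-independent in this toy cipher.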
An integral distinguisher can be extended into a key recovery attack in the following way. Let $E_k = G_k \circ F_k$, where $G_k, F_k\colon \mathbb{F}_2^n \to \mathbb{F}_2^n$ are bijective key-dependent parts of the cipher, and let $F_k$ have an integral distinguisher $\bigoplus_{x \in S} \langle \beta, F_k(x) \rangle = 0$ for all keys k. Then, it is also true that
$\bigoplus_{x \in S} \langle \beta, G_k^{-1}(E_k(x)) \rangle = 0. \quad (1)$
However, under certain assumptions, for a key $k' \neq k$, the equation $\bigoplus_{x \in S} \langle \beta, G_{k'}^{-1}(E_k(x)) \rangle = 0$ would hold only with probability 1/2, providing a way to distinguish a wrong key $k'$ from the right key k. In particular, if $\langle \beta, G_k^{-1}(\cdot) \rangle$ depends only on part of the key, the set of candidate values for this part can be reduced by checking (1). This method can be improved by advanced techniques such as partial sums [21] or FFT-based key recovery [22].
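The filtering step can be tried on a hypothetical toy cipher (a small key-alternating SPN, defined in full so that the block is self-contained). One can check by hand that, over each structure $S = \{(c, lo) \mid lo \in \mathbb{F}_2^4\}$, the high nibble of the state just before the last S-layer XORs to zero. Taking $G_k$ to be the last S-layer followed by the last key addition, the sketch below keeps every guess g for the high nibble of the last round key for which the partially decrypted nibbles still XOR to zero. The right guess always survives; wrong guesses are heuristically expected to be discarded about half of the time per structure.

```python
import random

# Hypothetical toy 8-bit SPN; SBOX is the PRESENT S-box, used only as a 4-bit permutation.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
SBOX_INV = [SBOX.index(v) for v in range(16)]

def s_layer(x):
    return (SBOX[x >> 4] << 4) | SBOX[x & 0xF]

def mix(x):
    hi, lo = x >> 4, x & 0xF
    return ((hi ^ lo) << 4) | lo

def toy_encrypt(x, keys):
    """keys = (k0, k1, k2, k3); the last round is an S-layer plus k3."""
    y = x ^ keys[0]
    for k in keys[1:3]:
        y = mix(s_layer(y)) ^ k
    return s_layer(y) ^ keys[3]

def recover_k3_high(encrypt, structures=(0, 1, 2, 3)):
    """Candidates for the high nibble of k3 passing the balance check on every structure."""
    candidates = set(range(16))
    for c in structures:
        cts = [encrypt((c << 4) | lo) for lo in range(16)]
        for g in list(candidates):
            acc = 0
            for ct in cts:
                acc ^= SBOX_INV[(ct >> 4) ^ g]  # undo the key nibble and the last S-box
            if acc != 0:
                candidates.discard(g)
    return candidates

rng = random.Random(1)
keys = tuple(rng.randrange(256) for _ in range(4))
cands = recover_k3_high(lambda x: toy_encrypt(x, keys))
```

Using more structures (different constants c) shrinks the candidate set further, at the cost of more chosen plain-texts.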

Finding integral distinguishers with division property
Division property is a technique for finding integral distinguishers based on degeneracies of ANF expressions of the analyzed ciphers.
Conventional division property (Section 4) may only exhibit a monomial such that all its monomial multiples are missing in the ANF. In addition, the technique is imperfect, meaning that it does not guarantee finding a distinguisher even if it exists; nonexistence of a distinguisher can never be proven either. Roughly speaking, conventional division property does not detect cancellations of monomials in intermediate operations, which may or may not lead to distinguishers; the technique only detects cases when the given monomial can not be computed in principle, due to nonexistence of multiplication paths (the so-called "division trails", see Section 3.2). However, application of conventional division property is feasible for nearly all used ciphers in the literature and often yields powerful distinguishers.
Perfect variants of division property (Section 3), on the other hand, can be used to compute a chosen ANF coefficient exactly. Simply speaking, the technique counts the number of multiplication paths (division trails) for the target monomial, and the parity of the count determines the respective ANF coefficient. In practice, however, it is feasible only in some use-cases (e.g., 64-bit block ciphers) and often requires specific optimizations and fine-tuning of utilized solvers or optimization software.

Cube-based cryptanalysis
For cube attacks, we define a Boolean function $F\colon \mathbb{F}_2^n \times \mathbb{F}_2^\ell \to \mathbb{F}_2$, where the first input is public and the second input is the secret key. In the context of stream ciphers, the public input is the initialization vector and F computes a bit of the key-stream. Now, the public input is split into two parts $(x, y) \in \mathbb{F}_2^n$, with $x \in \mathbb{F}_2^d$ (the cube variables), and F is decomposed as
$F(x, y, k) = x^{\mathbf{1}} \cdot p(y, k) \oplus r(x, y, k)$,
where p, r are polynomials and no monomial in the ANF of r is divisible by $x^{\mathbf{1}}$. Recall that $x^{\mathbf{1}}$ corresponds to the product of all variables in x. The polynomial p is called the superpoly. It holds that $\bigoplus_{x \in \mathbb{F}_2^d} F(x, y, k) = p(y, k)$, since the sum of r over all x vanishes (each of its monomials misses at least one cube variable). When, for a fixed y, the polynomial p(y, k) has a low complexity in k, the adversary obtains information about the key or can use p to filter keys.
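The identity between the cube sum and the superpoly can be checked on a toy function. The Python sketch below uses a hypothetical F with two cube variables $x_0, x_1$, one public variable y and two key bits, whose superpoly is $p(y, k) = k_0 \oplus y k_1$, and compares the cube sum with p for all values of y and the key.

```python
from itertools import product

def F(x0, x1, y, k0, k1):
    """Hypothetical toy output bit: x0*x1*(k0 + y*k1) + x0*k1 + x1*y + k0*k1."""
    return (x0 & x1 & (k0 ^ (y & k1))) ^ (x0 & k1) ^ (x1 & y) ^ (k0 & k1)

def cube_sum(y, k0, k1):
    """XOR of F over all 4 assignments of the cube variables (x0, x1)."""
    acc = 0
    for x0, x1 in product((0, 1), repeat=2):
        acc ^= F(x0, x1, y, k0, k1)
    return acc

def superpoly(y, k0, k1):
    """p(y, k) read off from the ANF: the cofactor of x0*x1."""
    return k0 ^ (y & k1)

# The cube sum equals the superpoly for every public y and every key.
matches = all(cube_sum(y, k0, k1) == superpoly(y, k0, k1)
              for y, k0, k1 in product((0, 1), repeat=3))
```

With y = 0 the cube sum reveals $k_0$ directly, and with y = 1 it reveals $k_0 \oplus k_1$.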

Finding cube attacks with division property
Division property is a state-of-the-art tool for finding cube attacks. Its feasibility range is much larger than that of classic methods based on empirical evaluation of cubes (as in [3]).
Conventional division property may be used to search for potential cube attacks. More precisely, it allows one to upper-bound the shape of monomials in the superpoly p(y, k). However, it cannot show that p(y, ·) is non-trivial, i.e., that it may actually be used to recover information about the key k (otherwise, the cube key recovery attack degrades to an integral distinguisher; this actually happened in the literature [23]).
Perfect division property can recover the exact coefficients of the superpoly and guarantee a successful key recovery attack. However, its application is much more computationally intensive and is often not feasible.

Parity sets and perfect division property
Parity sets were introduced by Boura and Canteaut [11] to rigorously formalize Todo's original (conventional) division property [4], which can be viewed as a union of vectorial lower bounds on the parity set of a given set. Considering the propagation of parity sets directly leads to the perfect division property. While an intermediate variant called the 3-subset division property was proposed by Todo and Morii already in [7], fully perfect variants were proposed and shown to be feasible in practice only a few years later [8,17,24].
In this section, we introduce parity sets and study their properties, focusing on the propagation of parity sets through functions. These constitute the key concepts behind the perfect division property and also prepare the formalization of the conventional division property (Section 4).

Parity sets
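Definition 5 (Parity set). The parity set of a set $X \subseteq \mathbb{F}_2^n$ is defined as $\mathcal{U}(X) := \{u \in \mathbb{F}_2^n \mid \bigoplus_{x \in X} x^u = 1\}$. By an abuse of notation, the parity set of a Boolean function is defined as the parity set of its support.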

Remark 1.
In the division property literature, parity sets correspond to the "3-subset division property without the unknown subset". More precisely, the 3-subset division property defined in [7] included a set $\mathbb{L} \setminus \mathbb{K}$, a subset of $\mathcal{U}(X)$ that is feasible to maintain. Later, [8] defined a set $\mathbb{L}$ exactly as the parity set of X and called the variant "3-subset division property without the unknown subset".
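The parity set is equal, up to negations, to the support of the ANF of the set's indicator.
Proposition 1. [16] For any $X \subseteq \mathbb{F}_2^n$ and any $u \in \mathbb{F}_2^n$, the following statements are equivalent characterizations of the parity set:
1. $u \in \mathcal{U}(X)$, i.e., $\bigoplus_{x \in X} x^u = 1$;
2. $\mathbf{1} \in \mathcal{A}(x \mapsto x^u \cdot 1_X(x))$;
3. $\bigoplus_{x \in \mathbb{F}_2^n,\, x \succeq u} 1_X(x) = 1$;
4. $\neg u \in \mathcal{A}(\neg X)$, where $\neg X := \{\neg x \mid x \in X\}$.
Proof. The first equivalence is trivial; the second equivalence follows from Fact 2; the third equivalence is proven by
$\bigoplus_{x \succeq u} 1_X(x) = \bigoplus_{x \preceq \neg u} 1_X(\neg x) = 1$,
where the latter condition clearly defines the coefficient of $x^{\neg u}$ in the ANF of $1_{\neg X}$ (Fact 1).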
The second characterization, although trivial, shows an interesting formulation of the parity set as describing monomials $x^u$ that, when multiplied with the (indicator) function, produce the monomial $x^{\mathbf{1}}$.
The third characterization shows a close similarity to the Möbius transform-based formulation of the ANF coefficients. The only difference is the direction of the summation: the parity set coefficients are sums of the function over supermasks, while the ANF coefficients are sums of the function over submasks.
The fourth characterization uncovers an equivalence, up to negations, of the ANF support and the parity set.
Corollary 1. [11] The mapping $\mathcal{U}$ is involutive.
Proof. Follows from the fact that $\mathcal{A}$ is involutive.
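The negation-based characterization and the involution property can be sanity-checked exhaustively for small n. The following Python sketch (vectors of $\mathbb{F}_2^4$ encoded as integers; helper names hypothetical) computes the parity set from its definition $\{u \mid \bigoplus_{x \in X} x^u = 1\}$ and the ANF support from Fact 1, and then tests, on random sets, the relation $u \in \mathcal{U}(X) \iff \neg u \in \mathcal{A}(\neg X)$ (with $\neg X$ the set of complemented vectors) together with $\mathcal{U}(\mathcal{U}(X)) = X$.

```python
import random

N = 4
SIZE, FULL = 1 << N, (1 << N) - 1  # FULL encodes the all-one vector

def parity_set(X):
    """U(X) = {u : XOR_{x in X} x^u = 1}; x^u = 1 iff x & u == u."""
    return {u for u in range(SIZE) if sum(1 for x in X if x & u == u) % 2 == 1}

def anf_support(X):
    """A(X) via Fact 1: u such that the XOR of 1_X(x) over x <= u is 1."""
    return {u for u in range(SIZE) if sum(1 for x in X if x & u == x) % 2 == 1}

def negate_set(X):
    return {x ^ FULL for x in X}

def check(trials=200, seed=7):
    rng = random.Random(seed)
    for _ in range(trials):
        X = {x for x in range(SIZE) if rng.random() < 0.5}
        U = parity_set(X)
        # u in U(X)  iff  not(u) in A(not X)
        if U != negate_set(anf_support(negate_set(X))):
            return False
        # U is an involution
        if parity_set(U) != X:
            return False
    return True
```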

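Proposition 2. The parity set operator acts linearly on the set indicator. That is, for an integer k ≥ 1 and for all sets $X, X_1, \dots, X_k \subseteq \mathbb{F}_2^n$,
$1_X = \bigoplus_{i=1}^{k} 1_{X_i} \iff 1_{\mathcal{U}(X)} = \bigoplus_{i=1}^{k} 1_{\mathcal{U}(X_i)}$,
or, equivalently, $X = X_1 \triangle \cdots \triangle X_k \iff \mathcal{U}(X) = \mathcal{U}(X_1) \triangle \cdots \triangle \mathcal{U}(X_k)$.
Proof. Let $X, X_1, \dots, X_k \subseteq \mathbb{F}_2^n$ be such that $1_X = \bigoplus_{i=1}^{k} 1_{X_i}$. Then, for any $u \in \mathbb{F}_2^n$,
$\bigoplus_{x \in X} x^u = \bigoplus_{x \in \mathbb{F}_2^n} 1_X(x)\, x^u = \bigoplus_{i=1}^{k} \bigoplus_{x \in X_i} x^u$,
so that $1_{\mathcal{U}(X)}(u) = \bigoplus_{i=1}^{k} 1_{\mathcal{U}(X_i)}(u)$. The converse follows from the fact that $\mathcal{U}$ is involutive. The equivalence to the symmetric difference follows from the definition and the fact that the field has characteristic 2.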
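A quick randomized check of Proposition 2 in Python follows (subsets of $\mathbb{F}_2^4$ as Python sets of integers, so that `^` is the symmetric difference; names hypothetical), with the parity set computed from its definition $\{u \mid \bigoplus_{x \in X} x^u = 1\}$.

```python
import random

N = 4
SIZE = 1 << N

def parity_set(X):
    """U(X) = {u : XOR_{x in X} x^u = 1}; x^u = 1 iff x & u == u."""
    return {u for u in range(SIZE) if sum(1 for x in X if x & u == u) % 2 == 1}

def random_set(rng):
    return {x for x in range(SIZE) if rng.random() < 0.5}

rng = random.Random(3)
linear_ok = True
for _ in range(100):
    X1, X2, X3 = random_set(rng), random_set(rng), random_set(rng)
    # U(X1 Δ X2 Δ X3) must equal U(X1) Δ U(X2) Δ U(X3)
    if parity_set(X1 ^ X2 ^ X3) != parity_set(X1) ^ parity_set(X2) ^ parity_set(X3):
        linear_ok = False
```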
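Finally, we list a few common examples of parity sets.
Proposition. For any $u \in \mathbb{F}_2^n$:
1. $\mathcal{U}(\emptyset) = \emptyset$;
2. $\mathcal{U}(\{u\}) = \downarrow u$;
3. $\mathcal{U}(\downarrow u) = \{u\}$;
4. $\mathcal{U}(\mathbb{F}_2^n) = \{\mathbf{1}\}$.
Proof. 1. Since $1_\emptyset = 0$ (as a function), its ANF is empty, and so must be the parity set (by Proposition 1). 2. Follows from the definition of a parity set: the product $u^{u'}$ equals 1 if and only if $u' \preceq u$ (the necessary and sufficient condition is to have $u_i = 0$ imply $u'_i = 0$ for all i). 3. Follows from point 2 and the fact that $\mathcal{U}$ is an involution. 4. Is a special case of point 3 with $u = \mathbf{1}$ and $\downarrow \mathbf{1} = \mathbb{F}_2^n$.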
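The examples $\mathcal{U}(\emptyset) = \emptyset$, $\mathcal{U}(\{u\}) = \downarrow u$, $\mathcal{U}(\downarrow u) = \{u\}$ and $\mathcal{U}(\mathbb{F}_2^n) = \{\mathbf{1}\}$ can be verified exhaustively for n = 4 with a few lines of Python (vectors as integers, $x^u = 1$ iff `x & u == u`; names hypothetical).

```python
N = 4
SIZE, ONE = 1 << N, (1 << N) - 1  # ONE encodes the all-one vector

def parity_set(X):
    """U(X) = {u : XOR_{x in X} x^u = 1}."""
    return {u for u in range(SIZE) if sum(1 for x in X if x & u == u) % 2 == 1}

def down(u):
    """Lower closure of a single vector: all x with x <= u."""
    return {x for x in range(SIZE) if x & u == x}

empty_ok = parity_set(set()) == set()
singleton_ok = all(parity_set({u}) == down(u) for u in range(SIZE))
down_ok = all(parity_set(down(u)) == {u} for u in range(SIZE))
full_ok = parity_set(set(range(SIZE))) == {ONE}
```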
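The typical use of parity sets and perfect division property in cryptanalysis reduces to computing an ANF coefficient of a given function, as shown in the following proposition.
Proposition. Let $F\colon \mathbb{F}_2^n \to \mathbb{F}_2^m$, $u \in \mathbb{F}_2^n$ and $v \in \mathbb{F}_2^m$, and let $F^v(x) := F(x)^v$. Then $u \in \mathcal{A}(F^v)$ if and only if $v \in \mathcal{U}(F(\downarrow u))$.
Proof. Since elements of even multiplicity in $F(\downarrow u)$ cancel out, $v \in \mathcal{U}(F(\downarrow u))$ if and only if $\bigoplus_{x \in \mathbb{F}_2^n,\, x \preceq u} F^v(x) = 1$. The sum runs over all x such that $x \preceq u$, matching the summing condition of Fact 1.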
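A small exhaustive check of this correspondence, as a Python sketch: for a random (hypothetical) $F\colon \mathbb{F}_2^3 \to \mathbb{F}_2^3$ given as a lookup table, it compares, for all u and v, the membership $v \in \mathcal{U}(F(\downarrow u))$, with $F(\downarrow u)$ taken with odd multiplicities, against the coefficient of $x^u$ in $F^v(x) = F(x)^v$ computed via Fact 1.

```python
import random
from collections import Counter

N = M = 3
rng = random.Random(11)
F_TABLE = [rng.randrange(1 << M) for _ in range(1 << N)]  # hypothetical random F

def mono(x, u):
    """x^u as 0/1: equals 1 iff u <= x."""
    return 1 if x & u == u else 0

def odd_image(X):
    """F(X) with the odd-multiplicity convention."""
    counts = Counter(F_TABLE[x] for x in X)
    return {y for y, c in counts.items() if c % 2}

def parity_set(Y):
    return {v for v in range(1 << M) if sum(mono(y, v) for y in Y) % 2}

def anf_coeff(u, v):
    """Coefficient of x^u in F^v(x) = F(x)^v, by Fact 1."""
    acc = 0
    for x in range(1 << N):
        if x & u == x:
            acc ^= mono(F_TABLE[x], v)
    return acc

def down(u):
    return [x for x in range(1 << N) if x & u == x]

agrees = all(
    (v in parity_set(odd_image(down(u)))) == (anf_coeff(u, v) == 1)
    for u in range(1 << N) for v in range(1 << M)
)
```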

Propagation of parity sets and division trails
Due to the linearity of the parity set transformation $\mathcal{U}$ (on the set of $2^n$-bit vectors representing subsets of $\mathbb{F}_2^n$), the propagation of parity sets through a function can be decomposed into $\mathbb{F}_2$-sums of propagations of single entries through that function.
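Definition 6 (Propagation). Let $F\colon \mathbb{F}_2^n \to \mathbb{F}_2^m$, $u \in \mathbb{F}_2^n$ and $v \in \mathbb{F}_2^m$. If $v \in \mathcal{U}(F(\downarrow u))$, we say that the input division property u propagates to the output division property v, denoted by $u \xrightarrow{F} v$.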

Remark 2.
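Since $\mathcal{U}$ is an involution, an equivalent formulation is that the set X with $\mathcal{U}(X) = \{u\}$, namely $X = \mathcal{U}(\{u\}) = \downarrow u$, satisfies $v \in \mathcal{U}(F(X))$.
An interesting observation is that propagations through a bijective function F are closely related to propagations through the compositional inverse of F.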
Proof The proof is simple in the ANF formulation from Proposition 4 by using Fact 2.
The proposition below shows that a vector b ∈ F m 2 belongs to the output parity set if and only if the number of vectors a ∈ F n 2 in the input parity set, such that a propagates to b, is odd. Before the proposition, we state the following lemma required for the proof.
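Lemma 1. For any $X \subseteq \mathbb{F}_2^n$ and any $x \in \mathbb{F}_2^n$, $1_X(x) = \bigoplus_{u \in \mathcal{U}(X),\, u \succeq x} 1$, i.e., $x \in X$ if and only if the number of $u \in \mathcal{U}(X)$ with $x \preceq u$ is odd.
Proof. By Proposition 1, we have $u \in \mathcal{U}(X) \iff \bigoplus_{x \succeq u} 1_X(x) = 1$. Swapping the roles of X and $\mathcal{U}(X)$ (Proposition 2), we obtain $x \in X \iff \bigoplus_{u \succeq x} 1_{\mathcal{U}(X)}(u) = 1$. The lemma follows.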
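Proposition. Let $F\colon \mathbb{F}_2^n \to \mathbb{F}_2^m$, $X \subseteq \mathbb{F}_2^n$ and $Y = F(X)$. Then $v \in \mathcal{U}(Y)$ if and only if the number of $u \in \mathcal{U}(X)$ such that $u \xrightarrow{F} v$ is odd.
Proof. We recall that here $Y = F(X)$ corresponds to the symmetric difference of the sets $\{F(x)\}$ over all $x \in X$. Hence, writing $F^v(x) := F(x)^v$,
$v \in \mathcal{U}(Y) \iff \bigoplus_{x \in X} F^v(x) = 1$,
where the last equivalence relies on the field having characteristic 2. By Lemma 1, the latter condition is further equivalent to
$\bigoplus_{u \in \mathcal{U}(X)} \bigoplus_{x \preceq u} F^v(x) = 1$.
The inner sum is equal to 1 if and only if $F^v(x)$ contains the monomial $x^u$ (using Corollary 2 and Fact 1). Using Proposition 4, we conclude that the condition equivalent to $v \in \mathcal{U}(Y)$ is that the number of $u \in \mathcal{U}(X)$ such that $u \xrightarrow{F} v$ holds is odd.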
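The counting statement can also be checked numerically. The Python sketch below (a random, hypothetical F on 3 bits and all 256 input sets X) uses the ANF formulation $u \xrightarrow{F} v \iff x^u$ occurs in the ANF of $F^v$, as invoked in the proof, and compares the directly computed $\mathcal{U}(F(X))$ with the set of v reached from an odd number of $u \in \mathcal{U}(X)$.

```python
import random
from collections import Counter

N = 3  # F: F_2^3 -> F_2^3, vectors as ints
rng = random.Random(5)
F_TABLE = [rng.randrange(1 << N) for _ in range(1 << N)]  # hypothetical random F

def parity_set(X):
    """U(X) = {u : XOR_{x in X} x^u = 1}, with x^u = 1 iff x & u == u."""
    return {u for u in range(1 << N) if sum(1 for x in X if x & u == u) % 2}

def odd_image(X):
    """F(X) with the odd-multiplicity convention."""
    counts = Counter(F_TABLE[x] for x in X)
    return {y for y, c in counts.items() if c % 2}

def propagates(u, v):
    """u -> v iff x^u occurs in the ANF of F^v(x) = F(x)^v (via Fact 1)."""
    acc = 0
    for x in range(1 << N):
        if x & u == x:
            acc ^= 1 if F_TABLE[x] & v == v else 0
    return acc == 1

def predicted(X):
    """Output parity set predicted by counting transitions from U(X)."""
    ux = parity_set(X)
    return {v for v in range(1 << N) if sum(propagates(u, v) for u in ux) % 2}

all_ok = all(
    parity_set(odd_image(X)) == predicted(X)
    for X in ({x for x in range(1 << N) if (mask >> x) & 1}
              for mask in range(1 << (1 << N)))
)
```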
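In 2016, Xiang et al. [10] introduced the notion of division trails for the conventional division property, and it also generalizes naturally to trails of the perfect division property [17,24]. Here, the propagation is not evaluated for a function in a single step. Instead, the definition above is generalized to the setting where F is actually given as the composition of many functions: $F = F^{(r)} \circ \cdots \circ F^{(1)}$.

Definition 7 (Division/monomial trail [10,17,24]). Let $F\colon \mathbb{F}_2^{n_0} \to \mathbb{F}_2^{n_r}$ be given as $F = F^{(r)} \circ \cdots \circ F^{(1)}$, where $F^{(i)}\colon \mathbb{F}_2^{n_{i-1}} \to \mathbb{F}_2^{n_i}$ for $i \in [1, r]$. A sequence $(u^{(0)}, u^{(1)}, \dots, u^{(r)})$ with $u^{(i)} \in \mathbb{F}_2^{n_i}$ is called a division trail (or monomial trail) of F if $u^{(i-1)} \xrightarrow{F^{(i)}} u^{(i)}$ for all $i \in [1, r]$.
We denote such a trail by $u^{(0)} \xrightarrow{F^{(1)}} u^{(1)} \xrightarrow{F^{(2)}} \cdots \xrightarrow{F^{(r)}} u^{(r)}$.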

Remark 3.
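Hu et al. [17] called a division trail a monomial trail, since each transition $u^{(i-1)} \xrightarrow{F^{(i)}} u^{(i)}$ means that the monomial $x^{u^{(i-1)}}$ appears in the ANF of $(F^{(i)})^{u^{(i)}}$.
Theorem. Let F be given as in Definition 7, and let $u \in \mathbb{F}_2^{n_0}$ and $v \in \mathbb{F}_2^{n_r}$. Then $u \xrightarrow{F} v$ if and only if the number of trails $u \xrightarrow{F^{(1)}} \cdots \xrightarrow{F^{(r)}} v$ is odd.
Proof. The proof is by induction on r, with the base case r = 2.
Case r = 2: We have $F = F^{(2)} \circ F^{(1)}$. Let $Y = F^{(1)}(\downarrow u)$. Since $\mathcal{U}(\downarrow u) = \{u\}$, the proposition above gives $\mathcal{U}(Y) = \{w \in \mathbb{F}_2^{n_1} \mid u \xrightarrow{F^{(1)}} w\}$, and applying it once more to $F(\downarrow u) = F^{(2)}(Y)$ shows that $u \xrightarrow{F} v$ if and only if the number of w such that $u \xrightarrow{F^{(1)}} w \xrightarrow{F^{(2)}} v$ is odd.
Case r ≥ 3: Now, we prove the induction step. Let $r' \ge 2$ be such that the theorem is proven for all $r \le r'$. Consider the case $r = r' + 1 \ge 3$. Let $\bar{F}^{(2)} = F^{(r)} \circ \cdots \circ F^{(2)}$, so that $F = \bar{F}^{(2)} \circ F^{(1)}$.
By the base case, $u \xrightarrow{F} v$ if and only if the number of $w \in \mathbb{F}_2^{n_1}$ such that $u \xrightarrow{F^{(1)}} w \xrightarrow{\bar{F}^{(2)}} v$ holds is odd. For each such w, by the induction hypothesis, we have $w \xrightarrow{\bar{F}^{(2)}} v$ if and only if the number of trails $w \xrightarrow{F^{(2)}} \cdots \xrightarrow{F^{(r)}} v$ is odd. Therefore, the parity of the number of w such that $u \xrightarrow{F^{(1)}} w \xrightarrow{\bar{F}^{(2)}} v$ holds matches the parity of the number of full trails $u \xrightarrow{F^{(1)}} w \xrightarrow{F^{(2)}} \cdots \xrightarrow{F^{(r)}} v$.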
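For a composition of three random (hypothetical) functions on $\mathbb{F}_2^3$, the trail-counting statement can be verified exhaustively. In the Python sketch below, transitions are computed through the ANF formulation ($u \to v$ iff $x^u$ occurs in the ANF of $F^v$); counting trails then amounts to multiplying the three 0/1 propagation tables, and the parity of the count is compared with the propagation table of the composed function.

```python
import random

N = 3
SIZE = 1 << N

def propagation_table(table):
    """T[u][v] = 1 iff x^u occurs in the ANF of F^v(x) = F(x)^v (Fact 1)."""
    T = [[0] * SIZE for _ in range(SIZE)]
    for u in range(SIZE):
        for v in range(SIZE):
            acc = 0
            for x in range(SIZE):
                if x & u == x:                      # x <= u
                    acc ^= 1 if table[x] & v == v else 0
            T[u][v] = acc
    return T

rng = random.Random(9)
rounds = [[rng.randrange(SIZE) for _ in range(SIZE)] for _ in range(3)]  # F1, F2, F3
composed = list(range(SIZE))
for table in rounds:
    composed = [table[y] for y in composed]         # F3(F2(F1(x)))

tables = [propagation_table(t) for t in rounds]

def count_trails(u, v):
    """Number of trails u -> w1 -> w2 -> v through F1, F2, F3."""
    T1, T2, T3 = tables
    return sum(T1[u][w1] * T2[w1][w2] * T3[w2][v]
               for w1 in range(SIZE) for w2 in range(SIZE))

T = propagation_table(composed)
theorem_holds = all(T[u][v] == count_trails(u, v) % 2
                    for u in range(SIZE) for v in range(SIZE))
```

In other words, over $\mathbb{F}_2$ the propagation table of a composition is the product of the propagation tables of its parts.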

Propagation rules for basic operations
In the previous sections we have already seen that we can describe the division property of a function step by step by splitting the function into multiple parts. This matters in particular for real block ciphers: with a block size of n = 64 or n = 128 bits, we cannot compute a propagation table of size $2^n \times 2^n$. In the following we show the propagation rules for five basic operations that together form a functionally complete set: the XOR operation, the AND operation, the NEGATION, the COPY operation, and a bit permutation.

For the XOR operation $(x_1, x_2, x_3, \ldots, x_{n+1}) \mapsto (x_1 \oplus x_2, x_3, \ldots, x_{n+1})$, a transition $u \xrightarrow{\oplus} v$ holds if and only if $u_1 + u_2 = v_1$ (as integers) and $(u_3, \ldots, u_{n+1}) = (v_2, \ldots, v_n)$. That is, either $u_1$ or $u_2$ can be equal to 1, which leads to $v_1 = 1$ in that case, or $u_1 = u_2 = 0$, which leads to $v_1 = 0$.
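Case $v_1 = 0$: Then $u \xrightarrow{\oplus} v$ if and only if $x_3^{v_2} \cdots x_{n+1}^{v_n}$ contains the monomial $x^u$. This leaves only one possible trail, with $u = (0, 0, v_2, \ldots, v_n)$, and it satisfies the relation above.
Case $v_1 = 1$: Then $u \xrightarrow{\oplus} v$ if and only if $(x_1 \oplus x_2)\, x_3^{v_2} \cdots x_{n+1}^{v_n}$ contains the monomial $x^u$. This holds exactly for $u = (1, 0, v_2, \ldots, v_n)$ and $u = (0, 1, v_2, \ldots, v_n)$.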
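Case $v_1 = 0$ (NEGATION, $(x_1, \ldots, x_n) \mapsto (x_1 \oplus 1, x_2, \ldots, x_n)$): Then $u \to v$ if and only if $x_2^{v_2} \cdots x_n^{v_n}$ contains the monomial $x^u$. This leads to the only possible trail, where $u = (0, v_2, \ldots, v_n)$.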
Case $v_1 = 0$: Then we have $u \to v$ if and only if $x_3^{v_2} \cdots x_{n+1}^{v_n}$ contains the monomial $x^u$, which holds exactly for $u = (0, 0, v_2, \ldots, v_n)$.
Case $v_1 = 1$: Here, we have $u \to v$ if and only if $x_1 x_2\, x_3^{v_2} \cdots x_{n+1}^{v_n}$ contains the monomial $x^u$. This leads to the only possible trail, where $u = (1, 1, v_2, \ldots, v_n)$.
Proof. Let $y = P(x)$, so that y is a rearrangement of the bits of x. Then $y^v = \prod_i y_i^{v_i} = x^{P^{-1}(v)}$, which should be equal to $x^u$, so that $u \xrightarrow{P} v$ if and only if $P(u) = v$.
Now, we take a look at an illustrating example of the trail propagation for a simple function.
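As a sanity check, the five rules can be confirmed by brute force on 4-bit inputs. The sketch below encodes the rules as we read them from the surrounding text. XOR and AND act on the first two coordinates, NEGATION flips the first coordinate, and COPY duplicates it. The cyclic rotation used as the bit permutation, the helper names, and the integer encoding (bit i holds $x_{i+1}$) are our own illustrative choices.

```python
def anf(table):
    """Binary Moebius transform: truth table (index x) -> ANF coefficients (index u)."""
    a = list(table)
    n = len(a).bit_length() - 1
    for i in range(n):
        for x in range(len(a)):
            if x >> i & 1:
                a[x] ^= a[x ^ (1 << i)]
    return a

def transition_pairs(op, n_in, n_out):
    """Brute force: all (u, v) such that x^u occurs in the ANF of op^v."""
    S = [op(x) for x in range(1 << n_in)]
    pairs = set()
    for v in range(1 << n_out):
        coeffs = anf([1 if S[x] & v == v else 0 for x in range(1 << n_in)])
        pairs |= {(u, v) for u in range(1 << n_in) if coeffs[u]}
    return pairs

def rule_pairs(rule, n_in, n_out):
    """All (u, v) accepted by a closed-form propagation rule."""
    return {(u, v) for u in range(1 << n_in) for v in range(1 << n_out) if rule(u, v)}

def bit(z, i):
    """Coordinate z_{i+1} of an integer-encoded vector."""
    return (z >> i) & 1

N = 4

def rotl(z):
    """Illustrative bit permutation: cyclic rotation by one position on N bits."""
    return ((z << 1) | (z >> (N - 1))) & ((1 << N) - 1)

OPS = {  # name: (operation, rule, input size, output size)
    "XOR": (lambda x: (bit(x, 0) ^ bit(x, 1)) | ((x >> 2) << 1),
            lambda u, v: bit(u, 0) + bit(u, 1) == bit(v, 0) and u >> 2 == v >> 1, N, N - 1),
    "AND": (lambda x: (bit(x, 0) & bit(x, 1)) | ((x >> 2) << 1),
            lambda u, v: bit(u, 0) == bit(u, 1) == bit(v, 0) and u >> 2 == v >> 1, N, N - 1),
    "NEGATION": (lambda x: x ^ 1,
                 lambda u, v: bit(u, 0) <= bit(v, 0) and u >> 1 == v >> 1, N, N),
    "COPY": (lambda x: bit(x, 0) | (bit(x, 0) << 1) | ((x >> 1) << 2),
             lambda u, v: bit(u, 0) == (bit(v, 0) | bit(v, 1)) and u >> 1 == v >> 2, N, N + 1),
    "PERMUTATION": (rotl, lambda u, v: rotl(u) == v, N, N),
}

for name, (op, rule, n_in, n_out) in OPS.items():
    assert transition_pairs(op, n_in, n_out) == rule_pairs(rule, n_in, n_out), name
print("all five rules confirmed")
```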

Propagation through a linear map
Zhang and Rijmen [26] proved a necessary and sufficient characterization of division property transitions through a linear map. While in [26] the statement was given only for invertible maps, Hu, Wang and Wang [27] later proved that it applies to arbitrary linear maps. Although this result was proven in the framework of conventional division property (see Section 4), it can be directly translated to the perfect division property when the weights of the input and the output vectors are equal.

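The monomial $x^u$ is present in the ANF of $M^v(x) = \prod_{i \,:\, v_i = 1} \bigl(\sum_j M_{i,j} x_j\bigr)$ if and only if
$\sum_{\sigma \in S} \prod_{i=1}^{\ell} (M_{v,u})_{i,\sigma(i)} = 1,$
where $\ell = \mathrm{wt}(u) = \mathrm{wt}(v)$, $M_{v,u}$ is the $\ell \times \ell$ submatrix of M with rows selected by v and columns selected by u, and S is the set of all permutations of $[1..\ell]$. Products containing a repeated variable collapse to monomials of degree smaller than $\ell$ and therefore cannot contribute to $x^u$. Note that the left-hand side of the expression is exactly the expression for the permanent of the matrix $M_{v,u}$, which, for the binary field, coincides with the determinant. The theorem follows.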
It is easy to show that $\mathrm{wt}(u) \le \mathrm{wt}(v)$ holds in general (Proposition 12 below). However, there is no known simple characterization of transitions $u \xrightarrow{M} v$ when $\mathrm{wt}(u) < \mathrm{wt}(v)$. One possibility is to decompose the map into a sequence of elementary operations and apply the XOR and COPY propagation rules (Proposition 7 and Proposition 9).

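Proposition 12. Let M be a linear map given by a matrix $M \in \mathbb{F}_2^{m \times n}$. If $u \xrightarrow{M} v$, then $\mathrm{wt}(u) \le \mathrm{wt}(v)$.
Proof. The transition $u \xrightarrow{M} v$ means that the product $M^v(x)$ contains the monomial $x^u$. This product consists of $\mathrm{wt}(v)$ linear factors and is therefore a sum of monomials of degree at most $\mathrm{wt}(v)$. It follows that $\mathrm{wt}(u) \le \mathrm{wt}(v)$.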
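The determinant criterion and Proposition 12 can be verified numerically for small matrices. In this sketch, the helper names are hypothetical, and row i of `rows` is a bitmask that defines output coordinate $y_{i+1} = \langle M_i, x \rangle$. For all equal-weight pairs, the ANF coefficient of $x^u$ in $M^v$ is compared with $\det M_{v,u}$ over $\mathbb{F}_2$. In addition, $\mathrm{wt}(u) \le \mathrm{wt}(v)$ is checked for every transition. This is a test harness under these assumptions, not an efficient propagation routine.

```python
import random

def anf(table):
    """Binary Moebius transform: truth table (index x) -> ANF coefficients (index u)."""
    a = list(table)
    n = len(a).bit_length() - 1
    for i in range(n):
        for x in range(len(a)):
            if x >> i & 1:
                a[x] ^= a[x ^ (1 << i)]
    return a

def popcount(z):
    return bin(z).count("1")

def apply_linear(rows, x):
    """y_{i+1} = <rows[i], x> over GF(2); rows[i] is a bitmask over the input coordinates."""
    return sum((popcount(r & x) & 1) << i for i, r in enumerate(rows))

def submatrix(rows, v, u, n):
    """M_{v,u}: rows selected by v, columns selected by u, re-packed as bitmasks."""
    cols = [j for j in range(n) if u >> j & 1]
    return [sum(((r >> c) & 1) << k for k, c in enumerate(cols))
            for i, r in enumerate(rows) if v >> i & 1]

def det_gf2(mat):
    """Determinant over GF(2) of a square matrix given as row bitmasks (Gaussian elimination)."""
    mat = list(mat)
    k = len(mat)
    for col in range(k):
        piv = next((r for r in range(col, k) if mat[r] >> col & 1), None)
        if piv is None:
            return 0
        mat[col], mat[piv] = mat[piv], mat[col]
        for r in range(k):
            if r != col and mat[r] >> col & 1:
                mat[r] ^= mat[col]
    return 1

def check_linear(rows, n):
    """Determinant criterion for equal weights and wt(u) <= wt(v) for every transition of x -> Mx."""
    m = len(rows)
    S = [apply_linear(rows, x) for x in range(1 << n)]
    for v in range(1 << m):
        coeffs = anf([1 if S[x] & v == v else 0 for x in range(1 << n)])
        for u in range(1 << n):
            if coeffs[u] and popcount(u) > popcount(v):
                return False
            if popcount(u) == popcount(v) and coeffs[u] != det_gf2(submatrix(rows, v, u, n)):
                return False
    return True

rng = random.Random(7)
assert all(check_linear([rng.randrange(1 << 5) for _ in range(5)], 5) for _ in range(20))
print("determinant criterion and weight bound confirmed")
```

The random matrices are not required to be invertible. This matches the remark that the criterion holds for arbitrary linear maps.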

Definitions
Conventional division property, as introduced in the seminal work by Todo [4], can be viewed as a (vectorial) lower bound on (partial) weights of the parity set elements. In other words, it considers monomials in the division trails up to monomial multiples. In addition, it may group monomials by their degrees on some words, rather than on single bit variables. While these relaxations lose precision (in contrast to perfect division property), they make the computational analysis easier and more often feasible.

Remark 4. Clearly, a set X satisfies division property K if and only if X satisfies division property Min(K), which is minimal.

Remark 5.
The most "tight" division properties that a set $X \subseteq \mathbb{F}_2^n$ satisfies are those with ${\uparrow}U(X) = {\uparrow}K$, for example, $K := \mathrm{Min}(U(X))$.
The division property K defined over $\mathbb{F}_2^n$ is called bit-based division property. Note that it is included in the general definition due to $\mathrm{wt}_{1,\ldots,1}(x) = x$ for all $x \in \mathbb{F}_2^n$ and corresponds to the case r = n (n-dimensional division property). When r = 1, i.e., when $K \subseteq [0, n]$ (1-dimensional division property), the division property is called state-based. When 1 < r < n, the division property may be called word-based if it is aligned with words (e.g. S-boxes) used in the analyzed cryptographic primitive.
From Proposition 1 it follows that conventional division property of a set defines an upper bound on the degree vectors of the monomials in the ANF of the set's indicator. In the special case of 1-dimensional division property, this fact was given already in several initial studies [11,13,28]. The following proposition shows the bit-based version of this fact.
Proof. The proposition follows from Proposition 1, which states that $U(X) = \neg A(\neg X)$. It also uses the fact that $\mathrm{Max}(A(X)) = \mathrm{Max}(A(\neg X))$, so that ${\downarrow}A(X) = {\downarrow}A(\neg X)$. For the generalized version, it remains to use that $\neg$ and ${\downarrow}$ commute with wt.

1-dimensional conventional division property
In this section, we briefly describe the main properties and some special cases of 1-dimensional division property, outlined in [11] and partly in [28]. Proposition 13 states that 1-dimensional division property simply corresponds to an upper bound on the degree of a set (i.e., on the algebraic degree of its indicator function). Therefore, this case is covered by the theory of Reed-Muller codes.
For the case of 1-dimensional division property, the following bit-based notation is used.
Denote by $D^n_k$ the set of all vectors from $\mathbb{F}_2^n$ with weight at least k:
$D^n_k = \{u \in \mathbb{F}_2^n : \mathrm{wt}(u) \ge k\}.$
Clearly, if a set X satisfies $D^n_k$, then X satisfies $D^n_{k'}$ for all $k'$ with $0 \le k' \le k$. Therefore, we are typically interested in the highest such value of k. Indeed, it follows from Proposition 13 that a set X satisfies $D^n_k$ if and only if $\deg X \le n - k$. We restate this result as in [11] to show the direct connection with Reed-Muller codes. For $X \subseteq \mathbb{F}_2^n$, the following statements are equivalent:
1. X satisfies division property $D^n_k$;
2. the incidence vector of X belongs to the Reed-Muller code of length $2^n$ and order $n - k$;
3. the incidence vector of X belongs to the dual of the Reed-Muller code of length $2^n$ and order $k - 1$.
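In particular, a tight lower bound on the size of a set satisfying a given $D^n_k$ follows. The minimum distance of the Reed-Muller code of length $2^n$ and order $n - k$ is $2^k$. Hence every non-empty set X satisfying $D^n_k$ has $|X| \ge 2^k$, with equality, for instance, for a k-dimensional linear subspace.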
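The equivalence between $D^n_k$ and the degree bound, together with the size bound, can be checked exhaustively for small n. This is a minimal sketch. It assumes the standard reading that X satisfies $D^n_k$ when every element of $U(X)$ has weight at least k. The helper names and the random sets are our own.

```python
import random

def anf(table):
    """Binary Moebius transform: truth table (index x) -> ANF coefficients (index u)."""
    a = list(table)
    n = len(a).bit_length() - 1
    for i in range(n):
        for x in range(len(a)):
            if x >> i & 1:
                a[x] ^= a[x ^ (1 << i)]
    return a

def parity_set(X, n):
    """U(X): all u with an odd number of x in X such that u <= x bitwise."""
    return {u for u in range(1 << n) if sum(1 for x in X if x & u == u) % 2 == 1}

def degree(X, n):
    """Algebraic degree of the indicator of X (-1 for the empty set)."""
    coeffs = anf([1 if x in X else 0 for x in range(1 << n)])
    return max((bin(u).count("1") for u in range(1 << n) if coeffs[u]), default=-1)

def satisfies_D(X, n, k):
    """X satisfies D^n_k iff every element of U(X) has Hamming weight at least k."""
    return all(bin(u).count("1") >= k for u in parity_set(X, n))

rng = random.Random(3)
n = 5
for _ in range(100):
    p = rng.random()
    X = {x for x in range(1 << n) if rng.random() < p}
    for k in range(n + 1):
        assert satisfies_D(X, n, k) == (degree(X, n) <= n - k)
        if X and satisfies_D(X, n, k):
            assert len(X) >= 2 ** k   # minimum distance of the Reed-Muller code of order n - k

CUBE = {x for x in range(1 << n) if x & 0b11000 == 0}   # 3-dimensional linear subspace
print(degree(CUBE, n), satisfies_D(CUBE, n, 3), len(CUBE))   # 2 True 8
```

The subspace at the end attains the size bound with equality for k = 3.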

Propagation of conventional division property
From now on, we focus on bit-based conventional division property. Most definitions and results can be naturally generalized by applying the corresponding partial weights projection map $\mathrm{wt}_{k_1,\ldots,k_r}$.
Conventional bit-based division property captures the parity set precisely only up to the presence of an (unknown) constant addition. However, due to its high efficiency, it is also useful in the analysis of key-less functions such as cryptographic permutations.

Proposition 17 ([11, Prop. 6]). Let X be a subset of $\mathbb{F}_2^n$. Then, for any $c \in \mathbb{F}_2^n$, $U(X+c) \subseteq {\uparrow}U(X)$.
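Proof. By Proposition 13, the set $\neg\mathrm{Min}(U(X))$ is the set of exponents of the maximal monomials in the ANF of $\mathbb{1}_X$. This set is invariant under shifting X by a constant, since expanding $(x+c)^t$ yields $x^t$ plus divisors of $x^t$. Therefore, we have $\mathrm{Min}(U(X)) = \mathrm{Min}(U(X+c))$ and also $U(X+c) \subseteq {\uparrow}\mathrm{Min}(U(X+c)) = {\uparrow}\mathrm{Min}(U(X)) = {\uparrow}U(X)$.
The following proposition shows that this bound is tight and cannot be improved without constraining c.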

Proposition 18.
For all $X \subseteq \mathbb{F}_2^n$ and for all $u \in {\uparrow}U(X)$, there exists $c \in \mathbb{F}_2^n$ such that $u \in U(X+c)$.
Equivalently, for all $X \subseteq \mathbb{F}_2^n$ and for all $u \in {\downarrow}A(X)$, there exists $c \in \mathbb{F}_2^n$ such that $u \in A(X+c)$.
Proof. The equivalence follows from Proposition 1, stating that $u \in U(X)$ if and only if $(\neg u) \in A(\neg X)$. We prove the proposition in the ANF domain.
The proof is by contradiction. Let $X \subseteq \mathbb{F}_2^n$ and $u \in {\downarrow}A(X)$ be such that $u \notin A(X+c)$ for all $c \in \mathbb{F}_2^n$. Let $v \in \mathbb{F}_2^n$ be minimal such that $u \preceq v$ and $v \in A(X)$. In other words, the monomial $x^u$ is not present in the ANF of $\mathbb{1}_{X+c}$ for any c, while the monomial $x^v$ is present in the ANF of $\mathbb{1}_X$. Consider $c = v + u$. An arbitrary monomial $x^t$ of $\mathbb{1}_X$ is replaced in $\mathbb{1}_{X+c}(x) = \mathbb{1}_X(x + c)$ by $(x+c)^t = \prod_i (x_i + c_i)^{t_i}$. Thus, the monomial $x^u$ appears in $\mathbb{1}_{X+c}$ only from monomials $x^t$ in $\mathbb{1}_X$ that are multiples of $x^u$ (i.e., $u \preceq t$) having $c_i = 1$ in positions where $t_i = 1, u_i = 0$ (i.e., $t_i + u_i = 1$). Since the chosen c is such that $c \wedge u = 0$, it must be $u \preceq t \preceq u + c = v$. But $x^v$ is a minimal monomial multiple of $x^u$ in the ANF of $\mathbb{1}_X$. Hence $x^u$ is added to $\mathbb{1}_{X+c}$ exactly once, from the monomial $x^t = x^v$. Thus $u \in A(X+c)$, a contradiction.
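Propositions 17 and 18 together say that the union of $U(X+c)$ over all constants c is exactly ${\uparrow}U(X)$. The following sketch tests this on random subsets of $\mathbb{F}_2^4$. The helper names and the sampling are our own choices.

```python
import random

def parity_set(X, n):
    """U(X): all u with an odd number of x in X such that u <= x bitwise."""
    return {u for u in range(1 << n) if sum(1 for x in X if x & u == u) % 2 == 1}

def up_closure(A, n):
    """All w in F_2^n lying above some element of A (a <= w bitwise)."""
    return {w for w in range(1 << n) if any(a & w == a for a in A)}

def shift(X, c):
    """The shifted set X + c."""
    return {x ^ c for x in X}

rng = random.Random(5)
n = 4
for _ in range(200):
    p = rng.random()
    X = {x for x in range(1 << n) if rng.random() < p}
    up = up_closure(parity_set(X, n), n)
    union = set()
    for c in range(1 << n):
        Uc = parity_set(shift(X, c), n)
        assert Uc <= up        # Proposition 17: U(X + c) lies in the upper closure of U(X)
        union |= Uc
    assert union == up         # Proposition 18: every element of that closure is reached for some c
print("Propositions 17 and 18 hold on all samples")
```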

Remark 7. This proposition can also be applied in the integral distinguisher scenario (Section 2.5). Consider an unknown constant addition in the input, e.g., a whitening key addition in a block cipher. Then no divisor of a maximal ANF monomial is missing from the ANF for all values of the constant. In other words, if a given monomial is not present in the ANF for all values of the constant added in the input, then none of its multiples is present either.
We now switch to the most important component of conventional division property theory: propagation through a public function. From Proposition 6 it is clear that propagation of parity sets through a function can be derived from propagation of each single entry of the input parity set, simply by folding the respective image sets with the symmetric difference. Since conventional division property studies lower bounds of parity sets, a (generally tight) lower bound on the parity set propagation can be derived as the union of lower bounds of propagated single-entry parity sets. This is based on the following simple observation: for any sets $X_1, \ldots, X_t$, it holds that
$X_1 \triangle X_2 \triangle \cdots \triangle X_t \subseteq X_1 \cup X_2 \cup \cdots \cup X_t.$
In order to distinguish conventional division property transitions from the perfect ones (Definition 6), we shall call them "weak" transitions. This also emphasizes that they may lose information about the involved sets.

Definition 10 (Weak transition). Let
Remark 8. The term "weak transition" and the notation u F v are introduced specifically in this survey in order to distinguish the perfect and the conventional division property transition.
Furthermore, the induced ${\uparrow}K'$ cannot be improved in general.
Proof. The proof is by contradiction. Assume that Y does not satisfy $K'$. Then, there must We now prove the tightness by contradiction. Assume that there exists $v \in K'$ such that, for all sets $X \subseteq \mathbb{F}_2^n$ satisfying the input division property K, the output set $Y := F(X)$ satisfies some division property $K''$ such that $v \notin {\uparrow}K''$. Since division property is defined up to the upper closure ${\uparrow}$, we can assume without loss of generality that v is minimal in $K'$.
Therefore, there must exist u ∈ K such that u

Remark 9.
If the input division property is known to be tight (e.g., $K = \mathrm{Min}(U(X))$), then the defined output division property $K'$ can be improved for some cases of K and F. Indeed, let $u, u'$ be minimal in K, $u \ne u'$, such that $u \xrightarrow{F} v$ and $u' \xrightarrow{F} v$ for some v that is minimal in $K'$ (as defined in Proposition 19). Assume also that $w \xrightarrow{F} v$ holds for no other vector $w \in K$. Then, by Proposition 6, the two propagations cancel, and so v does not belong to $U(F(X))$ for any set X tightly satisfying K. Therefore, F(X) also satisfies division property ${\uparrow}(K') \setminus \{v\}$, which is tighter than $K'$.
This idea of considering some cancellations leads to the so-called "three-subset division property" [7].
The following theorem summarizes several characterizations of the set of all weak transitions through a function. To prove it, we will use the following simple lemma.
and u is a minimal such vector if and only if the ANF of f contains the maximal monomial $x^{\neg u}$.
Proof. Let X be the support of f. By Proposition 1, (2) holds if and only if $(\neg u) \in A(\neg X)$.
Since u is a minimal such vector and $\mathrm{Max}(A(\neg X)) = \mathrm{Max}(A(X))$, it follows that $x^{\neg u}$ is maximal in the ANF of f.
The following statements are equivalent: and $(\neg u, v)$ is a minimal such pair. For any fixed v, by Lemma 2, (3) holds with $\neg u$ minimal if and only if $S^v(x)$ contains the monomial $x^u$ and it is maximal in the ANF of $S^v$. We obtain that extreme elements in the set from point 3 belong to the set from point 4, and extreme elements in the set from point 4 belong to the set from point 3. It follows that the sets of extreme elements coincide. Otherwise, we could map an extreme element from one set to the other, find a covering extreme element, map back, and show that the initial element could not have been extreme.
An interesting straightforward corollary is the antisymmetry of transitions with respect to the inverse map. It is a direct analogue of Proposition 5.

Minimal and core transitions
From the characterizations given in Theorem 3 it is clear that some valid transitions may imply validity of some other transitions. Therefore, it is useful to study minimal sets of transitions that imply all others. In addition, for practical purposes, it is convenient to consider transitions which are minimal only in the output: this idea is behind the division property propagation table (DPPT), which is used to encode or evaluate the propagation of conventional division property in practice.
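Definition 12 (DPPT). Let $F \colon \mathbb{F}_2^n \to \mathbb{F}_2^m$. The division property propagation table (DPPT) of F is the mapping $\mathrm{DPPT} \colon \mathbb{F}_2^n \to \mathcal{P}(\mathbb{F}_2^m)$ given by $\mathrm{DPPT}(u) = \mathrm{Min}\{v \in \mathbb{F}_2^m : u \text{ weakly transitions to } v \text{ through } F\}$.

To push the idea of the minimal set of transitions further, Udovenko [16] introduced the notion of core transitions. It stems naturally from the definition of weak transitions (Definition 10).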
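A small DPPT can be computed from the ANFs of the products $F^v$. This sketch assumes that u weakly transitions to v whenever $F^v$ contains some monomial $x^w$ with $u \preceq w$. This is our reading of the weak-transition characterization and is stated here as an assumption. Only the minimal outputs are kept. The PRESENT S-box is an arbitrary example, and the helper names are hypothetical.

```python
def anf(table):
    """Binary Moebius transform: truth table (index x) -> ANF coefficients (index u)."""
    a = list(table)
    n = len(a).bit_length() - 1
    for i in range(n):
        for x in range(len(a)):
            if x >> i & 1:
                a[x] ^= a[x ^ (1 << i)]
    return a

def dppt(S, n, m):
    """DPPT(u): minimal v such that S^v has a monomial x^w with u <= w (assumed weak-transition rule)."""
    mono = []   # mono[v] = exponents occurring in the ANF of S^v
    for v in range(1 << m):
        coeffs = anf([1 if S[x] & v == v else 0 for x in range(1 << n)])
        mono.append([w for w in range(1 << n) if coeffs[w]])
    table = {}
    for u in range(1 << n):
        reach = [v for v in range(1 << m) if any(w & u == u for w in mono[v])]
        # keep only minimal reachable outputs: no other reachable t with t <= v
        table[u] = [v for v in reach if not any(t != v and t & v == t for t in reach)]
    return table

PRESENT = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
T = dppt(PRESENT, 4, 4)
for u in range(16):
    print(format(u, "04b"), "->", [format(v, "04b") for v in T[u]])
```

For any bijection of $\mathbb{F}_2^4$ the last row maps 1111 only to 1111. The reason is that the monomial $x_1x_2x_3x_4$ occurs only in the product of all coordinates.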

Remark 11. From Theorem 3 it is clear that core transitions correspond to minimal vectors of $U(\Gamma_F)$, where $\Gamma_F$ is the graph of F. Equivalently, they correspond to maximal monomials in the ANF of the graph indicator.
Importantly, the set of core transitions fully identifies all three defined kinds of conventional division property transitions through the function and its inverse, if it exists. Of course, the same characterization can be derived from the set of maximal monomials in the ANF of the graph indicator.
Note that all three sets can be derived from the set D F of core transitions (see Theorem 4):

Linear combinations at the input and at the output
As noticed by Lambin, Derbez and Fouque [29], the result of conventional division property propagation through a function may change significantly if an invertible linear map is composed with the function before computing the propagation table. Its inverse is then composed with the subsequent function to preserve the functional equivalence of the analyzed primitive. This setting is also useful for finding integral distinguishers which take an arbitrary affine subspace as an input set, as opposed to bit-aligned cubes. In [29], it was suggested to exhaust all invertible linear maps to be composed with the analyzed function. While this is a viable approach for 4-bit S-boxes, it hardly scales to bigger functions. Later, Derbez and Fouque [30] showed that exhausting all possible choices of one linear component (a linear combination) is sufficient to determine whether an integral distinguisher can be found using this method. More precisely, this allows checking integral distinguishers whose input set is an affine subspace of codimension 1.

Upper bounds on the degree of a composition of functions
A classic problem in cryptanalysis of symmetric-key primitives is determining the algebraic degree of a block cipher (more precisely, its maximum over all possible keys). Low algebraic degree can be exploited in the higher-order differential attack, introduced by Knudsen [1]. Most, if not all, block ciphers are built from an iterative structure: a composition of simple round functions. Therefore, the problem reduces to finding upper bounds on the algebraic degree of a composition of functions.
In this section, we briefly recall classic upper bounds on the algebraic degree and describe their relations with variants of division property. While bit-based division property is much more fine-grained than a degree upper bound, computing its propagation is generally a hard problem. Degree-based bounds, on the other hand, can usually be derived with pen and paper, allowing quick degree estimations.
Chen, Xiang, Zeng and Zhang [15] studied relationships between the bit-, word- and state-based division property (perfect and conventional) and the naive and Boura-Canteaut degree bounds. They concluded that division property is superior to those bounds. Udovenko [16] further studied relationships with the recent bounds by Carlet.

Classic methods
The most generic and straightforward upper bound is obtained by simply multiplying the degrees of the composed functions: $\deg(G \circ F) \le \deg G \cdot \deg F$. It cannot be improved in general without additional information about the functions.
Later, Boura and Canteaut [12], among other results, showed a new bound based on the degree of the inverse of one of the functions, which is a generalization of the result specific to SPN functions by Boura, Canteaut and De Cannière [32].
This bound is quite surprising as it shows that a block cipher may need nearly twice as many rounds (compared to the naive bound) to reach full degree, as the following example shows.

Example 8.
To illustrate the idea, consider an SPN-based block cipher with 2r rounds, an n-bit block and an m-bit S-box, m ≥ 3. Since the S-box is a permutation, its degree is at most $m - 1 =: d$.
In the first half of the rounds, let us apply the naive degree bound (Proposition 22), which increases the degree bound by a factor d per round. In the second half of the rounds, let us apply the Boura-Canteaut bound, which decreases the degree "deficit" by a factor d per round. After the first r rounds the degree is at most $d^r$, so the deficit is at least $n - d^r$. After the remaining r rounds, the deficit is at least $(n - d^r)/d^r$. To get full degree n − 1 (degree "deficit" 1) for the full 2r-round permutation, we thus need $(n - d^r)/d^r \le 1$, that is, $d^r + d^r \ge n$. Let r be the smallest number such that the condition holds, i.e., 2(r − 1) rounds are not enough to reach full degree. However, using only the naive bound, the upper bound may reach full degree already after r + 1 rounds instead of 2r, since $d \ge 2$ gives $d^{r+1} \ge 2d^r \ge n$.
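The round counts in Example 8 are easy to tabulate. The sketch below simply restates that arithmetic and does not evaluate any actual degree. The block and S-box sizes are illustrative choices. For the naive bound we count rounds until $d^R \ge n - 1$, i.e. until the bound stops excluding full degree; the example itself uses the slightly cruder threshold n.

```python
def naive_rounds(n, d):
    """Smallest R with d**R >= n - 1: after R rounds the naive bound no longer excludes degree n - 1."""
    R = 0
    while d ** R < n - 1:
        R += 1
    return R

def mixed_rounds(n, d):
    """2r for the smallest r with d**r + d**r >= n
    (naive bound on the first r rounds, Boura-Canteaut bound on the last r rounds, as in Example 8)."""
    r = 0
    while d ** r + d ** r < n:
        r += 1
    return 2 * r

for n, m in [(64, 4), (128, 4), (128, 8)]:
    d = m - 1
    print(f"n={n}, m={m}: naive {naive_rounds(n, d)} rounds, mixed {mixed_rounds(n, d)} rounds")
# n=64, m=4: naive 4 rounds, mixed 8 rounds
# n=128, m=4: naive 5 rounds, mixed 8 rounds
# n=128, m=8: naive 3 rounds, mixed 6 rounds
```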
More detailed analysis of this phenomenon in the framework of Feistel Networks was done in [33], and in the framework of SPNs in [13].
More recently, Carlet [14] showed several bounds based on the degrees of the involved graph indicators. We reproduce here one general theorem which is particularly relevant for division property. Bounds from this theorem are not comparable to the previous bounds (in the common case of the composition of two functions). That is, for any of the classic bounds, there exists a case where it will be strictly stronger than the other bounds. A more complete study and comparison of classic bounds (excluding the division property) can be found in [14].

Formulation with conventional division property
In the seminal paper [4], Todo provides a propagation rule of conventional division property through a vectorial Boolean function.

Proposition 23 ([4]). Let $X \subseteq \mathbb{F}_2^n$ satisfy division property $D^n_k$, and let $F \colon \mathbb{F}_2^n \to \mathbb{F}_2^m$. Then F(X) satisfies $D^m_{k'}$ with $k' = \lceil k / \deg F \rceil$.

Since 1-dimensional division property corresponds to an upper bound on the algebraic degree of the set (i.e., we have $\deg X \le n - k$ and $\deg F(X) \le m - k'$), we can conclude the following upper bound on the degree of the output set.
This bound clearly resembles the Boura-Canteaut bound (Theorem 8). However, the proposition relates the degrees of sets, rather than the degrees of functions. Furthermore, it has $\deg F$ instead of $\deg F^{-1}$ in the denominator and does not even require F to be invertible. In fact, Corollary 4 can be interpreted exactly as the Boura-Canteaut bound when F is invertible. Let $g = \mathbb{1}_X$, so that $\mathbb{1}_{F(X)} = \mathbb{1}_X \circ F^{-1} = g \circ F^{-1}$. Applying Theorem 8 to $g \circ F^{-1}$ then involves the degree of $(F^{-1})^{-1} = F$. The latter identity also implies the complementary analogue of the basic upper bound (Proposition 22), $\deg F(X) \le \deg X \cdot \deg F^{-1}$, which can be translated into the division property terminology.
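Proposition 23 and the inverse-degree bound can be checked on random 4-bit permutations. The inverse-degree bound is the one derived from $\mathbb{1}_{F(X)} = \mathbb{1}_X \circ F^{-1}$. In this sketch, which uses hypothetical helpers, Todo's rule $k' = \lceil k/\deg F \rceil$ appears in its degree form $\deg F(X) \le n - \lceil (n - \deg X)/\deg F \rceil$. Empty sets are skipped.

```python
import random

def anf(table):
    """Binary Moebius transform: truth table (index x) -> ANF coefficients (index u)."""
    a = list(table)
    n = len(a).bit_length() - 1
    for i in range(n):
        for x in range(len(a)):
            if x >> i & 1:
                a[x] ^= a[x ^ (1 << i)]
    return a

def degree_of_table(table):
    """Degree of a Boolean function from its truth table (-1 for the zero function)."""
    return max((bin(u).count("1") for u, c in enumerate(anf(table)) if c), default=-1)

def set_degree(X, n):
    return degree_of_table([1 if x in X else 0 for x in range(1 << n)])

def func_degree(S, n, m):
    """deg F: maximum degree over the coordinate functions."""
    return max(degree_of_table([(S[x] >> i) & 1 for x in range(1 << n)]) for i in range(m))

def ceil_div(a, b):
    return -(-a // b)

rng = random.Random(11)
n = 4
for _ in range(50):
    F = list(range(1 << n))
    rng.shuffle(F)                              # random permutation of F_2^4
    F_inv = [0] * (1 << n)
    for x, y in enumerate(F):
        F_inv[y] = x
    d, d_inv = func_degree(F, n, n), func_degree(F_inv, n, n)
    for _ in range(20):
        X = {x for x in range(1 << n) if rng.random() < 0.5}
        if not X:
            continue
        dX, dY = set_degree(X, n), set_degree({F[x] for x in X}, n)
        k = n - dX                              # X satisfies D^n_k
        assert dY <= n - ceil_div(k, d)         # Proposition 23 as a degree bound (m = n)
        assert dY <= dX * d_inv                 # indicator of F(X) equals 1_X o F^{-1}
print("both set-degree bounds hold on all samples")
```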

Remark 15.
Todo also noticed that division property $D^n_n$ is preserved by bijections. A set satisfying $D^n_n$ has degree at most 0, so it is either empty or the whole space $\mathbb{F}_2^n$, and a bijection maps it to a set of the same kind. This corollary includes this case.
The two rules from the corollaries mirror the two classic degree-based upper bounds, namely the naive bound and the Boura-Canteaut bound. However, division property can be more precise. Instead of computing $k'$ solely from k and the degree of the function (or of its inverse), it can be computed from the DPPT of the function. More precisely, the value $D_F(k)$ obtained this way is optimal (maximum possible) for $k'$, and the degree-based bounds do not always reach it (see Example 9 below). This parameter, as a function of F and k, was introduced in [28].
By Theorem 3, the value of $D_F(k)$ can also be characterized via the graph indicator of F. In other words, the 1-dimensional division property propagation table is fully characterized by the pairs of degrees (one in the x variables and one in the y variables) of the maximal monomials of the graph indicator of the considered function F. This information is more detailed than simply the degree of the graph indicator, which is used in Carlet's bound. Udovenko [16] exhibited a close connection of Carlet's graph indicator method with conventional division property propagation. In fact, the latter can be seen as a more fine-grained variant of the former. It requires more information about the composed function and more computational effort to compute the upper bound, but it (possibly) results in a stronger bound.
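We now assume that $D_F(k)$ equals the smallest $\mathrm{wt}(v)$ over all transitions $u \xrightarrow{F} v$ with $\mathrm{wt}(u) \ge k$. Under this assumption, the gap between $D_F(k)$ and the degree-based value $\lceil k / \deg F \rceil$ can be computed directly. For any bijection of $\mathbb{F}_2^4$, the monomial $x_1x_2x_3x_4$ occurs only in $F^{(1,1,1,1)}$, so $D_F(4) = 4$. The degree rule, in contrast, gives only $\lceil 4/\deg F \rceil \le 2$. The helpers below are hypothetical, and the PRESENT S-box is an arbitrary example.

```python
def anf(table):
    """Binary Moebius transform: truth table (index x) -> ANF coefficients (index u)."""
    a = list(table)
    n = len(a).bit_length() - 1
    for i in range(n):
        for x in range(len(a)):
            if x >> i & 1:
                a[x] ^= a[x ^ (1 << i)]
    return a

def transitions(S, n, m):
    """Pairs (u, v) with u -S-> v, i.e. x^u occurs in the ANF of S^v."""
    pairs = set()
    for v in range(1 << m):
        coeffs = anf([1 if S[x] & v == v else 0 for x in range(1 << n)])
        pairs |= {(u, v) for u in range(1 << n) if coeffs[u]}
    return pairs

def weight(z):
    return bin(z).count("1")

def D(S, n, m, k):
    """Smallest wt(v) over transitions u -S-> v with wt(u) >= k (None if there is none)."""
    return min((weight(v) for u, v in transitions(S, n, m) if weight(u) >= k), default=None)

PRESENT = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
PAIRS = transitions(PRESENT, 4, 4)
DEG = max(weight(u) for u, v in PAIRS if weight(v) == 1)   # deg F = max degree of a coordinate
for k in range(5):
    print(k, D(PRESENT, 4, 4, k), -(-k // DEG))            # D_F(k) vs. ceil(k / deg F)
```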

Conventional division trails and indicator monomial trails
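The result relies on the following representation of the graph indicator of a composition as the sum of the products of the involved indicators over all possible values of all intermediate variables.

Proposition 24 ([14, 34]). Let $G_i \colon \mathbb{F}_2^{n_{i-1}} \to \mathbb{F}_2^{n_i}$ for $1 \le i \le r$, and let $G = G_r \circ \cdots \circ G_1$. Then
$\mathbb{1}_{\Gamma_G}(x_0, x_r) = \sum_{x_1 \in \mathbb{F}_2^{n_1}, \ldots, x_{r-1} \in \mathbb{F}_2^{n_{r-1}}} \prod_{i=1}^{r} \mathbb{1}_{\Gamma_{G_i}}(x_{i-1}, x_i).$

The following theorem shows that conventional division property trails essentially correspond to chains of monomials from the involved graph indicators.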
Proof The proof relies on Theorem 3, applied to each link in the indicator chain and division trail: The formal expansion I contains a monomial multiple of (4)

Exposition of compositional bounds through bounds on monomials of the graph indicator
In this section, we will show that Carlet's graph indicator method allows illustrative comparison of classic compositional bounds and 1-dimensional conventional division property.
A key idea is to derive bounds on degrees of monomials in the graph indicator directly from a given classic bound. The motivation comes from the fact that possible degrees of monomials in the graph indicator's ANF define 1-dimensional division property propagation. This allows comparison of bounds in the same setting. The obtained bounds can then be illustrated graphically on an example function, exhibiting visually the impossible degree pairs that are or are not removed by each of the bounds. We will focus on the classic bounds for compositions of two functions.
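Let $F \colon \mathbb{F}_2^n \to \mathbb{F}_2^m$. Let $x^u y^v$ be a maximal monomial in the ANF of the graph indicator $\mathbb{1}_{\Gamma_F}(x, y)$. Define $g \colon \mathbb{F}_2^m \to \mathbb{F}_2 \colon y \mapsto y^{\neg v}$. Note that $\deg g = m - \mathrm{wt}(v)$. Recall that
$g(F(x)) = \sum_{y \in \mathbb{F}_2^m} \mathbb{1}_{\Gamma_F}(x, y)\, g(y),$
and that summing a polynomial over all $y \in \mathbb{F}_2^m$ keeps exactly its terms containing $y^{\mathbf{1}}$, since $\sum_y y^w = 1$ if and only if $w = \mathbf{1}$. Observe that the monomial $x^u y^v y^{\neg v} = x^u y^{\mathbf{1}}$ belongs to the polynomial $\mathbb{1}_{\Gamma_F}(x, y)\, y^{\neg v}$, as it occurs exactly once in the formal expansion (due to the maximality of $x^u y^v$). It follows that $x^u$ belongs to $g(F(x))$ or is constant (if u = 0). Hence, $\mathrm{wt}(u) \le \deg (g \circ F)$. We can now apply classic bounds to the composition $g \circ F$ and obtain bounds on the degrees of the monomial $x^u y^v$ from the graph indicator of F.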
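1. By the naive bound (Proposition 22), we have $\mathrm{wt}(u) \le \deg g \cdot \deg F = (m - \mathrm{wt}(v)) \cdot \deg F$.
2. By the Boura-Canteaut bound (Theorem 8), we obtain an analogous bound involving $\deg F^{-1}$.
3. By the Carlet bound (Theorem 9), we obtain a bound involving the degree of the graph indicator.

[Figure: shaded areas highlight the points not satisfying the respective bounds (exclusive of the respective lines themselves).]

Note that by the monotonicity of the bounds and by the fact that $x^u y^v$ was chosen as a maximal monomial in the graph indicator's ANF, these bounds ((8), (9), (10)) hold for all monomials in general.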
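The derivation above can be replayed on a concrete S-box. First, build the graph indicator over the 2n variables (x, y) and extract its maximal monomials. Then check two things: that $x^u$ occurs in $g \circ F$ for $g(y) = y^{\neg v}$, and that all monomials obey the naive bound. This is a sketch with hypothetical helpers; the PRESENT S-box is used only as an example, and the encoding packs x into the low bits and y into the high bits.

```python
def anf(table):
    """Binary Moebius transform: truth table (index x) -> ANF coefficients (index u)."""
    a = list(table)
    n = len(a).bit_length() - 1
    for i in range(n):
        for x in range(len(a)):
            if x >> i & 1:
                a[x] ^= a[x ^ (1 << i)]
    return a

def weight(z):
    return bin(z).count("1")

def degree(table):
    """Algebraic degree of a Boolean function given by its truth table (0 for constants)."""
    return max((weight(u) for u, c in enumerate(anf(table)) if c), default=0)

def check_graph_indicator(S, n, m):
    """For the graph indicator of S with monomials x^u y^v, check that:
    - each maximal x^u y^v gives x^u in the ANF of g(S(x)), where g(y) = y^(not v);
    - every monomial satisfies the naive bound wt(u) <= (m - wt(v)) * deg S."""
    mask = (1 << n) - 1
    # index z = x | (y << n); the indicator is 1 exactly when y == S(x)
    coeffs = anf([1 if (z >> n) == S[z & mask] else 0 for z in range(1 << (n + m))])
    support = [z for z, c in enumerate(coeffs) if c]
    maximal = [z for z in support if not any(t != z and t & z == z for t in support)]
    deg_S = max(degree([(S[x] >> i) & 1 for x in range(1 << n)]) for i in range(m))
    for z in maximal:
        u, v = z & mask, z >> n
        not_v = ~v & ((1 << m) - 1)
        g_of_S = anf([1 if S[x] & not_v == not_v else 0 for x in range(1 << n)])
        if g_of_S[u] != 1:
            return False
    return all(weight(z & mask) <= (m - weight(z >> n)) * deg_S for z in support)

PRESENT = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
print(check_graph_indicator(PRESENT, 4, 4))   # True
```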

Example 9.
We will now illustrate these bounds on a simple example. Let $F \colon \mathbb{F}_2^{14} \to \mathbb{F}_2^{14}$ be defined as The three bounds together with the actual maximal degree pairs are displayed graphically in Fig. 3.

Definition 15 (Integral-Resistance Property [36]). Let $E \colon \mathbb{F}_2^n \times \mathbb{F}_2^{\ell} \to \mathbb{F}_2^n$ be a block cipher. We say that E fulfills the integral-resistance property when, for every proper non-empty subset $\mathbb{M} \subsetneq \mathbb{F}_2^n$ and every non-zero output mask $\beta \in \mathbb{F}_2^n$, the sum $\sum_{x \in \mathbb{M}} \langle \beta, E_k(x) \rangle$ is key-dependent.
Definition 16 (Integral-Resistance Matrix [36]). Let $E \colon \mathbb{F}_2^n \times \mathbb{F}_2^{\ell} \to \mathbb{F}_2^n$ be a block cipher with the corresponding ANF. Further, let $v_1, \ldots, v_s \in \mathbb{F}_2^{\ell}$ be key patterns. We call the following matrix over $\mathbb{F}_2^{n^2 \times s}$ an integral-resistance matrix. Column i corresponds to the key pattern $v_i$, and each row corresponds to a combination of an output bit and a monomial of weight n − 1.
Now, the integral-resistance matrix is sufficient to prove the integral-resistance property for a block cipher, assuming an independent whitening key added at the input (see also Proposition 18).

Proposition 26 ([36]).
Let $E \colon \mathbb{F}_2^n \times \mathbb{F}_2^{\ell} \to \mathbb{F}_2^n$ be a block cipher and I(E) a corresponding integral-resistance matrix. If I(E) has rank $n^2$ and $k_0$ is an independent whitening key, then $E_k(x + k_0)$ fulfills the integral-resistance property.

Conclusions
In this survey, we explained the state-of-the-art variants of the division property and described their connection with the ANF. We also showed how the division property can be used to find both attacks and security arguments for symmetric cryptographic primitives. Our focus was on the underlying theory and on a clear and precise notation. We hope that readers with a background in Boolean functions in particular will find our survey helpful.
We would also like to note that there are several important topics that we did not cover. Those topics are mainly concerned with the efficient computation of the division property. This is done either with dedicated algorithms [4,7,30,37] or with general tools, in particular mixed integer linear programming tools [8,10,17,38-43] and SAT/SMT solvers [16,27,44-48]. It is actually this practical computational aspect that is of great importance for all variants of the division property. As described, the security of symmetric primitives is concrete security against concrete attacks, and the division property is a powerful set of techniques to compute concrete properties of those ciphers.