Improved parallel mask refreshing algorithms: generic solutions with parametrized non-interference and automated optimizations

Refreshing algorithms are a critical ingredient for secure masking. They are instrumental in enabling sound composability properties for complex circuits, and their randomness requirements dominate the performance overheads in (very) high-order masking. In this paper, we improve a proposal of mask refreshing algorithms from EUROCRYPT 2017 that has excellent implementation properties in software and hardware, in two main directions. First, we provide a generic proof that this algorithm is secure at arbitrary orders—a problem that was left open so far. We introduce parametrized non-interference as a new technical ingredient for this purpose that may be of independent interest. Second, we use automated tools to further explore the design space of such algorithms and provide the best known parallel mask refreshing gadgets for concretely relevant security orders. Incidentally, we also prove the security of a recent proposal of mask refreshing with improved resistance against horizontal attacks from CHES 2017.


Introduction
State-of-the-art. Side-channel attacks are an important threat to the security of cryptographic hardware [22]. While originally mostly applied to small embedded devices, they recently targeted an increasingly large spectrum of implementations, such as general-purpose computers or smartphones [26,19,1,16]. So the understanding of these attacks and solutions to prevent them is an important challenge for modern security applications. In this context, masking is now established as a widely used countermeasure allowing cryptographic designers to mitigate side-channel attacks. Its main idea consists in splitting any sensitive value into several shares so that the adversary needs to collect and combine information about all the shares to extract secrets from the leakages. One key reason for the success of masking is the good theoretical understanding it allows. In particular, the security of masking is now shown in various security models, ranging from the abstract probing model of Ishai et al. [20], the noisy leakage model of Prouff and Rivain [23], and the bounded moment model of Barthe et al. [5]. Additionally, fundamental steps have been made in the connection of these models. In particular, the work of Duc et al. showed that security in the most abstract (but easiest to manipulate) probing model implies security in the most practically relevant (but more involved) noisy leakage model [12]. One important consequence of this work is that it is now possible to verify the security of masked implementations based on their abstract description, and to translate these abstract guarantees into concrete security [13], by checking two additional conditions: the independence of the shares' leakages (which is typically assessed in the bounded moment model) and a sufficient noise level (which is naturally assessed in the noisy leakage model). The latter results suggest that obtaining masking schemes that are secure in the (abstract) probing model is anyway a good preliminary (if not a strict requirement) for the design of secure masked implementations.
Yet, and despite its appealing conceptual simplicity, reasoning about security in the probing model is challenging, because the number of possible observation sets in a masked implementation grows exponentially with the number of shares. This difficulty is typically illustrated by subtle bugs in early works and hand-made proofs on masking [25,24], corrected in [10,11]. As a result, the state-of-the-art proofs of masked implementations now generally combine two types of approaches.
The first one is to rely exclusively on composable gadgets, as formalized by Barthe et al. [4]. The latter made a significant step towards sound compositional reasoning, by introducing Strong Non-Interference (SNI). Informally, SNI refines previous notions of probing security, by separating between external and internal observations and by requiring that the number of shares needed to simulate an observation set is upper bounded by the number of internal observations. It provably allows connecting masked gadgets without risks of compositional flaws (at the cost of performance overheads and randomness requirements), and therefore simplifies the analysis of full implementations.
The second one is to carry out more global compositional proofs. It generally allows improving the performance and reducing the randomness of masked implementations, this time at the cost of a more challenging analysis. In this case, the implementations typically mix SNI gadgets with (less demanding) NI gadgets (i.e., a relaxation of SNI where the number of shares needed to simulate an observation set is upper bounded by the total number of observations). Evaluations can then take advantage of formal methods to better deal with the large number of cases to be covered by the proofs [3].
Essential ingredients for both approaches are so-called refreshing algorithms, which are instrumental in compositional reasoning [7], and generally allow "splitting" a masked implementation in small (enough) parts that can then be analyzed globally. Recent implementations of (very) high-order masking schemes have shown that the (randomness) cost of these refreshing algorithms now accounts for a significant part of the global performance overheads [17,21]. In this paper, we therefore tackle the generic improvement and automated optimization of such refreshing algorithms.
Contribution. Our starting point is a parallel mask refreshing algorithm which was conjectured to be SNI in [5] and has good implementation features for software (and hardware [18]) implementations. In particular, it can easily be implemented with simple operations such as rotations and XORs that are available on most devices. Our contributions in this respect are threefold.
First, we observe that existing notions of NI and SNI may lack in granularity to analyze the security of this refreshing. We therefore introduce a novel notion of Parametrized Non-Interference (f-NI). Informally, f-NI maintains the distinction from SNI between external and internal observations, but requires that the number of shares needed to simulate an observation set is bounded by a function f of the number of internal observations. The definition of f-NI subsumes SNI (by setting f to be the identity function on naturals smaller than the masking order) and NI (by setting f to be the function that maps all naturals smaller than the masking order to the masking order).
Second, we leverage this new notion of f-NI in order to answer in the positive the conjecture from [5], and show that their (iteration of) parallel mask refreshing gadget(s) is indeed SNI.
Third, we amplify our results on efficient mask refreshing gadgets using synthesis-based techniques to explore the design space of parallel mask refreshing gadgets. The synthesis approach exploits recent works on automatically proving security of masked implementations in the probing model [3,4]. It uses a combination of ideas from program synthesis: intuition-guided templates to selectively reduce the space of gadgets to explore, a scoring system to prioritize the search, a partial order reduction to analyze only once gadgets that differ in inessential details, and efficient data structures to improve the verification of individual gadgets. The overall approach delivers gadgets that outperform the state-of-the-art for (concretely useful) security orders between 5 and 10 (and for larger orders if plugged in the recursive construction of Battistello et al. [6]).

Parametrized Non-Interference
Previous work introduced the notion of t-SNI, which supports reasoning compositionally about probing security. It refines t-NI security with a measured independence between observations on inputs and outputs which appears to be just enough to safely compose. Nevertheless, intermediate security notions could be used to provide more flexible and precise compositional reasoning with better performance. For this purpose, we introduce the notion of parametrized non-interference (or $(t, f_t)$-NI), a generalization of both t-SNI and t-NI.

Definition and Discussion
In the following, we adopt the "non-interference" style of definitions for probing security introduced by Barthe et al. [4], but also further clarify their relationship with more standard definitions of probing security, such as those of Faust et al. [14], which are also composable, or those of Prouff and Rivain [23] and Ishai, Sahai and Wagner [20], which are not.
We first recall some basic definitions on gadgets.
Given a set K equipped with at least a group structure, we call m-encodings in K vectors of m elements in K. In practice, such encodings are related to a particular encoding scheme, which we often assume to be additive (that is, an encoding $\mathbf{a}$ is an encoding of some value $a \in K$ iff $a = \sum_{i=0}^{m-1} a_i$, where $\sum$ is the iterated addition in K and $a_i$ is the ith element in vector $\mathbf{a}$). However, many of the techniques and arguments can be generalised to other encoding schemes. We often work in the scenario where m = t + 1 and prefer t-based notation throughout, except where we deviate from this scenario.
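As an illustration of additive encodings, here is a minimal sketch that assumes K is the set of 8-bit values with XOR as the group operation. That choice, and the helper names `encode` and `decode`, are assumptions made for the example only.

```python
import secrets
from functools import reduce
from operator import xor


def encode(value, t, bits=8):
    """Split `value` into m = t + 1 additive shares (addition is XOR here)."""
    shares = [secrets.randbits(bits) for _ in range(t)]
    # The last share is chosen so that the XOR of all shares equals `value`.
    shares.append(reduce(xor, shares, value))
    return shares


def decode(shares):
    """Recombine an encoding by iterated addition (XOR) of its shares."""
    return reduce(xor, shares, 0)


encoding = encode(0xA7, t=3)
print(len(encoding), hex(decode(encoding)))
```

Any t of the t + 1 shares are uniformly distributed and independent of the encoded value, which is what makes an order-t probing adversary the natural threat model.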
A gadget G is a probabilistic algorithm that takes as input n m-encodings (where n is the arity of the gadget) and returns a single m-encoding, its output. (All our definitions and results can be generalised, at some formal cost, to multi-output gadgets.) We define the semantics of an n-ary gadget G as the function $\llbracket G \rrbracket$ that, on input n m-encodings $\mathbf{a}_1, \ldots, \mathbf{a}_n$, returns the joint distribution of all intermediate variables (including those used to form the output encoding). In the following, we assume that all intermediate variables are given a distinct index in the gadget (for example, the line at which they are defined), and use that index to refer to wires. We call output positions those indices that correspond to variables that are used as outputs, and internal positions those indices that correspond to wires that do not serve as outputs.
Given an indexed set of values or joint distribution $\Omega$ and a set of indices $I$, we finally denote with $\Omega|_I$ the projection or marginal distribution of those elements of $\Omega$ selected by $I$. For example, the set $\{a, b, c\}$, indexed by (0-indexed) position, could be restricted as follows: $\{a, b, c\}|_{\{1,2\}} = \{b, c\}$.
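The restriction operator can be sketched in a few lines. The helper names `restrict` and `marginal`, and the representation of a joint distribution as a `Counter` over outcome tuples, are assumptions made for this illustration.

```python
from collections import Counter


def restrict(values, indices):
    """Project an indexed tuple of values onto the positions in `indices`."""
    return tuple(values[i] for i in sorted(indices))


def marginal(joint, indices):
    """Marginal of a joint distribution given as a Counter over outcome tuples."""
    result = Counter()
    for outcome, weight in joint.items():
        result[restrict(outcome, indices)] += weight
    return result


print(restrict(("a", "b", "c"), {1, 2}))
```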
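Based on these definitions, we formalize f-NI as follows: a gadget G is $(t, f_t)$-NI whenever, for any set of probes $\Omega = \bar{O} \cup O$ with $\bar{O}$ a set of variables internal to G, and $O$ a set of output variables of G such that $|\Omega| \leq t$, there exists a set of input indices $I$ such that:
1. $|I| \leq f_t(|\bar{O}|, |O|)$, and
2. for any two sets of inputs $(\mathbf{a}_i)_{0 \leq i < n}$, $(\mathbf{b}_i)_{0 \leq i < n}$ to G, we have
$$(\mathbf{a}_i|_I)_{0 \leq i < n} = (\mathbf{b}_i|_I)_{0 \leq i < n} \implies \llbracket G \rrbracket(\mathbf{a}_0, \ldots, \mathbf{a}_{n-1})|_\Omega = \llbracket G \rrbracket(\mathbf{b}_0, \ldots, \mathbf{b}_{n-1})|_\Omega.$$
Informally, a gadget G is $(t, f_t)$-NI if, for any observation set $\Omega$, there exists a set $I$ of input share indices (whose size is appropriately bounded) such that any two inputs that agree on the shares in $I$ produce the same joint distribution on the variables in $\Omega$ through G. When clear from context, we will often omit the t parameter, simply writing (t, f)-NI, or even f-NI. We note in particular that this property must be established without prior knowledge or assumptions on the distribution of input shares. This makes it distinct (and indeed strictly stronger, as demonstrated in the long version of Barthe et al. [2]) from more local notions such as the tth-order security by Rivain and Prouff [24]. The latter simply requires that any t-tuple of intermediate variables in an implementation is independent of any sensitive variable. For simplicity, we next denote this basic security requirement t-probing security.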
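For very small gadgets, the quantification over observation sets and input pairs can be checked exhaustively. The sketch below assumes a single-input gadget over bits, described by a hypothetical function mapping (input shares, randoms) to a dictionary of named wires. It is a brute-force illustration of the definition, not the verification tool used in this paper.

```python
from collections import Counter
from itertools import combinations, product


def min_input_shares(gadget, m, n_rand, obs):
    """Smallest |I| such that inputs agreeing on I give the same distribution on obs."""
    inputs = list(product((0, 1), repeat=m))
    dist = {}
    for a in inputs:
        counts = Counter()
        for r in product((0, 1), repeat=n_rand):  # uniform randomness
            wires = gadget(a, r)
            counts[tuple(wires[w] for w in obs)] += 1
        dist[a] = frozenset(counts.items())
    for size in range(m + 1):
        for idx in combinations(range(m), size):
            seen = {}
            if all(seen.setdefault(tuple(a[i] for i in idx), dist[a]) == dist[a]
                   for a in inputs):
                return size
    raise AssertionError("unreachable: the full index set always works")


def is_f_ni(gadget, m, n_rand, internal, outputs, t, f):
    """Check the bound f(#internal probes, #output probes) for all |obs| <= t."""
    wires = list(internal) + list(outputs)
    for size in range(t + 1):
        for obs in combinations(wires, size):
            t1 = sum(w in internal for w in obs)
            if min_input_shares(gadget, m, n_rand, obs) > f(t1, size - t1):
                return False
    return True


def toy_refresh(a, r):
    # Two shares, one random: c0 = a0 + r0, c1 = a1 + r0 (hypothetical toy gadget).
    return {"a0": a[0], "a1": a[1], "r0": r[0],
            "c0": a[0] ^ r[0], "c1": a[1] ^ r[0]}


def identity(a, r):
    # Outputs are the unmasked input shares (hypothetical toy gadget).
    return {"a0": a[0], "a1": a[1], "c0": a[0], "c1": a[1]}


def f_ni(t1, t2):
    return t1 + t2


def f_sni(t1, t2):
    return t1


print(is_f_ni(toy_refresh, 2, 1, ("a0", "a1", "r0"), ("c0", "c1"), 1, f_sni))
print(is_f_ni(identity, 2, 0, ("a0", "a1"), ("c0", "c1"), 1, f_sni))
```

The search is exponential in the number of wires, shares and randoms, so it is only usable at toy sizes; it is meant to make the roles of $\Omega$, $I$ and $f_t$ concrete.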
We note also that Ishai, Sahai and Wagner, in their proofs, make use of an intermediate notion, that Carlet et al. [8] also call perfect t-probing security, and which they define based on one's ability to simulate observations based on a subset of an algorithm's input shares. More precisely, they prove on gadgets and algorithms that "any set of d < t probes can be simulated with at most t shares of each input". The meaning of simulation, if taken in the traditional cryptographic sense of an interactive algorithm that should be indistinguishable from the real one, requires specifying which information is made available to the distinguisher. Compositional reasoning requires that the distinguisher can access both the (simulated) probes and input shares. We use a simpler interpretation of the word to denote a (mathematical) function that takes at most t inputs and calculates a joint distribution that is perfectly equal to that produced on its probes by the gadget or algorithm under study, for any set of input encodings that coincide on those shares given to the simulator.
Simulation and Non-Interference. When interpreted in this sense, perfect t-probing security is in fact equivalent to $(t, f^t_0)$-NI, with $f^t_0(t_1, t_2) = \max(t_1 + t_2, t)$. Going further, and using the non-interference-based notions, Belaïd et al. [7] prove that $(t, f^t_0)$-NI (and thus perfect t-probing security) is in fact equivalent to Barthe et al.'s baseline non-interference notion, t-NI [4]. We note here that f-NI is a strict generalisation of t-NI and t-SNI.
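To make the generalisation concrete, the two baseline notions can be written as instances of $(t, f_t)$-NI. The names $f_{\mathrm{NI}}$ and $f_{\mathrm{SNI}}$ are introduced here for illustration only; the bounds themselves follow the informal descriptions of NI (total number of observations) and SNI (number of internal observations) given in the introduction.

```latex
% t_1 = number of internal observations, t_2 = number of output observations,
% with t_1 + t_2 \le t.
\begin{align*}
  f_{\mathrm{NI}}(t_1, t_2)  &= t_1 + t_2 && \text{($t$-NI: bounded by all observations)} \\
  f_{\mathrm{SNI}}(t_1, t_2) &= t_1       && \text{($t$-SNI: bounded by internal observations only)}
\end{align*}
```

Any bound $f$ with $f_{\mathrm{SNI}} \leq f \leq f_{\mathrm{NI}}$ pointwise then describes a notion that sits between the two.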

First Compositional Results
We now illustrate how one can reason about the security of compositions of f-NI gadgets by stating a simple composition result.
Proposition 1 is central to some of the compositional proofs given in the rest of this paper. However, it does not clearly show that the composition of two f-NI gadgets is also f-NI for some well-chosen f. The following Corollary clarifies this, and justifies our claim that (t, f)-NI is composable. It is a direct consequence of Proposition 1.
We note that the mere existence of this compositional result does not make f-NI an immediate replacement for NI or SNI: indeed, the composition result shown here exhibits a combinatorial growth in the number of cases to consider when performing a compositional analysis on a circuit (growing with the number of composed gadgets). When composing gadgets that are only NI or SNI, analysis is more efficient but slightly coarser.

A Closure Property
In addition to composition, (t, f )-NI also enjoys an interesting closure property, similar to that enjoyed by Barthe et al.'s more general notion of (I, O)-NI [4].
Proof. The result is a direct consequence of the definition for (t, f)-NI.
Although this property is not used in this paper, we present it here as a general property of (t, f )-NI that may be useful in other applications of the notion.
In this section, we formally analyze, for all t, the iteration of regular (parallel) refreshing gadgets proposed and verified at low orders by Barthe et al. [5]. In particular, we prove that the refreshing gadget that successively adds $\lceil t/3 \rceil$ independent encodings of 0 to its encoded input is in fact t-SNI for all t.
Our security proof leverages the notion of f-NI and its compositional properties. Specifically, we first prove that the core additive mask refreshing is f-NI for some appropriate f, and then apply the composition result to conclude that the gadget itself is t-SNI. Beyond its intrinsic interest, this example illustrates the value of the f-NI notion as a generalization of both NI and SNI, but also as a fine-grained property that can be used to compositionally analyse the probing security of gadgets.

The RefreshBlock Gadget.
Our starting point is the RefreshBlock algorithm shown in Algorithm 1, where indexing is modulo t + 1, and which corresponds to the building block proposed by Barthe et al. [5] (i.e., Algorithm 1 in this reference). Our description of this algorithm uses slightly different notations, and reorders some of the computation, but we note that our algorithm and Barthe et al.'s [5] compute exactly the same intermediate values and output expressions, and therefore have equivalent probing security (including f-NI and SNI) and functionality. For clarity, we also describe this functionality by giving the expressions for each of the output shares $c_i$ in terms of the input shares $a_i$ and fresh random masks $r_i$:
Alg. 1 Refresh block algorithm [5].
function RefreshBlock(a)
In addition to its obvious t-NI qualities, we prove that this gadget enjoys a slightly stronger security property, which intuitively constrains an adversary to always place at least two internal probes in order to preserve any information it may have obtained about the gadget's output shares. However, and importantly, even when this condition on the adversary's choice of probes is fulfilled, the adversary will always lose information corresponding to one probe through this gadget.
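A minimal executable sketch of one refresh block follows. It assumes 8-bit shares with XOR as addition, and the expressions $b_i = a_i \oplus r_i$ and $c_i = b_i \oplus r_{i-1}$ (indices modulo t + 1). These expressions are inferred from the intermediate values that appear in the proofs and examples of this paper, so they should be read as an assumption rather than as a transcription of Algorithm 1. The function name and the optional `rand` argument are likewise hypothetical.

```python
import secrets
from functools import reduce
from operator import xor


def refresh_block(a, bits=8, rand=None):
    """One refresh block on an additive (XOR) encoding `a` of m = t + 1 shares."""
    m = len(a)
    # One fresh random per share; `rand` lets a caller fix them for testing.
    r = list(rand) if rand is not None else [secrets.randbits(bits) for _ in range(m)]
    b = [a[i] ^ r[i] for i in range(m)]            # internal: b_i = a_i + r_i
    c = [b[i] ^ r[(i - 1) % m] for i in range(m)]  # output:   c_i = b_i + r_{i-1}
    return c


a = [0x01, 0x02, 0x03, 0x04]
c = refresh_block(a)
# Each random is added exactly twice, so the encoded value is unchanged.
print(reduce(xor, a) == reduce(xor, c))
```

Both loops act independently on each share position, which is what makes the gadget suitable for parallel (rotation and XOR) implementations.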
Proposition 3. Gadget RefreshBlock is $f_{RB}$-NI where:
Proof. Gadget RefreshBlock has four kinds of intermediate variables: The first three categories gather internal intermediate variables while the last one refers to output variables.
If $t_2 = 0$, then the observations are only internal variables, each of which contains at most one input share. Therefore, we can perfectly simulate them using the corresponding input share. We thus need at most $t_1$ input shares to simulate $t_1$ internal observations.
Assume now that $t_1 \leq 1$. We call non-input observation any observation that is not an input share. We start by assigning, to any possible non-input observation $m$, a set of indices $\hat{m}$: We extend this notion canonically to sets of observations as follows: $\hat{O} = \bigcup_{m \in O} \hat{m}$ for any set of non-input observations $O$. We now prove that if $O$ is a set of non-input observations that contains at most one internal observation, then $O$ can be simulated using no input shares and using randoms in $\{r_i \mid i \in \hat{O}\}$. If $O = \emptyset$, we are done. Otherwise, assume first that there exists $m \in O$ and an index $k \in \hat{m}$ s.t. $k \notin \hat{O}_m$, where $O_m = O \setminus \{m\}$. By induction hypothesis, $O_m$ can be simulated using no input shares and using randoms in $R = \{r_i \mid i \in \hat{O}_m\}$. By a direct case analysis on $m$, it is clear that $m$ can be simulated using the random $r_k \notin R$. Hence, we obtain that $O$ can be simulated using no input shares and using randoms in $\{r_i \mid i \in \hat{O}\}$. We now prove that there always exists such an observation $m$. If $O$ is a singleton set, then we take the unique element of $O$ for $m$. Otherwise, $O$ contains at least two observations and at least one of them must be an external observation. This, coupled with $|O| \leq t$, implies the existence of an external observation Since $O$ is of maximum size $t$, the length $i + 1$ of this chain is at most $t$. Hence, , and we can use $c_{k-i}$ for $m$.
We now conclude the case $t_1 \leq 1$. If $t_1 = 0$, or $t_1 = 1$ with the internal observation not being an observation of an input share, we conclude by directly applying the fact we just proved. If $t_1 = 1$ and we observe an input share $a_k$, we first simulate the input share using $a_k$ and no randoms, and then, applying again the previously proven fact, simulate the outputs using only randoms.
Now, assume that $2 \leq t_1$ and $1 \leq t_2$. To any observation $m$, we attach an optional input share dependency $\tilde{m}$ s.t. $m$ can be perfectly simulated from $\tilde{m}$ by simply using the arithmetical expression of $m$: Again, we extend this notion canonically to sets of observations as follows: It is clear that if $m_1$ and $m_2$ are two observations that we simulate in isolation using their respective arithmetical expressions and the input shares $\tilde{m}_1$ and $\tilde{m}_2$, then $\{m_1, m_2\}$, as a set of observations, can be simulated using their respective arithmetical expressions and the input shares $\tilde{m}_1$ and $\tilde{m}_2$. Hence, if our set of observations is s.t. $|\tilde{O}| < t_1 + t_2$, we can simulate $O$ using at most $t_1 + t_2 - 1$ input shares. Otherwise, since $|O| \leq t_1 + t_2$, it must be that we have exactly $t_1 + t_2$ observations, that we do not observe any random, and that for any two distinct observations $m^1_{k_1}$ and $m^2_{k_2}$ of $O$, we have $k_1 \neq k_2$; we say that $O$ is injective. Assume that we know an observation $m$ that does not depend on any observed input shares and s.t. there exists $k \in \hat{m} \setminus \hat{O}_m$. Then, we can simulate $O$ by first simulating $m$ using the random $r_k$ and then, like in the previous case, by simulating all the observations $O_m$ using their respective arithmetical expressions; this last simulation step only depends on randoms different from $r_k$ and, as we have just seen, uses at most $t_1 + t_2 - 1$ input shares. We conclude the proof by showing that there always exists such an observation $m$. Let $m_k$ be any non-input observation s.t. no non-input observation of the form $n_{k+1}$ belongs to $O$; such an observation always exists by the cardinality constraint on $O$. We have that $k \in \hat{m}_k$ and $m_k$ may at most depend on the input share $a_k$. For $k$ to occur in $\hat{O}_{m_k}$, it must be that a non-input observation of the form $n_k$ occurs in $O_{m_k}$ or that $c_{k+1} \in O$. However, both conditions are unsatisfied: the first one by injectivity of $O$, the second one by definition of $m_k$. Last, again by injectivity of $O$, we have that $a_k \notin O$, hence $m_k$ does not depend on any input share of $O$.
Remark 3. The bound $t_1 + t_2 - 1$ is reached with the following kind of observations, and is therefore tight: $\{b_t, c_0, \ldots, c_k, r_k\}$.
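Reading the three cases of the proof together suggests the following shape for the bound. This is an inference from the case analysis ($t_2 = 0$, then $t_1 \leq 1$, then $2 \leq t_1$ and $1 \leq t_2$), not a quotation of the proposition. The worked instance of the remark further assumes the expressions $b_i = a_i \oplus r_i$ and $c_i = a_i \oplus r_i \oplus r_{i-1}$ for the intermediate and output variables.

```latex
% Bound suggested by the case analysis of the proof (assumption):
\[
  f_{RB}(t_1, t_2) =
  \begin{cases}
    t_1           & \text{if } t_1 < 2 \text{ or } t_2 = 0,\\
    t_1 + t_2 - 1 & \text{otherwise.}
  \end{cases}
\]
% Instance of the remark with k = 1, i.e. O = \{b_t, c_0, c_1, r_1\}
% (two internal and two output observations, so t_1 = t_2 = 2):
\begin{align*}
  b_t \oplus c_0 \oplus c_1 \oplus r_1
    &= (a_t \oplus r_t) \oplus (a_0 \oplus r_0 \oplus r_t)
       \oplus (a_1 \oplus r_1 \oplus r_0) \oplus r_1 \\
    &= a_t \oplus a_0 \oplus a_1 .
\end{align*}
```

The joint distribution of these four observations therefore depends on three input shares, which matches $t_1 + t_2 - 1 = 3$ and exceeds the number of internal observations.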

t-SNI Mask Refreshing by Iterating RefreshBlock.
We now aim to show that strong refresh gadgets can be obtained by iterating RefreshBlock gadgets. We prove that for all t, repeating RefreshBlock $\lceil t/3 \rceil$ times constructs a t-SNI mask refreshing gadget. This coincides with the intuition obtained from the low-order observations made in previous work [5]. We note that Coron [9] had also proposed to iterate a similar additive mask refreshing gadget t times in order to support secure composition. This result is a strict refinement of his.
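The iteration count can be explored numerically. The sketch below assumes that x iterations satisfy a bound of the form $f^x_{RB}(t_1, t_2) = t_1$ when $t_1 < 2x$ or $t_2 = 0$, and $t_1 + t_2 - x$ otherwise. That shape is inferred from the values used in the induction and is an assumption, as are the function names and the cost of t + 1 randoms per block.

```python
def f_rb_iter(x, t1, t2):
    """Assumed bound for x iterations of RefreshBlock (inferred, see lead-in)."""
    if t1 < 2 * x or t2 == 0:
        return t1
    return t1 + t2 - x


def meets_sni_bound(x, t):
    """True if the assumed bound never exceeds the number of internal probes."""
    return all(f_rb_iter(x, t1, t2) <= t1
               for t1 in range(t + 1)
               for t2 in range(t + 1 - t1))


def min_iterations(t):
    """Smallest x >= 1 for which the assumed bound implies t-SNI."""
    x = 1
    while not meets_sni_bound(x, t):
        x += 1
    return x


for t in range(1, 12):
    x = min_iterations(t)
    # Each block draws one random per share, i.e. t + 1 randoms (assumption).
    print(t, x, (t + 1) * x)
```

Under these assumptions the smallest admissible x coincides with $\lceil t/3 \rceil$: the bound is only violated when $t_1 \geq 2x$ and $t_2 > x$ can hold together, that is when $3x + 1 \leq t$.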
Proof (Proposition 4). We first note that The proof is by induction on $x$. For $x = 1$, we note that $f^x_{RB}(t_1, t_2) = f_{RB}(t_1, t_2)$ for all $t_1, t_2$ and conclude by Prop. 3. Assume now that the property holds for $x$. We prove that it also holds for $x + 1$. We consider the composition: We consider a set $\Omega$ of observations in RefreshBlock$^{x+1}$ such that $|\Omega| \leq t$, and partition it into sets of $t_x$ observations internal to RefreshBlock$^x$, $t_1$ observations in RefreshBlock, and $t_2$ output observations. We therefore need to show that any such $\Omega$ can be simulated using at most $f^{x+1}_{RB}(t_x + t_1, t_2)$ shares of its input. By Propositions 1 and 3, it is sufficient to prove Let us first consider the case $t_2 = 0$. In this case, we have: $f^{x+1}_{RB}(t_x + t_1, t_2) = t_x + t_1$ and we can conclude;
- Else $2x \leq t_x$. Therefore, we have $f^x_{RB}(t_x, t_1) = t_x + t_1 - x$ and $f^{x+1}_{RB}(t_x + t_1, t_2)$ is either $t_x + t_1$ or $t_x + t_1 + t_2 - (x + 1)$ (depending on the value of $t_x + t_1$). The conclusion is trivial in the first case, and follows from $t_2 \geq 1$ in the second case.
We now have $t_2 \geq 1$ and
Table 1 summarizes the concrete results Corollary 2 yields for some low orders. To the best of our knowledge, these are currently the best known parallel SNI mask refreshing algorithms. We further improve them next.
The previous proof followed the identification by Barthe et al. [5] of a pattern leading to low-order secure mask refreshing gadgets. A more systematic exploration of the design space may therefore further reduce the randomness complexity of such gadgets. In this section, we highlight two particular observations that may be of independent interest in other search efforts, and yield particularly interesting performance gains.

Avoiding Repeated Pairs
A first interesting pattern we identified is that the most efficient mask refreshing gadgets at low orders rarely involve repeated pairs of random variables (i.e., they rarely include both the pattern $a_i \oplus r_j \oplus r_k$ and $a_j \oplus r_j \oplus r_k$ for $i \neq j$). However, looking at the definition of our RefreshBlock$^{\lceil t/3 \rceil}$ mask refreshing algorithm in Section 3.2, we observe that the same pairs of shares are involved in the same way in all iterations of RefreshBlock.
We thus consider a slight variant RefreshBlock$_j$ of RefreshBlock, shown as Algorithm 2, that rotates its vector of randomness by j instead of 1 only. By composing RefreshBlock with RefreshBlock$_j$ for some well-chosen j, we therefore avoid the repetition of patterns in the use of shares and randomness during successive iterations of the algorithm.
function RefreshBlock$_j$(a)
In this exploration experiment, we look for sequences of rotation offsets that allow us to reduce the randomness and time complexity of the mask refreshing gadget whilst still achieving t-SNI security.
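A candidate in this search space is simply a sequence of rotation offsets. The sketch below shows how such a candidate could be executed and costed. It assumes 8-bit XOR shares and the output expression $c_i = a_i \oplus r_i \oplus r_{i-j}$ (indices modulo the number of shares); the rotation direction, the function names and the cost model of one random per share per block are assumptions for illustration. The security of a given sequence still has to be established by a verification tool.

```python
import secrets
from functools import reduce
from operator import xor


def refresh_block_j(a, j, bits=8):
    """Refresh block whose second use of the randomness is rotated by j positions."""
    m = len(a)
    if j % m == 0:
        # r_i + r_{i-j} would cancel out and the block would not refresh anything.
        raise ValueError("rotation offset must not be a multiple of the share count")
    r = [secrets.randbits(bits) for _ in range(m)]
    return [a[i] ^ r[i] ^ r[(i - j) % m] for i in range(m)]


def refresh(a, offsets, bits=8):
    """Compose refresh blocks, one per rotation offset, in sequence."""
    for j in offsets:
        a = refresh_block_j(a, j, bits)
    return a


def randomness_cost(m, offsets):
    """Number of random elements drawn: one per share and per block."""
    return m * len(offsets)


a = list(range(12))
c = refresh(a, [1, 2, 3])
print(reduce(xor, a) == reduce(xor, c), randomness_cost(12, [1, 2, 3]))
```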
Table 2 summarizes the results of our exploration for 8, 9, 10 and 11 shares. For all of these, we find solutions that require only two iterations of RefreshBlock$_j$ (with different values of j). This improves on the general result obtained by applying Corollary 2, but preserves the parallel nature of the mask refreshing algorithm.
Our exploration in this setting further fails to find any SNI algorithms with only 2 iterations of RefreshBlock$_j$ for 12 shares, although it notably finds an algorithm for 12 shares with 3 iterations of RefreshBlock$_j$, i.e., RefreshBlock$_1$; RefreshBlock$_2$; RefreshBlock$_3$, with a cost of 36 random field elements.
We note some interesting symmetries in the results due to the fact that a rotation of i positions in one direction is equivalent to a rotation of t + 1 − i positions in the other. Interestingly, it also appears that, at order t, the gadget RefreshBlock$_1$; RefreshBlock$_{(t+1)/2}$ is never t-SNI when t + 1 is even.

Breaking Chains
Out of the results of the first wave of parallel synthesis, we make a second observation: attacks found by the verification tool in the failure cases rely on the construction of "chains" of observations to break SNI. More specifically, an adversary that makes a particular observation, in order to propagate it back to input shares, needs to carefully select intermediate observations that "lock" the random variables used as masks on that share, and further expend an internal observation to "cap off" the chain, blocking the final random variable in it. For example, considering RefreshBlock$_1$; RefreshBlock$_2$ for t = 10, the observations marked below (in brackets [•]) are a counter-example to SNI: they include 4 output observations and 6 internal observations whose distribution depends on 7 input shares. A key observation is that those intermediate observations that serve to construct the chain ($a_8 \oplus r^1_8 \oplus r^1_7 \oplus r^2_8$, $a_7 \oplus r^1_7 \oplus r^1_6 \oplus r^2_7$ and $a_6 \oplus r^1_6$) do not simply lock random variables to make them unusable by the simulator: they also let the adversary gain information about one additional input share, leading to the attack. In short: those intermediate observations give more information to the adversary by letting her observe an input-dependent distribution and remove power from the simulator by locking random variables away.
With this observation in mind, it seems natural to consider a slight twist on the construction: instead of immediately adding the sampled randomness onto the input, we could construct an encoding of 0 using the same rotation-based technique and, only once enough randomness has been mixed in, add it onto the input encoding a. The consequence of this modification (which we generalize below) on the attack above is that intermediate observations can remove power from the simulator by locking random variables, but will never give any information about input shares to the adversary.
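The twist can be sketched as follows, again assuming 8-bit XOR shares. The function names, the expression $z_i = r_i \oplus r_{i-j}$ for the initial encoding of 0, and the way successive offsets are consumed are assumptions made for illustration; they are not a transcription of the algorithms of this paper.

```python
import secrets
from functools import reduce
from operator import xor


def zero_block(m, j, bits=8):
    """Input-free block: an m-share encoding of 0 built from m fresh randoms."""
    r = [secrets.randbits(bits) for _ in range(m)]
    return [r[i] ^ r[(i - j) % m] for i in range(m)]


def remask(z, j, bits=8):
    """Mix one more layer of rotated randomness into an encoding."""
    m = len(z)
    r = [secrets.randbits(bits) for _ in range(m)]
    return [z[i] ^ r[i] ^ r[(i - j) % m] for i in range(m)]


def refresh_zero(a, offsets, bits=8):
    """Build an encoding of 0 from `offsets`, then add it onto `a` in one step."""
    z = zero_block(len(a), offsets[0], bits)
    for j in offsets[1:]:
        z = remask(z, j, bits)
    # Input shares only appear in this final addition, never in the chain above.
    return [a[i] ^ z[i] for i in range(len(a))]


a = list(range(11))
c = refresh_zero(a, [1, 2])
print(reduce(xor, zero_block(11, 1)) == 0, reduce(xor, a) == reduce(xor, c))
```

In this arrangement every intermediate value before the final addition is a function of randoms only, which is the property the modification is meant to obtain.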
We now define general algorithms embodying this approach to mask refreshing, and state some results. Consider Algorithm 3a, which corresponds to the input-free part of RefreshBlock$_j$ that computes an encoding of 0. We denote with RefreshZero$^t$(a) (with a list of rotation offsets for use in ZeroBlock) the algorithm that produces an initial encoding of 0 using ZeroBlock (with the appropriate rotation offset) and remasks it using RefreshBlock (with the appropriate rotation offset) before adding the final value onto the input shares.

Table 3: Order of Magnitude (OoM) of time taken for the verification of some RefreshZero-based mask refreshing gadgets. Columns: Algorithm, SNI, Verif. OoM. Surviving entry: RefreshZero, month.
whenever, for any set of probes $\Omega = \bar{O} \cup O$ with $\bar{O}$ a set of variables internal to G, and $O$ a set of output variables of G such that $|\Omega| \leq t$, there exists a set of input indices $I$ such that:
we take $c_k$ for $m$. Otherwise, for $k$ to occur in $\hat{O}_{c_k}$, since $c_{k+1} \notin O$, it must be that $O$ contains one (and exactly one by assumption on $O$) internal observation of the form $b_k$ or $r_k$. Consider the maximal chain $c_{k-i}, \ldots, c_k$ of external observations of $O$.

Table 1: Number of required instances of RefreshBlock and random values for several small orders.

Table 2: Some parallel mask refreshing gadgets.