Non-profiled Mask Recovery: the impact of Independent Component Analysis

As one of the most prevalent SCA countermeasures, masking schemes are designed to defeat a broad range of side channel attacks. An attack vector that is suitable for low-order masking schemes is to directly determine the mask(s) for each trace, exploiting the fact that an attacker often has access to several leakage points of the mask(s) in use. Good examples of low-order masking implementations are the schemes based on table re-computation and the masking scheme in DPAContest V4.2. We propose a novel approach based on Independent Component Analysis (ICA) that efficiently utilises the information from several leakage points to reconstruct the masks for each trace, and show that it is a competitive attack vector in practice.


Introduction
Over the past decade, Side Channel Attacks (SCAs) have become a major threat to various cryptographic devices. Depending on the specific attacker model, most SCAs can be divided into two categories: profiled attacks and non-profiled attacks. In a profiled attack, the attacker creates (a priori) direct approximations of the device's leakage function and uses these in an attack. This typically results in very efficient attacks, but under strong assumptions about the capabilities of the attacker. Non-profiled attacks only require a proportional (or weaker) approximation of the device's leakage model. The canonical example of such an attack is to approximate the device leakage with the Hamming weight of intermediate values and utilise correlation as a distinguisher. Attacks in both categories often proceed via a divide and conquer strategy, which requires (in the divide step) explicitly guessing partial keys. Consequently (in a known plaintext setting) such attacks are limited to the first and last rounds of typical block cipher constructions.
In 2017, Gao et al. proposed a new non-profiled SCA based on Independent Component Analysis (ICA) [1]. Assuming the observed leakages follow the weighted Hamming weight model, the ICA-based attack recovers the intermediate states without making any explicit key guesses. In their paper, the authors demonstrate several applications of this approach, including a new key distinguisher, attacks on the middle encryption rounds, and reverse engineering. However, all previous discussions of ICA-based SCA focus on unprotected implementations. We are hence interested in investigating whether ICA-based SCA can be useful for attacking protected implementations.
For ICA-based SCA to work, it is imperative to have access to several leakage points for some (targeted) intermediate value. In masked implementations, one can often observe leakages related to the manipulation of masks in the processor. Hence, ICA-based SCA could be a powerful tool for mask recovery in masked implementations, in particular optimised low-order masking schemes.
Our Contribution. In this paper, we explore the potential of ICA to compromise implementations of some (low-order) masking schemes. Specifically, in table re-computation schemes, the multiple XORs in the re-computation process naturally provide multiple leakage observations for ICA. Compared with previous attacks, our ICA-based mask recovery finds the n-bit random masks with only n leakage points, whereas previous attacks take $2^n$ points. Experiments confirm that for smaller Sboxes (n = 4), the ICA-based attack outperforms horizontal attacks on smart card implementations. For the Rotating Sbox Masking (RSM) scheme, which is used in DPAContest V4, our analysis shows that if the attacker chooses the leakages wisely, the random masks can be recovered as an approximate ICA problem. Although the mask recovery becomes less accurate, the subsequent key recovery is hardly affected.
Paper Organization. In Section 2, we briefly review the targeted masking schemes as well as our primary tool, ICA. Section 3 analyzes the leakage behaviour of table re-computation schemes in detail. As the XORs naturally provide multiple leakage observations, ICA enables the attacker to determine both the random masks and the secret key. We present another masking scheme, the one used in DPAContest V4.2, in Section 4. Although this scheme computes the masked tables offline, the relevant random indexes in each round provide considerable leakages for ICA-based SCA. Impacts of this approach and conclusions are presented in Section 5.

Masking schemes
To date, masking is one of the most prevalent countermeasures for software implementations. In general, a masking scheme conceals the cryptographic intermediate states with random values. As a result, the data-dependent leakage no longer relates to the secret key. Previous studies proposed a variety of masking schemes, such as affine masking [2], polynomial masking [3] and inner product masking [4]. In this paper, we focus on Boolean masking, the most frequently implemented approach. In a Boolean masking scheme with d shares, an intermediate state x is split into d shares $x^{(1)}, x^{(2)}, \ldots, x^{(d)}$ where
$$x = x^{(1)} \oplus x^{(2)} \oplus \cdots \oplus x^{(d)}$$
As each leakage point only depends on one $x^{(i)}$, the attacker cannot learn any useful information unless they combine the leakages of all d shares.
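The sharing relation above can be illustrated with a minimal Python sketch. The helper names `share` and `unshare` and the 8-bit default width are assumptions made for this illustration only; they are not taken from any of the cited schemes.

```python
import secrets
from functools import reduce
from operator import xor


def share(x: int, d: int, n_bits: int = 8) -> list[int]:
    """Split x into d Boolean shares whose XOR equals x (illustrative helper)."""
    # The first d-1 shares are fresh random masks.
    masks = [secrets.randbits(n_bits) for _ in range(d - 1)]
    # The last share is x XORed with every mask, so all d shares XOR back to x.
    last = reduce(xor, masks, x)
    return masks + [last]


def unshare(shares: list[int]) -> int:
    """Recombine the shares by XORing them together."""
    return reduce(xor, shares, 0)


if __name__ == "__main__":
    shares = share(0xA7, d=3)
    assert unshare(shares) == 0xA7
```

Any proper subset of the shares is uniformly distributed, which is why a single leakage point on one share carries no information about x.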
Alternatively, the whole Sbox can be computed as a bunch of masked ANDs and masked NOTs. Compared with unprotected implementations, this construction significantly increases the computation cost.
- Table Re-computation. In many look-up table schemes, the masked Sbox is computed as a look-up table [7,8,9]. In the first step, these schemes often generate a masked table using all the shares from $x^{(1)}$ to $x^{(d-1)}$. Then, the output shares $y^{(1)}, y^{(2)}, \ldots, y^{(d)}$ can be found by simply looking up $x^{(d)}$ in the masked table. The major drawback of this approach is that the re-computation stage is not only costly, but also exploitable. For an n-bit Sbox, this procedure provides $2^n$ leakage points for each data share $x^{(i)}$. Thus, the attacker can collect all leakage points on the trace ("horizontally") and use a standard DPA-style attack to recover $x^{(i)}$. For n = 8, this horizontal attack is actually quite efficient for software implementations [10,11].
- Global look-up tables. Alternatively, the masked table can also be computed offline [8]. In this case, a masked table is generated for each possible mask and stored in the data RAM/ROM. Considering the enormous memory cost, this approach is more suitable for smaller Sboxes (e.g. 4-bit Sboxes). For larger Sboxes, it is often applied in Low-Entropy Masking Schemes (LEMS), such as Rotating Sbox Masking (RSM) [13]. Instead of random masks, LEMS usually uses a precomputed set of constant masks, which significantly reduces the memory cost [13]. As a lightweight SCA countermeasure, it is LEMS's design philosophy to resist not all but a selection of important and powerful attacks [13]. Results from DPA Contest v4 and v4.2 are consistent with this statement: in the profiling case, the secret key can be found with only one trace [14].

Independent Component Analysis
Independent Component Analysis (ICA) [15] belongs to a class of problems called Blind Source Separation (BSS), which aims to separate a set of mixed signals without the aid of information about the source signals or the mixing process. A common example is the cocktail party problem, in which the challenge for a partygoer is to pick out a single conversation in a noisy room.
Suppose we have n simultaneous conversations (sources) $S = \{s_1, s_2, \ldots, s_n\}$ going on in the party room. Microphones are placed in different positions, recording m mixtures (observations) of the original sources, $Y = \{y_1, y_2, \ldots, y_m\}$. Assuming the observation $y_j$ is a linear mixture of all sources, we have
$$y_j = a_{j,0} + a_{j,1} s_1 + a_{j,2} s_2 + \cdots + a_{j,n} s_n$$
where $a_{j,i}$ stands for a real-valued coefficient. The overall mixing procedure can be written as
$$Y = AS$$
where A is called the mixing matrix (the constant terms $a_{j,0}$ can be removed by centering the observations). In signal processing, this statistical model is called Independent Component Analysis [15]. With additional multivariate Gaussian noise N, the noisy ICA model is defined as
$$Y = AS + N$$
The goal of ICA is to recover the unknown sources S from the observations Y, without knowing the mixing matrix A or the Gaussian noise N in advance.

ICA in Side Channel Analysis
Assuming the target device's leakage function is linear (in the bits of the intermediate values), recovering the secret intermediate values in SCA is quite similar to an ICA problem [1]. Specifically, when operating on an n-bit intermediate state x, the data-dependent leakage can be written as
$$L(x) = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n$$
Here $x_i$ represents the i-th bit of x (with $x_1$ the most significant bit) and L is a linear leakage function. This leakage function has the same form as one ICA observation (i.e. $y_j$ in Sect. 2.2). However, for ICA we need more than a single observation. Suppose that the device not only computes x but also computes some other intermediate state $x' = x \oplus c$ (where c is a constant) at some point. Then, the attacker can also learn the leakage $L(x')$. Taking c = 00...01 as an example, we have:
$$L(x') = \alpha_0 + \alpha_1 x_1 + \cdots + \alpha_{n-1} x_{n-1} + \alpha_n (1 - x_n) = (\alpha_0 + \alpha_n) + \alpha_1 x_1 + \cdots + \alpha_{n-1} x_{n-1} - \alpha_n x_n$$
It is not hard to see that such leakage can be regarded as the leakage from the same intermediate state x, but with a different linear leakage function $L'$. Thus, if the targeted implementation has some operand like $x \oplus c$, the attacker may be able to manipulate c to get multiple "observations" of the intermediate state x. Assuming the attacker can get enough observations (the number of observations m ≥ n), in theory, he (or she) can solve for the intermediate state x as a noisy ICA problem.
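The equivalence between leaking $x \oplus c$ under L and leaking x under a modified linear function can be checked numerically. The sketch below is only an illustration: the function names, the MSB-first bit ordering and the example coefficients are assumptions, not values from [1].

```python
def bits_msb_first(x: int, n: int) -> list[int]:
    """Return the n bits of x, most significant bit first."""
    return [(x >> (n - 1 - i)) & 1 for i in range(n)]


def leak(x: int, alpha0: float, alpha: list[float]) -> float:
    """Linear (weighted Hamming weight) leakage: alpha0 + sum_i alpha_i * x_i."""
    n = len(alpha)
    return alpha0 + sum(a * b for a, b in zip(alpha, bits_msb_first(x, n)))


def equivalent_model(alpha0: float, alpha: list[float], c: int):
    """Coefficients of L' such that L'(x) = L(x XOR c) for every x.

    For each bit set in c, x_i becomes (1 - x_i): the weight alpha_i moves
    into the constant term and its sign is flipped.
    """
    n = len(alpha)
    c_bits = bits_msb_first(c, n)
    new_alpha0 = alpha0 + sum(a for a, cb in zip(alpha, c_bits) if cb)
    new_alpha = [-a if cb else a for a, cb in zip(alpha, c_bits)]
    return new_alpha0, new_alpha


if __name__ == "__main__":
    alpha0, alpha = 0.3, [1.0, 0.8, 1.2, 0.9]  # hypothetical 4-bit weights
    for c in range(16):
        b0, b = equivalent_model(alpha0, alpha, c)
        for x in range(16):
            assert abs(leak(x ^ c, alpha0, alpha) - leak(x, b0, b)) < 1e-9
```

Each XOR constant therefore yields another row of the mixing matrix, with the same sources (the bits of x) and sign-flipped weights.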
In practice, since side channel leakage usually contains a high level of noise, the authors also proposed a specific ICA algorithm for SCA. Due to space limits, we omit further details; interested readers can find them in [1].
Unlike other traditional SCAs, recovering x with ICA does not involve any key guess. As a consequence, ICA-based SCA serves as a perfect tool for SCA in the middle rounds or SCA-based reverse engineering [1]. Indeed, the authors already provide realistic experiments to verify their results on certain software implementations. On the other hand, as stated in [1], in many realistic circumstances, finding such an XOR constant c might not be an easy task. For this reason, to date, the applications of ICA-based SCA are restricted to unprotected cryptographic implementations.

ICA-based Attack on a Table Re-computation Scheme
In the following, we analyse the potential application of ICA to a few masking schemes. Perhaps surprisingly, for some masking schemes, constructing multiple observations becomes much easier. The following two sections present two case studies: for each case study, we review its mask computation, analyze its leakage and show how ICA-based SCA enables the recovery of the random masks. Comparisons with previous attacks and experimental verifications are also provided in each case. We begin by studying a table re-computation scheme.

Table Re-computation Schemes
Considering the memory cost, masking schemes with global look-up tables can hardly be applied to larger Sboxes (e.g. the Sbox in AES). Thus, many masking schemes choose to generate the masked table online. In a d-shares table re-computation scheme, $(x^{(1)}, x^{(2)}, \ldots, x^{(d-1)})$ is used in the computation to create a masked table T. In the last step, the implementation simply looks up $x^{(d)}$ in T and returns $T(x^{(d)})$ as the output shares. To ensure its security against SCA, designers may also add some other procedures, such as refreshing T with fresh randomness after each table look-up [9]. Most masked table re-computations are rather similar: for clarity, we present a d-shares table re-computation procedure in Algorithm 1.
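As a rough illustration of the re-computation idea for d = 2, the following Python sketch builds the masked table online and then performs the final look-up. The update rule T(u) = S(u ⊕ x1) ⊕ y1 is an assumption in the style of classical first-order table re-computation, not a transcription of Algorithm 1; the 4-bit PRESENT Sbox is used only to keep the example small.

```python
import secrets

# 4-bit Sbox of PRESENT.
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def recompute_table(sbox, x1, y1):
    """Build T with T[u] = S[u ^ x1] ^ y1 (assumed first-order update rule).

    Also returns the list of values u ^ x1 touched inside the loop: these
    are the 2^n intermediate values that leak during re-computation.
    """
    size = len(sbox)
    table = [0] * size
    touched = []
    for u in range(size):
        idx = u ^ x1          # manipulated once per u: one leakage point each
        touched.append(idx)
        table[u] = sbox[idx] ^ y1
    return table, touched


def masked_sbox(sbox, x1, x2):
    """Return output shares (y1, y2) with y1 ^ y2 == S[x1 ^ x2]."""
    y1 = secrets.randbelow(len(sbox))   # fresh output mask
    table, _ = recompute_table(sbox, x1, y1)
    return y1, table[x2]                # look up the last input share


if __name__ == "__main__":
    x, x1 = 0x9, 0x4
    y1, y2 = masked_sbox(PRESENT_SBOX, x1, x ^ x1)
    assert y1 ^ y2 == PRESENT_SBOX[x]
```

The `touched` list makes explicit why the loop is attractive to an attacker: the same share is combined with every constant u on a single trace.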

Previous Attacks
Note that in Algorithm 1, line 3 always produces $2^n$ leakages for each share. More specifically, assuming the leakage function is L, the attacker learns the leakages $(L(x^{(i)}), L(x^{(i)} \oplus 1), \ldots, L(x^{(i)} \oplus (2^n - 1)))$. As all these leakages depend on the same share $x^{(i)}$, the attacker can take a guess about $x^{(i)}$ and verify this guess with Correlation Power Analysis (CPA) [16]. Unlike traditional CPA, which utilises a specific leakage point across many traces (i.e. a "vertical" attack), this attack utilizes all the $2^n$ leakages on the same trace (i.e. it is a "horizontal" attack). Having recovered the masks, key recovery is trivial: since all d − 1 input shares (random masks) are already known, a traditional vertical CPA on the leakage of $x^{(d)}$ reveals the secret key. Previous studies showed that, for 8-bit Sboxes (n = 8), such a "horizontal" attack is a serious threat to table re-computation schemes [10].

Algorithm 1 A d-shares table re-computation for an n-bit Sbox
Input: $x^{(1)}, \ldots, x^{(d)}$
Output: ... = S(x)
1: ...
2: for all $u \in \{0, 1\}^n$ do
3: ...
4: end for
5: T = T'
6: end for
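The horizontal attack described above can be sketched on simulated data as follows. This is a toy model under stated assumptions: Hamming-weight leakage with Gaussian noise, one sample per XOR constant u, and illustrative function names; it does not reproduce the measurement setups of [10,11].

```python
import random


def hamming_weight(v: int) -> int:
    return bin(v).count("1")


def pearson(a, b) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def simulate_trace(share: int, n_bits: int, sigma: float, rng: random.Random):
    """One trace: noisy Hamming weight of (share XOR u) for every u."""
    return [hamming_weight(share ^ u) + rng.gauss(0.0, sigma)
            for u in range(1 << n_bits)]


def horizontal_cpa(trace, n_bits: int) -> int:
    """Guess the share whose predicted leakages correlate best with the trace."""
    constants = range(1 << n_bits)
    scores = {
        guess: pearson([hamming_weight(guess ^ u) for u in constants], trace)
        for guess in constants
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    rng = random.Random(0)
    trace = simulate_trace(share=0xB, n_bits=4, sigma=0.0, rng=rng)
    assert horizontal_cpa(trace, n_bits=4) == 0xB
```

With n = 4 the attack has only 16 samples per trace to work with, so its reliability in this model depends strongly on `sigma`; the noise-free call above is simply the easiest case.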
A common countermeasure against horizontal attacks is to randomly shuffle the constant u in line 3. Since the computation follows some random order $(\varphi(0), \varphi(1), \ldots, \varphi(2^n - 1))$, $x^{(i)}$ alone can no longer determine all the $2^n$ leakages. However, for many smart card applications, generating and storing an n-bit random permutation ϕ in memory is far too expensive. Instead, they prefer to use some pseudo-random function ϕ that can be computed online. However, the computation of ϕ provides new leakages for the attacker. Tunstall et al. showed that the attacker can easily exploit such leakages and recover the entire permutation ϕ [11]. Moreover, Bruneau et al. proposed a multivariate attack which combines all $2^n$ leakages on one trace into a statistic that depends on $x^{(i)}$ [5]. As the combination is unordered, random shuffling does not affect the final statistic. Although $x^{(i)}$ cannot be recovered, the attacker finds the secret key through higher-order attacks, with the leakage of $x^{(i)}$ as well as this statistic.

ICA-based attack
Mask recovery. The leakages that occur in table re-computation schemes are a perfect match for ICA. Specifically, each bit of the intermediate state x now becomes an independent binary source. Assuming the leakage function is linear, the attacker can always use the leakage of x as one observation for ICA. As stated previously, the leakage $L(x \oplus c)$ can also be regarded as the leakage of x under a different leakage function. In other words, for table re-computation schemes, the attacker can always find $2^n$ independent observations through $2^n$ XOR constants. In fact, ICA only needs n observations for a successful recovery. Taking noise into consideration, the formal model can be written as:
$$Y = AS + N$$
where the sources S are the n bits of the share $x^{(i)}$, each observation in Y is the leakage $L(x^{(i)} \oplus c)$ for one XOR constant c, and N represents the random noise. As stated in Section 2.3, ICA-based SCA helps to recover the secret share $x^{(i)}$.
Key recovery. Since all d − 1 secret shares are already recovered, the subsequent key recovery becomes trivial. Take the last-round attack on AES for instance: assuming the corresponding ciphertext byte is c and the related round key byte is k, we have
$$x^{(d)} = S^{-1}(c \oplus k) \oplus x^{(1)} \oplus x^{(2)} \oplus \cdots \oplus x^{(d-1)}$$
Since the attacker has the leakage of $x^{(d)}$, traditional CPA helps to determine the correct key guess for k, as long as the value of $x^{(1)} \oplus x^{(2)} \oplus \cdots \oplus x^{(d-1)}$ is given.
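The key-recovery step can be sketched as a vertical CPA on simulated data. The sketch below is a 4-bit, 2-share toy analogue of the last-round attack: it assumes c = S(x) ⊕ k, Hamming-weight leakage of the last share, and masks that have already been recovered correctly. The PRESENT Sbox and all function names are choices made for illustration.

```python
import random

PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
INV_SBOX = [PRESENT_SBOX.index(v) for v in range(16)]


def hamming_weight(v: int) -> int:
    return bin(v).count("1")


def pearson(a, b) -> float:
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def simulate(key: int, n_traces: int, sigma: float, rng: random.Random):
    """Return (ciphertext nibble, recovered mask, leakage of last share)."""
    traces = []
    for _ in range(n_traces):
        c = rng.randrange(16)
        mask = rng.randrange(16)                 # assumed already recovered
        last_share = INV_SBOX[c ^ key] ^ mask    # x^(2) = S^-1(c ^ k) ^ x^(1)
        leakage = hamming_weight(last_share) + rng.gauss(0.0, sigma)
        traces.append((c, mask, leakage))
    return traces


def recover_key(traces) -> int:
    """Vertical CPA: correlate predicted and observed leakage per key guess."""
    leakages = [t[2] for t in traces]
    scores = {}
    for guess in range(16):
        model = [hamming_weight(INV_SBOX[c ^ guess] ^ m) for c, m, _ in traces]
        scores[guess] = pearson(model, leakages)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    rng = random.Random(1)
    assert recover_key(simulate(0xB, 200, 0.0, rng)) == 0xB
```

If some of the supplied masks are wrong, the corresponding predictions become uncorrelated with the leakage and act as extra noise, which is consistent with key recovery tolerating an imperfect mask-recovery rate.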
Comparison with previous attacks. Compared with horizontal CPAs, our ICA-based mask recovery uses only n leakage samples. Since horizontal CPA takes guesses about $x^{(i)}$, it only applies to one specific trace. In other words, the sample size for horizontal CPA on table re-computation schemes is always $2^n$. Previous studies showed that for n = 8, horizontal CPA works quite well with software implementations [10,11]. However, for smaller Sboxes (e.g. n = 4), horizontal CPA becomes less effective [11]. This is not surprising: as a non-profiled attack, CPA requires several traces to achieve a stable recovery. For our ICA-based mask recovery, a smaller Sbox is hardly a problem. Since our approach uses only n leakage points, it works well even if n = 2.
Meanwhile, the mask recovery in horizontal CPA is basically a one-dimensional attack: since each trace has different input shares (random masks), horizontal CPA only works on the horizontal axis. The subsequent key recovery, on the other hand, only collects information on the vertical axis. In Bruneau et al.'s work [5], since the horizontal leakages are packed into one statistic, the attack mainly works on the vertical axis. In contrast, our approach is essentially a two-dimensional attack. Both the multiple leakages on one trace ("horizontal") and the leakage model shared by all traces ("vertical") are taken into consideration. In some cases, this two-dimensional property becomes a limitation: if the target implementation uses random shuffling as a countermeasure, the frequently changing random order ϕ completely defeats our attack. Since the $2^n$ horizontal leakages in our attack are not packed together (as in Bruneau et al.'s attack), this random order prevents our attack from exploiting the vertical information. However, such protection only works if the designers use a new ϕ for each encryption. If the random ϕ is fixed, our attack works exactly the same way: as ICA does not need to know the mixing matrix, we can recover $x^{(i)}$ without knowing ϕ. For easy comparison, we list the attacks mentioned above on 2-shares table re-computation schemes in Table 1.

Experimental Validation
To show that our ICA-based attack works, we have implemented a 2-shares version of Coron's masking scheme [9] on an IC card with an 8-bit microprocessor (ATmega163). The power consumption was measured with a PicoScope 3206D oscilloscope at a sampling rate of 1 GSa/s. The target cipher uses the 4-bit Sbox of PRESENT [17]. Since previous studies already showed that horizontal CPA works well with 8-bit Sboxes, here we aim to test whether it still gives satisfactory recovery with smaller Sboxes. Our entire trace set contains 200 traces, with 2 000 000 samples covering the Sbox computation in the last round. Results from both horizontal CPA and our ICA-based attack are presented in Figure 1. Clearly the small 4-bit Sbox is an issue for horizontal CPA: as there are only 16 leakage samples on each trace, mask recovery becomes less reliable. In our experiments, only 30% of the random masks are successfully recovered. As most recovered masks are incorrect, further key recovery becomes less effective. On the other hand, our ICA-based mask recovery finds over 90% of the random masks correctly with only 40 traces. Figure 1 shows that this attack is quite efficient: the key recovery becomes stable after only 20 traces.

ICA-based Attack on DPAContest v4.2
As table re-computation schemes produce the leakages of $(x^{(i)}, x^{(i)} \oplus 1, \ldots, x^{(i)} \oplus (2^n - 1))$, recovering the random masks with ICA seems quite straightforward. In the following, let us consider a more subtle example: the masking scheme in DPAContest v4.2.

The Rotating Sbox Masking Scheme
Unlike typical table re-computation schemes, the masking scheme in DPAContest v4 uses global look-up tables, where the masked tables are pre-computed offline. As stated previously, for larger Sboxes (like that of AES), storing all possible masked tables is impossible for many commonly used encryption devices. Instead, DPAContest v4 uses Rotating Sbox Masking (RSM) [13], which uses a set of constant masks rather than completely random masks. More specifically, the latest version (DPAContest v4.2) [18] uses a fixed set of 16 mask bytes, written here as $M_0, M_1, \ldots, M_{15}$. Before any encryption, 16 masked tables ($MS_i$) are pre-computed and stored in memory. In each encryption, the encryption device randomly picks a 16-element offset array O[0 : 15], where each O[i] is a 4-bit random offset. According to the mask set, the initial 128-bit mask is
$$(M_{O[0]}, M_{O[1]}, \ldots, M_{O[15]})$$
At the end of each encryption round, each mask byte is "rotated" right by one position in the mask set. Thus, in the (r + 1)-th round, the input mask is:
$$(M_{(O[0]+r) \bmod 16}, M_{(O[1]+r) \bmod 16}, \ldots, M_{(O[15]+r) \bmod 16})$$
Algorithm 2 describes the masked round function of AES-128 in detail. In addition, considering the threat of higher-order SCA, random shuffling is applied to the first/last round. Since the Sbox computation order is not given, the attacker can hardly combine leakages from multiple traces and learn the secret key from conventional vertical SCA.

Previous Attacks
There are many attacks in the hall of fame of DPAContest V4. As we can see in lines 5-6 of the Sbox computation (Algorithm 3), in the table look-up procedure the attacker finds the leakage of (x + r) mod 16, where x denotes the 4-bit offset O[0]. Although the first/last round Sbox computation is shuffled, the remaining 8 rounds in the middle still provide exploitable leakages. Specifically, the data-dependent leakages for rounds 2-9 can be written as {L((x + 1) mod 16), L((x + 2) mod 16), ..., L((x + 8) mod 16)}. In this case, the attacker can guess x and verify the guess with horizontal CPA. Nonetheless, considering there are only 8 leakage samples available, recovering the random masks with horizontal CPA seems to be a difficult task.

ICA-based Attack
Apparently, applying ICA to this scheme is not as straightforward as for table re-computation schemes. Following the previous construction, the random mask index x can be regarded as 4 binary sources. However, as the leakages here depend on (x + r) mod 16, the "XOR-constant" method [1] no longer provides multiple observations. Nonetheless, in round 9, we have
$$(x + 8) \bmod 16 = x \oplus 8$$
As a result, the leakage of round 9 forms a valid ICA observation. Similarly, the Boolean function of y = (x + 4) mod 16 can be written as:
$$y_1 = x_1 \oplus x_2, \quad y_2 = x_2 \oplus 1, \quad y_3 = x_3, \quad y_4 = x_4$$
Clearly, the least significant 3 bits have the same expressions as in $x \oplus 4$. The only difference lies in the most significant bit $y_1$. Since ICA is a linear procedure, a linear mixture of the bits of x can never express $x_1 \oplus x_2$. As a consequence, in ICA, the leakage of $y_1$ can be regarded as random noise. More specifically, in round 5,
$$L((x + 4) \bmod 16) = \alpha_0 + \alpha_1 (x_1 \oplus x_2) + \alpha_2 (1 - x_2) + \alpha_3 x_3 + \alpha_4 x_4$$
In other words, the leakages in round 5 can be regarded as a noisier observation of x with an equivalent leakage function where $\alpha_1 = 0$. A similar property holds for the leakages of (x + 2) mod 16 and (x + 1) mod 16, although the signal-to-noise ratio (SNR) is further reduced. As a result, attackers can recover the offset O[0] with the leakages from rounds 2, 3, 5 and 9. With the random masks recovered, the subsequent key recovery becomes much easier. Unlike the Sbox, the MixColumn computations in the first round are not shuffled. Therefore, attackers can exploit the leakages of MixColumn and learn the secret key through conventional vertical SCA.
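The bit-level relations above are easy to verify exhaustively for the 16 possible offsets. The short Python check below does so and also counts, for each added constant k in {8, 4, 2, 1}, how many output bits of (x + k) mod 16 agree with x ⊕ k over all x; the function names and the agreement count are introduced here purely for illustration.

```python
def bits_msb_first(v: int, n: int = 4) -> list[int]:
    """Return the n bits of v, most significant bit first (x_1, ..., x_n)."""
    return [(v >> (n - 1 - i)) & 1 for i in range(n)]


def matching_bits(k: int, n: int = 4) -> int:
    """Count bit positions where (x + k) mod 2^n equals x XOR k, over all x."""
    size = 1 << n
    total = 0
    for x in range(size):
        diff = ((x + k) % size) ^ (x ^ k)
        total += n - bin(diff).count("1")
    return total


def check_round_relations() -> None:
    for x in range(16):
        # Round 9: adding 8 modulo 16 only flips the most significant bit.
        assert (x + 8) % 16 == x ^ 8
        # Round 5: adding 4 flips x_2 and carries into the top bit.
        x1, x2, x3, x4 = bits_msb_first(x)
        y1, y2, y3, y4 = bits_msb_first((x + 4) % 16)
        assert (y1, y2, y3, y4) == (x1 ^ x2, x2 ^ 1, x3, x4)


if __name__ == "__main__":
    check_round_relations()
    # Out of 16 * 4 = 64 bit positions, the XOR approximation is exact for
    # k = 8 and degrades as the carry chain gets longer.
    assert [matching_bits(k) for k in (8, 4, 2, 1)] == [64, 56, 52, 50]
```

The decreasing agreement counts give a simple way to see why the observations from (x + 2) mod 16 and (x + 1) mod 16 behave as noisier versions of the XOR model.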

Experimental Validation
We show how our ICA-based attack can be applied here using the EM traces provided by DPAContest [14]. In our experiments, the leakage of offset O[0] appears not only in the Sbox computations, but also in the MixColumn computations. For better recovery, in each round, our ICA-based analysis takes both observations as its inputs. As a result, in the mask recovery stage, our analysis uses 8 observations to retrieve 4 sources. Even with these extra leakages, our mask recovery is not as good as in the previous section. As we can see in Figure 2, the success rate of our ICA-based mask recovery is around 80%. Nonetheless, the subsequent key recovery shows that 80% accuracy is still good enough: the correct key is almost determined after only 30 traces. On the other hand, in our experiment, 8 leakages can hardly support a horizontal CPA: only 10% of the recovered masks are correct and thus key recovery becomes infeasible.

Conclusion
In 2017, Gao et al. proposed a novel side channel analysis based on independent component analysis (ICA) [1]. As this ICA-based SCA does not follow a "guess-and-determine" procedure, the approach is quite useful for attacking the middle rounds or for reverse engineering. However, previous work only studied unprotected implementations.
In this paper, we demonstrated the potential of ICA to defeat some masking schemes: table re-computation and the RSM masking scheme in DPAContest V4.2. Our analysis shows that, assuming the attacker can choose the leakage samples wisely, the random masks in both schemes can be effectively recovered. Compared with the previous attacks, our mask recovery requires fewer leakages. For masking scheme designers, our attack is another warning: horizontal attacks are indeed serious practical threats. If the same (or a related) mask appears multiple times during the computation, the attacker may learn considerable information about the mask, even if it never mixes with any masked intermediate state.

Table 1. Comparison of attacks with 2-shares table re-computation schemes.
Few participants in the DPAContest hall of fame [14] give detailed descriptions of their attacks. As a result, we can only present a brief overview of the current results. Apparently, profiling attacks work well against DPAContest v4.2. Most profiling attacks recover the secret key with a few traces, and the best one works with only one trace. On the other hand, most non-profiled attacks use many more traces. To date, the best existing non-profiled attack is due to Zeyi Liu et al. [14]. According to the hall of fame, their attack takes only 14 traces, whereas all other non-profiled attacks need a few hundred traces. In theory, horizontal CPA still works for this scheme. Denote the 4-bit O[0] as x: in each Sbox computation, the processor needs x to decide which masked table should be used. Algorithm 3 presents the assembly code of the Sbox computation in DPAContest v4.2.