All papers in 2024 (Page 6 of 658 results)

Last updated:  2024-02-02
HiSE: Hierarchical (Threshold) Symmetric-key Encryption
Pousali Dey, Pratyay Mukherjee, Swagata Sasmal, and Rohit Sinha
Threshold symmetric encryption (TSE), introduced by Agrawal et al. [DiSE, CCS 2018], provides a scalable and decentralized solution for symmetric encryption by ensuring that the secret key stays distributed at all times. This avoids a single point of attack or failure while achieving the necessary security requirements. TSE was further improved by Christodorescu et al. [ATSE, CCS 2021] to support an amortization feature, which enables a “more privileged” client to encrypt records in bulk by interacting only once with the key servers, while decryption must be performed individually for each record, potentially by a “less privileged” client. However, typical enterprises collect or generate data once and query it several times over its lifecycle in various data processing pipelines; i.e., enterprise workloads are often decryption heavy! ATSE does not meet the bar for this setting because of linear interaction/computation (in the number of records to be decrypted); our experiments show that ATSE provides a sub-par throughput of a few hundred records/sec. We observe that a large class of queries reads a subsequence of records (e.g., a time window) from the database. With this access structure in mind, we build a new TSE scheme which allows for both encryption and decryption with flexible granularity, in that a client’s interactions with the key servers are at most logarithmic in the number of records. Our idea is to employ a binary-tree access structure over the data, where only one interaction is needed to decrypt all ciphertexts within a subtree, and thus only logarithmically many are needed for a subsequence of arbitrary size. Our scheme incorporates ideas from binary-tree encryption by Canetti et al. [Eurocrypt 2003] and its variants, and carefully merges them with Merkle-tree commitments to fit the TSE setting. We formalize this notion as hierarchical threshold symmetric-key encryption (HiSE), and argue that our construction satisfies all essential TSE properties, such as correctness, privacy and authenticity with respect to our definition. Our analysis relies on the well-known XDH assumption and a new assumption, which we call $\ell$-masked BDDH, over an asymmetric bilinear pairing in the programmable random oracle model. We also show that our new assumption holds in the generic group model. We provide an open-source implementation of HiSE. For practical parameters, we see a 65$\times$ improvement in latency and throughput over ATSE. HiSE can decrypt over 6K records/sec on server-grade hardware, but the logarithmic overhead in HiSE’s encryption (not decryption) only lets us encrypt up to 3K records/sec (about a 3-4.5$\times$ slowdown) and incurs roughly 500 bytes of ciphertext expansion per record. While reducing this penalty is important future work, we believe HiSE offers an acceptable tradeoff in practice.
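To illustrate the covering argument behind the logarithmic interaction count, the following minimal Python sketch computes, for a contiguous range of record indices, the maximal subtrees of a binary tree that cover it; one key-server interaction per subtree root then suffices. This shows only the tree-cover structure, not HiSE's actual key derivation (the function name and toy tree size are ours).

    # Any contiguous range of leaves [lo, hi) in a binary tree over
    # [node_lo, node_hi) is covered by O(log n) maximal subtrees.
    def subtree_cover(lo, hi, node_lo, node_hi, path=""):
        if hi <= node_lo or node_hi <= lo:
            return []                   # disjoint: nothing to cover here
        if lo <= node_lo and node_hi <= hi:
            return [path or "root"]     # fully contained: one interaction
        mid = (node_lo + node_hi) // 2
        return (subtree_cover(lo, hi, node_lo, mid, path + "L") +
                subtree_cover(lo, hi, mid, node_hi, path + "R"))

    # Decrypting records [3, 11) out of 16 needs only 4 subtree roots:
    print(subtree_cover(3, 11, 0, 16))   # ['LLRR', 'LR', 'RLL', 'RLRL']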
Last updated:  2024-02-06
Delphi: sharing assessments of cryptographic assumptions
Jeroen van de Graaf and Arjen K. Lenstra
Almost all practical cryptographic protocols are based on computational or ad-hoc assumptions. Assessing the strengths of these assumptions is therefore a key factor in evaluating the risks of the systems using them. As a service to (and by) cryptographic researchers and practitioners, we developed Delphi, an online questionnaire to document researchers' opinions and beliefs about the strengths of the most important assumptions. All responses received will be made accessible on our website, ideally non-anonymously (depending on the contributor's preference). We also intend to consolidate these responses and publish the results. We believe this effort will be of great value when deciding which cryptographic primitives to keep or start using.
Last updated:  2024-02-02
Homomorphic sign evaluation using functional bootstrapping with a RNS representation of integers
Philippe Chartier, Michel Koskas, Mohammed Lemou, and Florian Méhats
In the context of fully homomorphic encryption, we consider the representation of large integers by their decomposition over a product of rings (through the Chinese Remainder Theorem) and introduce a new algorithm that determines the sign solely from the ring components. We then prove that our algorithm delivers a correct result with very high probability.
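For background, here is a minimal Python sketch of the RNS/CRT representation the abstract refers to, with toy moduli chosen by us; the paper's contribution, determining the sign without the full CRT reconstruction performed below, is not reproduced.

    from math import prod

    P = [7, 11, 13, 17]        # pairwise-coprime ring moduli (toy values)
    M = prod(P)                # integers are represented modulo M = 17017

    def to_rns(x):             # an integer as its tuple of ring components
        return [x % p for p in P]

    def from_rns(res):         # textbook CRT reconstruction
        return sum(r * (M // p) * pow(M // p, -1, p)
                   for p, r in zip(P, res)) % M

    def sign_centered(x):      # sign w.r.t. the centred range [-M/2, M/2)
        return -1 if from_rns(to_rns(x)) >= M // 2 else 1

    print(sign_centered(-1234), sign_centered(1234))   # -1 1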
Last updated:  2024-02-02
Fully Homomorphic Encryption on large integers
Philippe Chartier, Michel Koskas, Mohammed Lemou, and Florian Méhats
At the core of fully homomorphic encryption lies a procedure to refresh ciphertexts whose noise component has grown too big. The efficiency of this so-called bootstrap is of paramount importance, as it is usually regarded as the main bottleneck towards real-life deployment of fully homomorphic cryptosystems. In two of the fastest implementations so far, the space of messages is limited to binary integers. If the message space is extended to the discretized torus $T_{p_i}$, or equivalently to $Z_{p_i}$, with values of $p_i$ large compared to the dimension of the quotient ring in which the operations are realised, the bootstrap delivers incorrect results with far too high a probability. As a consequence, the use of a residue number system to address large integers modulo $p = p_1 \times \ldots \times p_\kappa$ would be of limited interest in practical situations without the following remedy: rather than increasing the polynomial degree and thus the computational cost, we introduce here a novel and simple technique (hereafter referred to as "collapsing") which, by grouping the components of the mask, attenuates both rounding errors and computational costs, and greatly helps to sharpen the correctness of the bootstrap. We then rigorously estimate the probability of success as well as the output error, and determine practical parameters to reach a given correctness threshold.
Last updated:  2024-02-02
Broadcast Encryption using Sum-Product decomposition of Boolean functions
Aurélien Dupin and Simon Abelard
The problem of Broadcast Encryption (BE) consists in broadcasting an encrypted message to a large number of users or receiving devices in such a way that the emitter of the message can control which of the users can or cannot decrypt it. Since the early 1990s, the design of BE schemes has received significant interest, and many different concepts have been proposed. A major breakthrough was achieved by Naor, Naor and Lotspiech (CRYPTO 2001) by cleverly partitioning the set of authorized users and associating a symmetric key to each subset. Since then, while there have been many advances in public-key based BE schemes, mostly based on bilinear maps, little progress has been made on the symmetric-key side. In this paper, we design a new symmetric-key BE scheme, named $\Sigma\Pi$BE, that relies on logic optimization and consensual security assumptions. It is competitive with the work of Naor et al. and provides a different tradeoff: the bandwidth requirement is significantly lowered at the cost of an increase in key storage.
Last updated:  2024-02-02
Revisiting the Slot-to-Coefficient Transformation for BGV and BFV
Robin Geelen
Numerous algorithms in homomorphic encryption require an operation that moves the slots of a ciphertext to the coefficients of a different ciphertext. We describe an FFT-like method for decomposing this slot-to-coefficient transformation (and its inverse) for BGV and BFV. The proposed method is specific to power-of-two cyclotomic rings and can handle both fully and sparsely packed slots. Previously, such a method was only known for non-power-of-two cyclotomic rings. Our algorithm admits more freedom in the complexity-depth trade-off than prior works. Moreover, it brings down the computational complexity of the slot-to-coefficient transformation from a linear to a logarithmic number of FHE operations in the best case, which is shown via a detailed complexity analysis. We also provide a proof-of-concept implementation in the Magma bootstrapping library.
Last updated:  2024-02-02
Equivalence of Generalised Feistel Networks
Patrick Derbez and Marie Euler
This paper focuses on equivalences between Generalised Feistel Networks (GFN) of type-II. We introduce a new definition of equivalence which captures the concept that two GFNs are identical up to re-labelling of the inputs/outputs, and give a procedure to test this equivalence relation. Two such GFNs are therefore cryptographically equivalent for several classes of attacks. This induces a reduction of the space of possible GFNs: the set of the $(k!)^2$ possible even-odd GFNs with $2k$ branches can be partitioned into $k!$ different classes. This result can be very useful when looking for an optimal GFN with respect to specific computationally intensive properties, such as the minimal number of active S-boxes in a differential trail. We also show that in several previous papers, many GFN candidates are redundant as they belong to only a few classes. Thanks to this reduction of candidates, we are also able to suggest better permutations than the one of WARP: they reach 64 active S-boxes in one round less and still have the same diffusion round as WARP. Finally, we also point out a new family of permutations with good diffusion properties.
Last updated:  2024-02-02
Improving Linear Key Recovery Attacks using Walsh Spectrum Puncturing
Antonio Flórez-Gutiérrez and Yosuke Todo
In some linear key recovery attacks, the function which determines the value of the linear approximation from the plaintext, ciphertext and key is replaced by a similar map in order to improve the time or memory complexity at the cost of a data complexity increase. We propose a general framework for key recovery map substitution, and introduce Walsh spectrum puncturing, which consists of removing carefully-chosen coefficients from the Walsh spectrum of this map. The capabilities of this technique are illustrated by describing improved attacks on reduced-round Serpent (including the first 12-round attack on the 192-bit key variant), GIFT-128 and NOEKEON, as well as the full DES.
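For background, a minimal Python sketch of the object being punctured: the Walsh spectrum of a toy real-valued key-recovery map, computed with the fast Walsh-Hadamard transform, with coefficients below a threshold removed. How to choose the removed coefficients and analyze the resulting data/time trade-off is the paper's contribution and is not reproduced; the table and threshold below are arbitrary.

    def walsh_hadamard(v):
        """Fast Walsh-Hadamard transform of a length-2^n sequence."""
        v = list(v)
        h = 1
        while h < len(v):
            for i in range(0, len(v), 2 * h):
                for j in range(i, i + h):
                    v[j], v[j + h] = v[j] + v[j + h], v[j] - v[j + h]
            h *= 2
        return v

    f = [3, 1, 0, -1, 2, 1, 0, -2]        # toy map on 3-bit inputs
    spectrum = walsh_hadamard(f)          # [4, 6, 10, 0, 2, 0, 0, 2]
    punctured = [c if abs(c) >= 4 else 0 for c in spectrum]
    # Inverse WHT = forward WHT scaled by 1/2^n: the substituted map.
    approx = [c / len(f) for c in walsh_hadamard(punctured)]
    print(approx)                         # [2.5, 1.0, 0.0, -1.5, ...]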
Last updated:  2024-02-02
SALSA FRESCA: Angular Embeddings and Pre-Training for ML Attacks on Learning With Errors
Samuel Stevens, Emily Wenger, Cathy Yuanchen Li, Niklas Nolte, Eshika Saxena, Francois Charton, and Kristin Lauter
Learning with Errors (LWE) is a hard mathematical problem underlying post-quantum cryptography (PQC) systems for key exchange and digital signatures, recently standardized by NIST. Prior work [Wenger et al., 2022; Li et al., 2023a;b] proposed new machine learning (ML)-based attacks on LWE problems with small, sparse secrets, but these attacks require millions of LWE samples to train on and take days to recover secrets. We propose three key methods—better pre-processing, angular embeddings and model pre-training—to improve these attacks, speeding up preprocessing by 25× and improving model sample efficiency by 10×. We demonstrate for the first time that pre-training improves ML attacks on LWE and reduces their cost. Our architecture improvements enable scaling to larger-dimension LWE problems: this work is the first instance of ML attacks recovering sparse binary secrets in dimension n = 1024, the smallest dimension used in practice for homomorphic encryption applications of LWE where sparse binary secrets are proposed.
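As one plausible reading of the term (an assumption of this sketch, not necessarily the paper's exact architecture), an angular embedding maps an integer $a \bmod q$ to a point on the unit circle, so that values near $0$ and near $q$ embed close together, matching the modular wraparound of LWE:

    import math

    q = 3329                      # toy modulus

    def angular_embed(a, q):
        theta = 2 * math.pi * (a % q) / q
        return (math.cos(theta), math.sin(theta))

    def embed_dist(a, b, q):
        return math.dist(angular_embed(a, q), angular_embed(b, q))

    # q-1 and 0 are neighbours under the embedding, unlike as raw integers:
    print(embed_dist(0, q - 1, q))    # ~0.0019
    print(embed_dist(0, q // 2, q))   # ~2.0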
Last updated:  2024-02-01
Evict+Spec+Time: Exploiting Out-of-Order Execution to Improve Cache-Timing Attacks
Shing Hing William Cheng, Chitchanok Chuengsatiansup, Daniel Genkin, Dallas McNeil, Toby Murray, Yuval Yarom, and Zhiyuan Zhang
Speculative out-of-order execution is a strategy for masking execution latency by allowing younger instructions to execute before older instructions. While originally considered innocuous, speculative out-of-order execution was brought into the spotlight with the 2018 publication of the Spectre and Meltdown attacks. These attacks demonstrated that microarchitectural side channels can leak sensitive data accessed by speculatively executed instructions that are not part of the normal program execution. Since then, significant effort has been invested in investigating how microarchitectural side channels can leak data from speculatively executed instructions and how to control this leakage. However, much less is known about how speculative out-of-order execution affects microarchitectural side-channel attacks. In this paper, we investigate how speculative out-of-order execution affects the Evict+Time cache attack. Evict+Time is based on the observation that cache misses are slower than cache hits; hence, by measuring the execution time of code, an attacker can determine whether a cache miss occurred during the execution. We demonstrate that, due to limited resources for tracking out-of-order execution, under certain conditions an attacker can gain more fine-grained information and determine whether a cache miss occurred in part of the executed code. Based on this observation, we design the Evict+Spec+Time attack, a variant of Evict+Time that can learn not only whether a cache miss occurred, but also in which part of the victim code it occurred. We demonstrate that Evict+Spec+Time is an order of magnitude more efficient than Evict+Time when attacking a T-table-based implementation of AES. We further show an Evict+Spec+Time attack on an S-box-based implementation of AES, recovering the key with as few as 14,389 decryptions. To the best of our knowledge, ours is the first successful Evict+Time attack on such a victim.
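The inference step at the core of Evict+Time-style attacks can be sketched as a threshold classifier: after evicting a chosen cache set, the victim's measured execution time is compared against a calibrated boundary to decide whether the run incurred a cache miss. The timings below are synthetic stand-ins for real cycle-counter measurements; all constants are invented for illustration.

    import random

    HIT_TIME, MISS_PENALTY = 300, 150      # toy cycle counts

    def measure_victim(touches_evicted_set):
        noise = random.gauss(0, 10)        # measurement jitter
        return HIT_TIME + (MISS_PENALTY if touches_evicted_set else 0) + noise

    threshold = HIT_TIME + MISS_PENALTY / 2
    runs = [measure_victim(True) for _ in range(100)]
    misses = sum(t > threshold for t in runs)
    print(misses, "of", len(runs), "runs classified as containing a miss")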
Last updated:  2024-02-01
Preliminary Cryptanalysis of the Biscuit Signature Scheme
Charles Bouillaguet and Julia Sauvage
Biscuit is a recent multivariate signature scheme based on the MPC-in-the-Head paradigm. It has been submitted to the NIST competition for additional signature schemes. Signatures are derived from a zero-knowledge proof of knowledge of the solution of a structured polynomial system. This extra structure enables efficient proofs and compact signatures. This short note demonstrates that it also makes these polynomial systems easier to solve than random ones. As a consequence, the original parameters of Biscuit failed to meet the required security levels and had to be upgraded.
Last updated:  2024-02-01
Prime Masking vs. Faults - Exponential Security Amplification against Selected Classes of Attacks
Thorben Moos, Sayandeep Saha, and François-Xavier Standaert
Fault injection attacks are a serious concern for cryptographic hardware. Adversaries may extract sensitive information from the faulty output that is produced by a cryptographic circuit after actively disturbing its computation. Alternatively, the information of whether an output would have been faulty, even if it is withheld from being released, may be exploited. The former class of attacks, which requires the collection of faulty outputs, such as Differential Fault Analysis (DFA), exploits either some knowledge about the position of the injected fault or about its value. The latter class of attacks, which can be applied without ever obtaining faulty outputs, such as Statistical Ineffective Fault Attacks (SIFA), exploits either a dependency between the effectiveness of the fault injection and the value to be faulted (e.g., an LSB stuck-at-0 only affecting odd numbers), denoted as SIFA-1, or a conditional propagation of a faulted value based on a sensitive intermediate (e.g., multiplication of a faulted value by 0 prevents propagation), denoted as SIFA-2. The aptitude of additive masking schemes, which were designed to prevent side-channel analysis, to also thwart fault attacks is typically assumed to be limited. Common fault models, such as toggle/bit-flip, stuck-at-0 or stuck-at-1, survive the recombination of Boolean shares well enough for generic attacks to succeed. More precisely, injecting a fault into one or multiple Boolean shares often results in the same, or at least a predictable, error appearing in the sensitive variable after recombination. In this work, we show that additive masking in prime-order fields breaks such relationships, causing frequently exploited biases to decrease exponentially in the number of shares. As a result, prime masking offers surprisingly strong protection against generic statistical attacks, which require a dependency between the effectiveness of an injected fault and the secret variable that is manipulated, such as SIFA-1. Operation-dependent statistical attacks, such as SIFA-2 and Fault Template Attacks (FTA), may still be performed against certain prime-field structures, even if they are masked with many shares. Yet, we analyze the corresponding cases and are able to provide specific guidelines on how to avoid vulnerabilities either at the cipher design or implementation level by making informed decisions about the primes, non-linear mappings and masked gadgets used. Since prime-field masking appears to be one of the rare instances of affordable countermeasures that naturally provide sound protection against side-channel analysis and certain fault injection attacks, we believe there is a strong incentive for developing new ciphers to leverage these advantages.
Last updated:  2024-03-01
Computing Orientations from the Endomorphism Ring of Supersingular Curves and Applications
Jonathan Komada Eriksen and Antonin Leroux
This work introduces several algorithms related to the computation of orientations in endomorphism rings of supersingular elliptic curves. This problem boils down to representing integers by ternary quadratic forms, and it is at the heart of several results regarding the security of oriented curves in isogeny-based cryptography. Our main contribution is to show that there exist efficient algorithms that can solve this problem for quadratic orders of discriminant $n$ up to $O(p^{4/3})$. Our approach improves upon previous results by increasing this bound from $O(p)$ to $O(p^{4/3})$ and removing some heuristics. We introduce several variants of our new algorithm and provide a careful analysis of their asymptotic running time (without heuristics where possible). The best proven asymptotic complexity of one of our variants is $O(n^{3/4}/p)$ on average. The best heuristic variant has a complexity of $O(p^{1/3})$ for big enough $n$. We then introduce several results regarding the computation of ideals between oriented orders. The first application of this is a simplification of the known reduction from vectorization to computing the endomorphism ring, removing the assumption on the factorization of the discriminant. As a second application, we relate the problem of computing fixed-degree isogenies between supersingular curves to the problem of computing orientations in endomorphism rings, and we show that for a large range of degrees $d$, our new algorithms improve on the state of the art; in important special cases, the range of degrees $d$ for which there exists a polynomial-time algorithm is increased. In the most special case we consider, when both curves are oriented by a small-degree endomorphism, we show heuristically that our techniques allow the computation of isogenies of any degree, assuming they exist.
Last updated:  2024-02-01
Practical Batch Proofs of Exponentiation
Charlotte Hoffmann, Pavel Hubáček, and Svetlana Ivanova
A Proof of Exponentiation (PoE) allows a prover to efficiently convince a verifier that $y=x^e$ in some group of unknown order. PoEs are the basis for practical constructions of Verifiable Delay Functions (VDFs), which, in turn, are important for various higher-level protocols in distributed computing. In applications such as distributed consensus, many PoEs are generated regularly, motivating protocols for secure aggregation of batches of statements into a few statements to improve the efficiency for both parties. Rotem (TCC 2021) recently presented two such generic batch PoEs. In this work, we introduce two batch PoEs that outperform both proposals of Rotem, and we evaluate their practicality. First, we show that the two batch PoEs of Rotem can be combined to improve the overall efficiency by at least a factor of two. Second, we revisit the work of Bellare, Garay, and Rabin (EUROCRYPT 1998) on batch verification of digital signatures and show that, under the low order assumption, their bucket test can be securely adapted to the setting of groups of unknown order. The resulting batch PoE quickly outperforms the state of the art in the expected number of group multiplications as the number of instances grows, and it decreases the cost of batching by an order of magnitude already for hundreds of thousands of instances. Importantly, it is the first batch PoE that significantly decreases both the proof size and the complexity of verification. Our experimental evaluations show that even a non-optimized implementation achieves such improvements, which would match the demands of real-life systems requiring large-scale PoE processing. Finally, even though our proof techniques are conceptually similar to Rotem's, we give an improved analysis of the application of the low order assumption towards secure batching of PoE instances, resulting in a tight reduction, which is important when setting the security parameter in practice.
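For background, a Python sketch of the random-exponent batching identity in the spirit of Bellare, Garay, and Rabin: many claims $y_i = x_i^e$ are checked through one combined relation $(\prod_i x_i^{r_i})^e = \prod_i y_i^{r_i}$. A real batch PoE works in a group of unknown order, uses the bucket test for efficiency, and compresses the remaining exponentiation with a PoE; the modulus and sizes below are toy stand-ins.

    import secrets

    N = 10007 * 10009            # toy stand-in for a group of unknown order
    e = 65537
    xs = [secrets.randbelow(N - 2) + 2 for _ in range(200)]
    ys = [pow(x, e, N) for x in xs]

    def batch_verify(xs, ys, e, N, lam=128):
        rs = [secrets.randbits(lam) for _ in xs]
        lhs = rhs = 1
        for x, y, r in zip(xs, ys, rs):
            lhs = lhs * pow(x, r, N) % N
            rhs = rhs * pow(y, r, N) % N
        return pow(lhs, e, N) == rhs     # one exponentiation by e in total

    print(batch_verify(xs, ys, e, N))    # True
    ys[0] = ys[0] * 2 % N                # corrupt one statement
    print(batch_verify(xs, ys, e, N))    # False (w.h.p. over the r_i)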
Last updated:  2024-02-01
Efficient (3,3)-isogenies on fast Kummer surfaces
Maria Corte-Real Santos, Craig Costello, and Benjamin Smith
We give an alternative derivation of (N,N)-isogenies between fast Kummer surfaces which complements existing works based on the theory of theta functions. We use this framework to produce explicit formulae for the case of N = 3, and show that the resulting algorithms are more efficient than all prior (3,3)-isogeny algorithms.
Last updated:  2024-02-01
Scalable Collaborative zk-SNARK: Fully Distributed Proof Generation and Malicious Security
Xuanming Liu, Zhelei Zhou, Yinghao Wang, Bingsheng Zhang, and Xiaohu Yang
The notion of a collaborative zk-SNARK, introduced by Ozdemir and Boneh (USENIX 2022), allows multiple parties to jointly create a zk-SNARK proof over distributed secrets (also known as the witness). This approach ensures the privacy of the witness, as no corrupted server involved in the proof generation can learn anything about the honest servers' witness. Later, Garg et al. (USENIX 2023) continued this line of work, focusing on how to achieve faster proof generation. However, their approach requires a powerful server that is responsible for the most resource-intensive computations and communications during proof generation. This requirement creates a scalability bottleneck, making their protocols unable to handle large-scale circuits. In this work, we address this issue by lifting a zk-SNARK called Libra (Crypto 2019) to a collaborative zk-SNARK and achieving fully distributed proof generation, where all servers take on roughly the same portion of the total workload. Further, our protocol can be adapted to be secure against a malicious adversary by incorporating some verification mechanisms. With 128 consumer machines and a 4 Gbps network, we successfully generate a proof for a data-parallel circuit containing $2^{23}$ gates in merely 2.5 seconds, using only 0.5 GB of memory per server. This represents a $19\times$ speed-up compared to a local Libra prover. Our benchmarks further indicate an impressive 877$\times$ improvement in running time and a 992$\times$ improvement in communication compared to the implementation in previous work. Furthermore, our protocol is capable of handling larger circuits, making it scalable in practice.
Last updated:  2024-04-05
GradedDAG: An Asynchronous DAG-based BFT Consensus with Lower Latency
Xiaohai Dai, Zhaonan Zhang, Jiang Xiao, Jingtao Yue, Xia Xie, and Hai Jin
To enable parallel processing, the Directed Acyclic Graph (DAG) structure has been introduced into the design of asynchronous Byzantine Fault Tolerant (BFT) consensus protocols, known as DAG-based BFT. Existing DAG-based BFT protocols operate in successive waves, with each wave containing three or four Reliable Broadcast (RBC) rounds to broadcast data, resulting in high latency due to the three communication steps required by each RBC. For instance, Tusk, a state-of-the-art DAG-based BFT protocol, has a good-case latency of 7 communication steps and an expected worst-case latency of 21 communication steps. To reduce latency, we propose GradedDAG, a new DAG-based BFT consensus protocol based on our adapted RBC, called Graded RBC (GRBC), and on Consistent Broadcast (CBC), with each wave consisting of only one GRBC round and one CBC round. Through GRBC, a replica can deliver data with a grade of 1 or 2, and a non-faulty replica delivering data with grade 2 can ensure that more than 2/3 of the replicas have delivered the same data. Meanwhile, through CBC, data delivered by different non-faulty replicas must be identical. In each wave, one block from the GRBC round is elected as the leader. If a leader block has been delivered with grade 2, it and all its ancestor blocks can be committed. GradedDAG offers a good-case latency of 4 communication steps and an expected worst-case latency of 7.5 communication steps, significantly lower than the state of the art. Experimental results demonstrate GradedDAG’s feasibility and efficiency.
Last updated:  2024-02-01
Secure Statistical Analysis on Multiple Datasets: Join and Group-By
Gilad Asharov, Koki Hamada, Dai Ikarashi, Ryo Kikuchi, Ariel Nof, Benny Pinkas, and Junichi Tomida
We implement a secure platform for statistical analysis over multiple organizations and multiple datasets. We provide a suite of protocols for different variants of the JOIN and GROUP-BY operations. JOIN allows combining data from multiple datasets based on a common column. GROUP-BY allows aggregating rows that have the same values in a column or a set of columns, and then applying some aggregate summary to the rows (such as sum, count, median, etc.). Both operations are fundamental tools for relational databases. One example use case of our platform is data marketing, in which an analyst would join purchase histories and membership information and then obtain statistics, such as "Which products were bought by people earning this much per annum?" Both JOIN and GROUP-BY involve many variants, and we design protocols for several common procedures. In particular, we propose a novel group-by-median protocol that has not been known so far. Our protocols rely on sorting protocols, and work in the honest-majority setting and against malicious adversaries. To the best of our knowledge, this is the first implementation of JOIN and GROUP-BY protocols secure against a malicious adversary.
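For reference, the plaintext semantics that the secure JOIN and group-by-median protocols emulate can be sketched in a few lines of Python over two toy tables; none of the secret-sharing or sorting-protocol machinery is reflected here.

    from statistics import median
    from collections import defaultdict

    purchases  = [("alice", "book"), ("bob", "game"), ("alice", "game")]
    membership = [("alice", 52000), ("bob", 48000)]     # (name, income)

    # JOIN on the common "name" column:
    income = dict(membership)
    joined = [(n, item, income[n]) for n, item in purchases if n in income]

    # GROUP-BY item, aggregated with the median income per group:
    groups = defaultdict(list)
    for n, item, inc in joined:
        groups[item].append(inc)
    print({item: median(v) for item, v in groups.items()})
    # {'book': 52000, 'game': 50000}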
Last updated:  2024-02-01
Efficient ECDSA-based Adaptor Signature for Batched Atomic Swaps
Binbin Tu, Min Zhang, and Yu Chen
An adaptor signature is a novel cryptographic primitive which ties together the signature and the leakage of a secret value. It has become an important tool for solving scalability and interoperability problems in blockchains. Aumayr et al. (Asiacrypt 2021) recently provided a formalization of adaptor signatures and presented a provably secure ECDSA-based adaptor signature, which requires a zero-knowledge proof in the pre-signing phase to ensure that the signer works correctly. However, the number of zero-knowledge proofs is linear in the number of participants. In this paper, we propose efficient ECDSA-based adaptor signature schemes and give security proofs based on ECDSA. In our schemes, the zero-knowledge proofs in the pre-signing phase can be generated in a batch and offline. Meanwhile, the online pre-signing algorithm is similar to the ECDSA signing algorithm and enjoys the same efficiency as ECDSA. In particular, in specific verification scenarios, such as (batched) atomic swaps, our schemes can reduce the number of zero-knowledge proofs in the pre-signing phase to one, independent of the number of participants. Finally, we conduct an experimental evaluation, demonstrating that our ECDSA-based adaptor signature reduces the online pre-signing time by about 60% compared with the state-of-the-art ECDSA-based adaptor signature.
Last updated:  2024-01-31
Efficient Arithmetic in Garbled Circuits
David Heath
Garbled Circuit (GC) techniques usually work with Boolean circuits. Despite intense interest, efficient arithmetic generalizations of GC were only known from heavy assumptions, such as LWE. We construct arithmetic garbled circuits from circular correlation robust hashes, the assumption underlying the celebrated Free XOR garbling technique. Let $\lambda$ denote a computational security parameter, and consider the integers $\mathbb{Z}_m$ for any $m \geq 2$. Let $\ell = \lceil \log_2 m \rceil$ be the bit length of $\mathbb{Z}_m$ values. We garble arithmetic circuits over $\mathbb{Z}_m$ where the garbling of each gate has size $O(\ell \cdot \lambda)$ bits. Contrast this with Boolean-circuit-based arithmetic, which requires $O(\ell^2\cdot \lambda)$ bits via the schoolbook multiplication algorithm, or $O(\ell^{1.585}\cdot \lambda)$ bits via Karatsuba's algorithm. Our arithmetic gates are compatible with Boolean operations and with Garbled RAM, allowing us to garble complex programs over arithmetic values.
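The gate-size comparison in the abstract, as a back-of-the-envelope calculation (asymptotics only; all hidden constants are ignored, and the parameter values are ours):

    from math import ceil, log2

    lam = 128                           # computational security parameter
    for m in (2**16, 2**32, 2**64):
        ell = ceil(log2(m))             # bit length of Z_m values
        this_work = ell * lam           # O(ell * lambda)
        karatsuba = ell**1.585 * lam    # O(ell^1.585 * lambda)
        schoolbook = ell**2 * lam       # O(ell^2 * lambda)
        print(ell, this_work, round(karatsuba), schoolbook)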
Last updated:  2024-01-31
Correction Fault Attacks on Randomized CRYSTALS-Dilithium
Elisabeth Krahmer, Peter Pessl, Georg Land, and Tim Güneysu
After NIST’s selection of Dilithium as the primary future standard for quantum-secure digital signatures, increased efforts to understand its implementation security properties are required to enable widespread adoption on embedded devices. Concretely, there are still many open questions regarding the susceptibility of Dilithium to fault attacks. This is especially the case for Dilithium’s randomized (or hedged) signing mode, which, likely due to devastating implementation attacks on the deterministic mode, was selected as the default by NIST. This work takes steps towards closing this gap by presenting two new key-recovery fault attacks on randomized/hedged Dilithium. Both attacks are based on the idea of correcting faulty signatures after signing. A successful correction yields the value of a secret intermediate that carries information about the key. After gathering many faulty signatures and corresponding correction values, it is possible to solve for the signing key via either simple linear algebra or lattice-reduction techniques. Our first attack extends a previously published attack based on an instruction-skipping fault to the randomized setting. Our second attack injects faults into the matrix A, which is part of the public key. As such, it is not sensitive to side-channel leakage and has, potentially for this reason, not seen prior analysis regarding faults. We show that for Dilithium2, the attacks allow key recovery with as few as 1024 and 512 faulty signatures, respectively, with each signature generated by injecting a single targeted fault. We also demonstrate how our attacks can be adapted to circumvent several popular fault countermeasures with a moderate increase in the computational runtime and the number of required faulty signatures. These results are verified using both simulated faults and clock glitches on an ARM-based microcontroller. The presented attacks demonstrate that randomized Dilithium, too, can be subject to diverse fault attacks, that certain countermeasures might be easily bypassed, and that potential fault targets reach beyond side-channel-sensitive operations. Still, many further operations are likely also susceptible, implying the need for increased analysis efforts in the future.
Last updated:  2024-01-31
Sleepy Consensus in the Known Participation Model
Chenxu Wang, Sisi Duan, Minghui Xu, Feng Li, and Xiuzhen Cheng
We study sleepy consensus in the known participation model, where replicas are aware of the minimum number of awake honest replicas. Compared to prior works, which almost all assume the unknown participation model, we provide a fine-grained treatment of sleepy consensus in the known participation model and show some interesting results. First, we present a synchronous atomic broadcast protocol with $5\Delta+2\delta$ expected latency and $2\Delta+2\delta$ best-case latency, where $\Delta$ is the bound on network delay and $\delta$ is the actual network delay. In contrast, the best-known result in the unknown participation model (MMR, CCS 2023) achieves $14\Delta$ latency, more than twice the latency of our protocol. Second, in the partially synchronous network (where the value of $\Delta$ is unknown), we show that without changing the conventional $n \geq 3f+1$ assumption, one can only obtain secure sleepy consensus by making the stable storage assumption (where replicas need to store intermediate consensus parameters in stable storage). Finally, still in the partially synchronous network but without assuming stable storage, we prove the bounds $n \geq 3f+2s+1$ without the global awake time (GAT) assumption (all honest replicas become awake after GAT) and $n \geq 3f+s+1$ with the GAT assumption, where $s$ is the maximum number of honest replicas that may become asleep simultaneously. Using these bounds, we transform HotStuff (PODC 2019) into a sleepy consensus protocol via a timeoutQC mechanism and a low-cost recovery protocol.
Last updated:  2024-01-31
Secure Transformer Inference Made Non-interactive
Jiawen Zhang, Jian Liu, Xinpeng Yang, Yinghao Wang, Kejia Chen, Xiaoyang Hou, Kui Ren, and Xiaohu Yang
Secure transformer inference has emerged as a prominent research topic following the proliferation of ChatGPT. Existing solutions are typically interactive, involving substantial communication load and numerous interaction rounds between the client and the server. In this paper, we propose NEXUS, the first non-interactive protocol for secure transformer inference, where the client is only required to submit an encrypted input and await the encrypted result from the server. Central to NEXUS are two innovative techniques: SIMD ciphertext compression/decompression and SIMD slots folding. Consequently, our approach achieves a 2.8$\times$ speedup and a remarkable 368.6$\times$ bandwidth reduction compared to the state-of-the-art solution presented in S&P '24.
Last updated:  2024-01-31
A Closer Look at the Belief Propagation Algorithm in Side-Channel-Assisted Chosen-Ciphertext Attacks
Kexin Qiao, Siwei Sun, Zhaoyang Wang, Zehan Wu, Junjie Cheng, An Wang, and Liehuang Zhu
The implementation security of post-quantum cryptography (PQC) algorithms has emerged as a critical concern as the PQC standardization process reaches its end. In a side-channel-assisted chosen-ciphertext attack, the attacker builds linear inequalities on secret key components and uses the belief propagation (BP) algorithm to solve them. The number of inequalities determines the query complexity of the attack, so the fewer the better. In this paper, we use the PQC standard algorithm Kyber512 as a case study to construct bilateral inequalities on key variables with substantially narrower intervals using a side-channel-assisted oracle. The number of such inequalities required to recover the key with probability 1 using the BP algorithm is reduced relative to previous unilateral inequalities. Furthermore, we introduce strategies aimed at further refining the intervals of the inequalities. Diving into the BP algorithm, we discover a metric, named the JSD-metric, that can gauge the tightness of an inequality. We then develop a heuristic strategy and a machine-learning-based strategy that utilize JSD-metrics to contract the boundaries of inequalities even when fewer inequalities are given, thus increasing the information carried by the system of linear inequalities. This contraction strategy operates at the algorithmic level and has the potential to be employed in all attacks that endeavor to establish a system of inequalities on key variables.
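For background, a minimal Python sketch of the Jensen-Shannon divergence, the information-theoretic quantity a JSD-based tightness metric would plausibly build on; the paper's exact metric over BP messages is not reproduced, and the example distributions are invented.

    from math import log2

    def kl(p, q):
        return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    def jsd(p, q):
        m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)

    # Posterior over a key coefficient before/after adding an inequality;
    # a larger JSD suggests a more informative (tighter) constraint:
    prior     = [0.25, 0.25, 0.25, 0.25]
    posterior = [0.70, 0.20, 0.05, 0.05]
    print(jsd(prior, posterior))   # ~0.19 bits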
Last updated:  2024-01-31
Byzantine Fault Tolerance with Non-Determinism, Revisited
Sisi Duan and Yue Huang
The conventional Byzantine fault tolerance (BFT) paradigm requires replicated state machines to execute deterministic operations only. In practice, numerous applications and scenarios, especially in the era of blockchains, contain various sources of non-determinism. Despite decades of research on BFT, we still lack an efficient and easy-to-deploy solution for BFT with non-determinism (BFT-ND), especially in the asynchronous setting. We revisit the problem of BFT-ND and provide a formal and asynchronous treatment of it. In particular, we design and implement Block-ND, which insightfully separates the task of agreeing on the order of transactions from the task of agreeing on the state: Block-ND allows reusing existing BFT implementations; on top of BFT, we reduce the agreement on the state to multivalued Byzantine agreement (MBA), a primitive somewhat neglected by practical systems. Block-ND is completely asynchronous as long as the underlying BFT is asynchronous. We provide a new MBA construction significantly faster than existing MBA constructions. We instantiate Block-ND in both the partially synchronous setting (with PBFT, OSDI 1999) and the purely asynchronous setting (with PACE, CCS 2022). Via a 91-instance WAN deployment on Amazon EC2, we show that Block-ND has only marginal performance degradation compared to conventional BFT.
Last updated:  2024-01-30
Optimizing Implementations of Boolean Functions
Meltem Sonmez Turan
Symmetric cryptography primitives are constructed by iterative application of linear and nonlinear layers. Constructing efficient circuits for these layers, even for the linear one, is challenging. In 1997, Paar proposed a heuristic to minimize the number of XORs (modulo-2 additions) necessary to implement linear layers. In this study, we slightly modify Paar’s heuristic to find implementations for nonlinear Boolean functions, in particular homogeneous Boolean functions. Additionally, we show how this heuristic can be used to construct circuits for generic Boolean functions with a small number of AND gates, by exploiting affine equivalence relations.
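For background, a minimal Python sketch of Paar's greedy heuristic in its original linear-layer setting: each output is the XOR of a set of inputs, and the most frequent input pair is repeatedly materialized as a fresh intermediate signal (one XOR gate each). The paper's modification for nonlinear, homogeneous functions is not reproduced.

    from itertools import combinations
    from collections import Counter

    def paar(outputs):
        outputs = [set(o) for o in outputs]
        n = max(max(o) for o in outputs) + 1
        gates = []                       # (new_signal, a, b): one XOR each
        while any(len(o) > 1 for o in outputs):
            pairs = Counter()
            for o in outputs:
                pairs.update(combinations(sorted(o), 2))
            (a, b), _ = pairs.most_common(1)[0]
            t, n = n, n + 1
            gates.append((t, a, b))
            for o in outputs:
                if a in o and b in o:
                    o -= {a, b}
                    o.add(t)
        return gates, [o.pop() for o in outputs]

    # y0 = x0^x1^x2, y1 = x1^x2^x3, y2 = x0^x1^x2^x3:
    gates, outs = paar([{0, 1, 2}, {1, 2, 3}, {0, 1, 2, 3}])
    print(len(gates), "XORs instead of 7 naively:", gates)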
Last updated:  2024-01-30
SimpleFT: A Simple Byzantine Fault Tolerant Consensus
Rui Hao, Chenglong Yi, Weiqi Dai, and Zhaonan Zhang
Although it has been popular for a long time, Byzantine Fault Tolerance (BFT) consensus under partially synchronous networks has in recent years been criticized as inefficient or even infeasible, which calls for more robust asynchronous consensus. On the other hand, almost all existing asynchronous consensus protocols are too complicated to understand, and some even suffer from termination problems. Motivated by these problems, we propose SimpleFT, a simple asynchronous consensus protocol mainly inspired by the simplicity of the Bitcoin protocol. Re-examining the Bitcoin protocol, we disassemble the life cycle of a block into three phases, namely proposal, dissemination, and confirmation. Corresponding to these phases, we devise or introduce a sortition algorithm, a reliable broadcast algorithm, and a quorum certificate mechanism in SimpleFT, respectively. To make full use of network resources and improve system throughput, we further introduce a layered architecture to SimpleFT, which enables multiple blocks to be confirmed at the same height. We provide a comprehensive analysis validating the correctness of SimpleFT and conduct various experiments demonstrating its efficient performance.
Last updated:  2024-01-30
Practical Post-Quantum Signatures for Privacy
Sven Argo, Tim Güneysu, Corentin Jeudy, Georg Land, Adeline Roux-Langlois, and Olivier Sanders
The transition to post-quantum cryptography has been an enormous challenge and effort for cryptographers over the last decade, with impressive results such as the future NIST standards. However, the latter have so far only considered central cryptographic mechanisms (signatures or KEMs) and not more advanced ones, e.g., those targeting privacy-preserving applications. Of particular interest is the family of solutions comprising blind signatures, group signatures and anonymous credentials, for which standards already exist and which are deployed in billions of devices. This family does not, at this stage, have an efficient post-quantum counterpart, although very recent works have improved this state of affairs by offering two different alternatives: either one gets a system with rather large elements but security proved under standard assumptions, or one gets a more efficient system at the cost of ad-hoc interactive assumptions or weaker security models. Moreover, all of these works have only considered size complexity, without implementing the quite complex building blocks their systems are composed of. In other words, the practicality of such systems is still very hard to assess, which is a problem if one envisions a post-quantum transition for the corresponding systems/standards. In this work, we propose a construction of so-called signatures with efficient protocols (SEP), which are the core of such privacy-preserving solutions. By revisiting the approach of Jeudy et al. (Crypto 2023), we manage to get the best of the two alternatives mentioned above, namely short sizes with no compromise on security. To demonstrate this, we plug our SEP into an anonymous credential system, achieving credentials of less than 80 KB. In parallel, we fully implemented our system, and in particular the complex zero-knowledge framework of Lyubashevsky et al. (Crypto'22), which has, to our knowledge, not been done so far. Our work thus not only improves the state of the art on privacy-preserving solutions, but also significantly improves the understanding of their efficiency and the implications for deployment in real-world systems.
Last updated:  2024-01-30
HADES: Automated Hardware Design Exploration for Cryptographic Primitives
Fabian Buschkowski, Georg Land, Jan Richter-Brockmann, Pascal Sasdrich, and Tim Güneysu
While formal constructions for cryptographic schemes have steadily evolved and emerged over the past decades, the design and implementation of efficient and secure hardware instances is still a mostly manual, tedious, and intuition-driven process. With the increasing complexity of modern cryptography, e.g., Post-Quantum Cryptography (PQC) schemes, and the consideration of physical implementation attacks, e.g., Side-Channel Analysis (SCA), the design space often grows exorbitantly, without developers being able to weigh all design options. This immediately raises the necessity of tool-assisted Design Space Exploration (DSE) for efficient and secure cryptographic hardware. To this end, we present the progressive HADES framework, offering a customizable, extendable, and streamlined DSE for efficient and secure cryptographic hardware accelerators. This tool exhaustively traverses the design space driven by security requirements, rapidly predicts user-defined performance metrics, e.g., area footprint or cycle-accurate latency, and instantiates the most suitable candidate in a synthesizable Hardware Description Language (HDL). We demonstrate the capabilities of our framework by applying our proof-of-concept implementation to a wide-ranging selection of state-of-the-art symmetric and PQC schemes, including the ChaCha20 stream cipher and the designated PQC standard Kyber, for which we provide the first set of arbitrary-order masked hardware implementations.
Last updated:  2024-01-29
Finite Key OTP Functionality: Ciphers That Hold Off Attackers Smarter Than Their Designers
Gideon Samid
The prevailing ciphers rely on the weak assumption that their attacker is no smarter than their designers expected. The resulting crypto ecology favors the cryptographic powerhouses and hinders cyber freedom, cyber privacy and cyber democracy. This weakness can be remedied by using the gold standard of cryptography, the One Time Pad (OTP). Alas, it comes with the prohibitive cost of a key as long as the message it encrypts. When the stakes are high enough, users pay this high price because OTP is immune to smarter and better-equipped attackers. Claude Shannon showed that this size imposition on the key is non-negotiable in the context he analyzed. However, by changing the context, one can achieve OTP equivalence. Three simple changes are introduced: (i) make the size of the key an integral part of the secret, (ii) encrypt every finite message with an arbitrary part of the key, and (iii) allow for open-ended dilution of the content-bearing bits of the ciphertext with content-devoid bits, which do not confuse the intended recipient but impose an open-ended cryptanalytic barrier on the attacker. A priori, a cryptanalyst faces a set of messages, each of them deemed plausible to be the one hidden in the ciphertext. If the ciphertext is Finite Key OTP compliant, then membership in this set will not change after an exhaustive cryptanalytic processing of the ciphertext. This constitutes functional equivalence with OTP. OTP functionality with a shared finite key creates a path to digital freedom, digital privacy and digital democracy.
Last updated:  2024-01-29
Non-Binding (Designated Verifier) Signature
Ehsan Ebrahimi
We argue that there are some scenarios in which plausible deniability might be desired for a digital signature scheme. For instance, the non-repudiation property of conventional signature schemes is problematic in the design of an Instant Messaging system (WPES 2004). In this paper, we formally define a non-binding signature scheme in which the Signer is able to disavow her own signature if she wants, but the Verifier is not able to dispute a signature generated by the Signer. That is, the Signer is able to convince a third-party Judge that she is the owner of a signature without disclosing her secret information. We propose a signature scheme that is non-binding and unforgeable. Our signature scheme is post-quantum secure if the underlying cryptographic primitives are post-quantum secure. In addition, a modification to our non-binding signature scheme leads to an Instant Messaging system that is of independent interest.
Last updated:  2024-02-08
Attacks Against the INDCPA-D Security of Exact FHE Schemes
Jung Hee Cheon, Hyeongmin Choe, Alain Passelègue, Damien Stehlé, and Elias Suvanto
A new security model for fully homomorphic encryption (FHE), called INDCPA-D security and introduced by Li and Micciancio [Eurocrypt'21], strengthens INDCPA security by giving the attacker access to a decryption oracle for ciphertexts for which it should know the underlying plaintexts. This includes ciphertexts that it (honestly) encrypted and those obtained from the latter by evaluating circuits that it chose. Li and Micciancio singled out the CKKS FHE scheme for approximate data [Asiacrypt'17] by giving an INDCPA-D attack on it and (erroneously) claiming that INDCPA-D security and INDCPA security coincide for FHEs on exact data. We correct the widespread belief according to which INDCPA-D attacks are specific to approximate homomorphic computations. Indeed, the equivalence formally proved by Li and Micciancio assumes that the schemes are not only exact but also have a negligible probability of incorrect decryption. However, almost all competitive implementations of exact FHE schemes give up strong correctness by analyzing correctness heuristically and allowing noticeable probabilities of incorrect decryption. We exploit this imperfect correctness to mount efficient indistinguishability and key-recovery attacks against all major exact FHE schemes. We illustrate their strength by concretely breaking the default BFV implementation of OpenFHE and simulating an attack for the default parameter set of the CGGI implementation of TFHE-rs (the attack is too expensive to be run on commodity desktops, because of the cost of CGGI bootstrapping). Our attacks extend to threshold versions of the exact FHE schemes when the correctness is similarly loose.
Last updated:  2024-01-29
Monte Carlo Tree Search for automatic differential characteristics search: application to SPECK
Emanuele Bellini, David Gerault, Matteo Protopapa, and Matteo Rossi
The search for differential characteristics on block ciphers is a difficult combinatorial problem. In this paper, we investigate the performance of an AI-originated technique, Single Player Monte-Carlo Tree Search (SP-MCTS), in finding good differential characteristics on ARX ciphers, with an application to the block cipher SPECK. To make this approach competitive, we include several heuristics, such as the combination of forward and backward searches, and achieve significantly faster results than state-of-the-art works that are not based on automatic solvers. We reach 9, 11, 13, 13 and 15 rounds for SPECK32, SPECK48, SPECK64, SPECK96 and SPECK128, respectively. To build our algorithm, we revisit Lipmaa and Moriai's algorithm for listing all optimal differential transitions through modular addition, and propose a variant that enumerates all transitions with probability close (up to a fixed threshold) to the optimal one, while fixing a minor bug in the original algorithm.
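For background, the Lipmaa-Moriai closed form for the XOR-differential probability of $n$-bit modular addition, the primitive on which the revisited enumeration algorithm is built; the paper's variant for listing near-optimal transitions is not reproduced.

    def xdp_add(alpha, beta, gamma, n=16):
        """Lipmaa-Moriai (FSE 2001): xdp+(alpha, beta -> gamma) mod 2^n."""
        mask = (1 << n) - 1
        def eq(x, y, z):               # bits where x, y and z all agree
            return ~(x ^ y) & ~(x ^ z) & mask
        def sl(x):                     # shift left within n bits
            return (x << 1) & mask
        if eq(sl(alpha), sl(beta), sl(gamma)) & (alpha ^ beta ^ gamma ^ sl(beta)):
            return 0.0                 # impossible transition
        w = bin(~eq(alpha, beta, gamma) & (mask >> 1)).count("1")
        return 2.0 ** -w               # probability 2^{-w}

    print(xdp_add(0, 0, 0))   # 1.0  (zero difference passes for free)
    print(xdp_add(1, 1, 2))   # 0.25
    print(xdp_add(1, 0, 0))   # 0.0  (LSB constraint violated)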
Last updated:  2024-01-29
New self-orthogonal codes from weakly regular plateaued functions and their application in LCD codes
Melike Çakmak, Ahmet Sınak, and Oğuz Yayla
A linear code is self-orthogonal if it is contained within its dual code. Self-orthogonal codes have applications in linear complementary dual (LCD) codes, quantum codes and so on. The construction of self-orthogonal codes from functions over finite fields has been studied in the literature. In this paper, we generalize the construction method given by Heng et al. (2023) to weakly regular plateaued functions. We first construct several families of ternary self-orthogonal codes from weakly regular plateaued unbalanced functions. Then we use the self-orthogonal codes to construct new families of ternary LCD codes. As a consequence, we obtain (almost) optimal ternary self-orthogonal codes and LCD codes. Moreover, we propose two new classes of $p$-ary linear codes from weakly regular plateaued unbalanced functions over finite fields of odd characteristic.
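For background, the defining check in Python: a code with generator matrix $G$ over $\mathbb{F}_p$ is self-orthogonal iff $GG^T \equiv 0 \pmod p$, i.e., every pair of rows, and each row with itself, is orthogonal. The toy example is the ternary tetracode; the paper obtains such matrices from weakly regular plateaued functions.

    import numpy as np

    p = 3
    G = np.array([[1, 1, 1, 0],          # generator of the [4, 2] ternary
                  [0, 1, 2, 1]])         # tetracode

    def is_self_orthogonal(G, p):
        return not (G @ G.T % p).any()   # all row inner products are 0 mod p

    print(is_self_orthogonal(G, p))      # True: the code lies in its dual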
Last updated:  2024-01-29
Perceived Information Revisited II: Information-Theoretical Analysis of Deep-Learning Based Side-Channel Attacks
Akira Ito, Rei Ueno, and Naofumi Homma
In conventional deep-learning-based side-channel attacks (DL-SCAs), an attacker trains a model by updating parameters to minimize the negative log-likelihood (NLL) loss function. Although a decrease in NLL improves DL-SCA performance, the reasons for this improvement have remained unclear because of the lack of a formal analysis. To address this open problem, this paper explores the relationship between the NLL and the attack success rate (SR) through an information-theoretical analysis of DL-SCAs with an NLL loss function. To this end, we introduce a communication channel for DL-SCAs and derive an inequality that links model outputs to the SR. Our inequality states that the mutual information between the model output and the intermediate value, which we name the latent perceived information (LPI), provides an upper bound on the SR of a DL-SCA with a trained neural network. Subsequently, we examine the conjecture of Ito et al. on the relationship between the effective perceived information (EPI) and the SR, and clarify the conditions under which it is valid from the perspective of the LPI. Our analysis reveals that a decrease in NLL correlates with an increase in LPI, which indicates that the model's capability to extract intermediate-value information from traces is enhanced. In addition, the results indicate that the LPI bounds the SR from above, and a higher upper bound on the SR can directly improve the SR if the selection function satisfies certain conditions, such as bijectivity. Finally, these theoretical insights are validated through attack experiments on neural network models applied to AES software and hardware implementations with masking countermeasures.
Last updated:  2024-01-27
Memory Checking Requires Logarithmic Overhead
Elette Boyle, Ilan Komargodski, and Neekon Vafa
We study the complexity of memory checkers with computational security and prove the first general tight lower bound. Memory checkers, first introduced over 30 years ago by Blum, Evans, Gemmell, Kannan, and Naor (FOCS '91, Algorithmica '94), allow a user to store and maintain a large memory on a remote and unreliable server by using small trusted local storage. The user can issue instructions to the server and, after every instruction, obtain either the correct value or a failure (but not an incorrect answer) with high probability. The main complexity measure of interest is the size of the local storage and the number of queries the memory checker makes upon every logical instruction. The most efficient known construction has query complexity $O(\log n/\log\log n)$ and local space proportional to a computational security parameter, assuming one-way functions, where $n$ is the logical memory size. Dwork, Naor, Rothblum, and Vaikuntanathan (TCC '09) showed that for a restricted class of ``deterministic and non-adaptive'' memory checkers, this construction is optimal, up to constant factors. However, going beyond the small class of deterministic and non-adaptive constructions has remained a major open problem. In this work, we fully resolve the complexity of memory checkers by showing that any construction with local space $p$ and query complexity $q$ must satisfy $$ p \ge \frac{n}{(\log n)^{O(q)}} \;. $$ This implies, as a special case, that $q\ge \Omega(\log n/\log\log n)$ in any scheme, assuming that $p\le n^{1-\varepsilon}$ for $\varepsilon>0$. The bound applies to any scheme with computational security, completeness $2/3$, and inverse polynomial in $n$ soundness (all of which make our lower bound only stronger). We further extend the lower bound to schemes where the read complexity $q_r$ and write complexity $q_w$ differ. For instance, we show the tight bound that if $q_r=O(1)$ and $p\le n^{1-\varepsilon}$ for $\varepsilon>0$, then $q_w\ge n^{\Omega(1)}$. This is the first lower bound, for any non-trivial class of constructions, showing a read-write query complexity trade-off. Our proof is via a delicate compression argument showing that a ``too good to be true'' memory checker can be used to compress random bits of information. We draw inspiration from tools recently developed for lower bounds for relaxed locally decodable codes. However, our proof itself significantly departs from these works, necessitated by the differences between settings.
Last updated:  2024-01-27
SPRITE: Secure and Private Routing in Payment Channel Networks
Gaurav Panwar, Roopa Vishwanathan, George Torres, and Satyajayant Misra
Payment channel networks are a promising solution to the scalability challenge of blockchains and are designed for significantly increased transaction throughput compared to the layer-one blockchain. Since payment channel networks are essentially decentralized peer-to-peer networks, routing transactions is a fundamental challenge. Payment channel networks have some unique security and privacy requirements that make pathfinding challenging: for instance, the network topology is not publicly known, and sender/receiver privacy should be preserved, in addition to providing atomicity guarantees for payments. In this paper, we present an efficient privacy-preserving routing protocol, SPRITE, for payment channel networks that supports concurrent transactions. By finding paths offline and processing transactions online, SPRITE can process transactions in just two rounds, which is more efficient compared to prior work. We evaluate SPRITE’s performance using Lightning Network data and prove its security using the Universal Composability framework. In contrast to the current cutting-edge methods that achieve rapid transactions, our approach significantly reduces the message complexity of the system by three orders of magnitude while maintaining similar latencies.
Last updated:  2024-01-27
An acceleration of the AKS prime identification algorithm
Stephen Meredith Williams
In its standard form, the AKS prime identification algorithm is deterministic and polynomial-time but too slow to be of practical use. By dropping its deterministic attribute, it can be accelerated to the extent that it is practically useful, though still much slower than the widely used Miller-Rabin-Selfridge-Monier (MRSM) algorithm based on Fermat's Little Theorem or the Solovay-Strassen algorithm based on the Euler Criterion. The change made in the last stage of AKS is to check a modular equation not for a long sequence of values but for a single one. Another change is to reduce, arbitrarily, the size of the parameter $r$ giving the degree of the polynomial used in the last stage.
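A minimal Python sketch of the congruence checked in the accelerated last stage, $(X+a)^n \equiv X^n + a \pmod{X^r - 1,\ n}$, for a single value of $a$ rather than a long range; the choice of $r$ and the remaining stages of AKS are omitted, and the parameters below are toy values.

    def polymul(f, g, n, r):            # multiply mod (X^r - 1, n)
        h = [0] * r
        for i, fi in enumerate(f):
            if fi:
                for j, gj in enumerate(g):
                    h[(i + j) % r] = (h[(i + j) % r] + fi * gj) % n
        return h

    def pow_x_plus_a(a, n, r):          # (X + a)^n mod (X^r - 1, n)
        result = [1] + [0] * (r - 1)
        base, e = [a % n, 1] + [0] * (r - 2), n
        while e:
            if e & 1:
                result = polymul(result, base, n, r)
            base = polymul(base, base, n, r)
            e >>= 1
        return result

    def single_value_check(n, r, a):
        rhs = [0] * r
        rhs[n % r] = 1                  # X^n mod (X^r - 1)
        rhs[0] = (rhs[0] + a) % n       # ... + a
        return pow_x_plus_a(a, n, r) == rhs

    print(single_value_check(10007, 17, 2))   # True  (10007 is prime)
    print(single_value_check(10001, 17, 2))   # False (10001 = 73 * 137)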
Last updated:  2024-01-29
K-Waay: Fast and Deniable Post-Quantum X3DH without Ring Signatures
Daniel Collins, Loïs Huguenin-Dumittan, Ngoc Khanh Nguyen, Nicolas Rolin, and Serge Vaudenay
The Signal protocol and its X3DH key exchange core are regularly used by billions of people in applications like WhatsApp but are unfortunately not quantum-secure. Thus, designing an efficient and post-quantum secure X3DH alternative is paramount. Notably, X3DH supports asynchronicity, as parties can immediately derive keys after uploading them to a central server, and deniability, allowing parties to plausibly deny having completed key exchange. To satisfy these constraints, existing post-quantum X3DH proposals use ring signatures (or equivalently a form of designated-verifier signatures) to provide authentication without compromising deniability as regular signatures would. Existing ring signature schemes, however, have some drawbacks. Notably, they are not generally proven secure in the quantum random oracle model (QROM) and so the quantum security of parameters that are proposed is unclear and likely weaker than claimed. In addition, they are generally slower than standard primitives like KEMs. In this work, we propose an efficient, deniable and post-quantum X3DH-like protocol that we call K-Waay, that does not rely on ring signatures. At its core, K-Waay uses a split-KEM, a primitive introduced by Brendel et al. [SAC 2020], to provide Diffie-Hellman-like implicit authentication and secrecy guarantees. Along the way, we revisit the formalism of Brendel et al. and identify that additional security properties are required to prove a split-KEM-based protocol secure. We instantiate split-KEM by building a protocol based on the Frodo key exchange protocol relying on the plain LWE assumption: our proofs might be of independent interest as we show it satisfies our novel unforgeability and deniability security notions. Finally, we complement our theoretical results by thoroughly benchmarking both K-Waay and existing X3DH protocols. Our results show that, even when using plain LWE and a conservative choice of parameters, K-Waay is significantly faster than previous work.
Last updated:  2024-01-27
R3PO: Reach-Restricted Reactive Program Obfuscation and its Application to MA-ABE
Kaartik Bhushan, Sai Lakshmi Bhavana Obbattu, Manoj Prabhakaran, and Rajeev Raghunath
In recent breakthrough results, novel use of garbled circuits yielded constructions for several primitives like Identity-Based Encryption (IBE) and 2-round secure multi-party computation, based on standard assumptions in public-key cryptography. While the techniques in these different results have many common elements, these works did not offer a modular abstraction that could be used across them. Our main contribution is to introduce a novel notion of obfuscation, called Reach-Restricted Reactive Program Obfuscation (R3PO) that captures the essence of these constructions, and exposes additional capabilities. We provide a powerful composition theorem whose proof fully encapsulates the use of garbled circuits in these works. As an illustration of the potential of R3PO, and as an important contribution of independent interest, we present a variant of Multi-Authority Attribute-Based Encryption (MA-ABE) that can be based on (single-authority) CP-ABE in a blackbox manner, using only standard cryptographic assumptions (e.g., DDH). This is in stark contrast to the existing constructions for MA-ABE, which rely on the random oracle model and/or support only limited policy classes.
Last updated:  2024-01-26
Data Privacy Made Easy: Enhancing Applications with Homomorphic Encryption
Charles Gouert and Nektarios Georgios Tsoutsos
Homomorphic encryption is a powerful privacy-preserving technology that is notoriously difficult to configure and use, even for experts. The key difficulties include restrictive programming models of homomorphic schemes and choosing suitable parameters for an application. In this tutorial, we outline methodologies to solve these issues and allow for conversion of any application to the encrypted domain using both leveled and fully homomorphic encryption. The first approach, called Walrus, is suitable for arithmetic-intensive applications with limited depth and applications with high throughput requirements. Walrus provides an intuitive programming interface and handles parameterization automatically by analyzing the application and gathering statistics such as homomorphic noise growth to derive a parameter set tuned specifically for the application. We provide an in-depth example of this approach in the form of a neural network inference as well as guidelines for using Walrus effectively. Conversely, the second approach (HELM) takes existing HDL designs and converts them to the encrypted domain for secure outsourcing on powerful cloud servers. Unlike Walrus, HELM supports FHE backends and is well-suited for complex applications. At a high level, HELM consumes netlists and is capable of performing logic gate operations homomorphically on encryptions of individual bits. HELM incorporates both CPU and GPU acceleration by taking advantage of the inherent parallelism provided by Boolean circuits. As a case study, we walk through the process of taking an off-the-shelf HDL design in the form of AES-128 decryption and running it in the encrypted domain with HELM.
Last updated:  2024-01-26
Breaking HWQCS: a code-based signature scheme from high weight QC-LDPC codes
Alex Pellegrini and Giovanni Tognolini
We analyse HWQCS, a code-based signature scheme presented at ICISC 2023, which uses quasi-cyclic low density parity check codes (QC-LDPC). The scheme introduces high Hamming weight errors and signs each message using a fresh ephemeral secret key rather than using only one secret key, so as to avoid known attacks on QC-LDPC signature schemes. In this paper, we show that the signatures of HWQCS leak substantial information concerning the ephemeral keys and formally describe this behaviour. Furthermore, we show that for each security level, we can exploit the leakage to efficiently reconstruct partial secret data from very few signatures, and finally mount a universal forgery attack.
Last updated:  2024-02-17
On the practical CPAD security of “exact” and threshold FHE schemes and libraries
Marina Checri, Renaud Sirdey, Aymen Boudguiga, and Jean-Paul Bultel
In their 2021 seminal paper, Li and Micciancio presented a passive attack against the CKKS approximate FHE scheme and introduced the notion of CPAD security. The current status quo is that this line of attacks does not apply to ``exact'' FHE. In this paper, we challenge this status quo by exhibiting a CPAD key recovery attack on the linearly homomorphic Regev cryptosystem which easily generalizes to other xHE schemes such as BFV, BGV and TFHE, showing that these cryptosystems are not CPAD secure in their basic form. We also show that existing threshold variants of BFV, BGV and CKKS are particularly exposed to CPAD attackers and would be CPAD-insecure without smudging noise addition after partial decryption. Finally, we successfully implement our attack against several mainstream FHE libraries and discuss a number of natural countermeasures as well as their consequences in terms of FHE practice, security and efficiency. The attack itself is quite practical as it typically takes less than an hour on an average laptop PC, requiring a few thousand ciphertexts as well as up to around a million evaluations/decryptions, to perform a full key recovery.
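To convey the flavor of such attacks, the following toy Python sketch mounts a CPAD-style key recovery against a textbook Regev scheme with made-up toy parameters: using only the linear homomorphism (scaling a ciphertext by repeated self-addition) and the decryption oracle, the adversary pins down each ciphertext's exact noise, after which the secret key falls out of a noise-free linear system. This only illustrates the principle; the paper's attacks on BFV, BGV and TFHE, and its precise oracle model, are more involved:

import random

random.seed(1)
q, n = 32771, 8                    # toy parameters (q prime); not from the paper
s = [random.randrange(q) for _ in range(n)]       # secret key to be recovered

def enc_zero(noise_bound=16):
    # Textbook Regev encryption of 0 with small positive noise.
    a = [random.randrange(q) for _ in range(n)]
    e = random.randrange(1, noise_bound)
    return a, (sum(x * y for x, y in zip(a, s)) + e) % q

def dec(a, b):
    # The decryption oracle (queried on homomorphically evaluated ciphertexts).
    d = (b - sum(x * y for x, y in zip(a, s))) % q
    return 0 if d < q // 4 or d > 3 * q // 4 else 1

def scale(a, b, k):
    # k-fold homomorphic addition of a ciphertext with itself.
    return [x * k % q for x in a], b * k % q

def learn_noise(a, b):
    # Scale Enc(0) until the oracle first returns a wrong plaintext; since
    # the interval [Q4/k, Q4/(k-1)) has length < 1 for small noise, the
    # crossing point k pins the noise e exactly.
    Q4, k = q // 4, 1
    while dec(*scale(a, b, k)) == 0:
        k += 1
    return (Q4 + k - 1) // k          # e = ceil(Q4 / k)

def solve_mod_prime(A, y, p):
    # Gaussian elimination over GF(p) (rows assumed invertible in this toy).
    m = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, y)]
    for c in range(m):
        piv = next(r for r in range(c, m) if M[r][c] % p)
        M[c], M[piv] = M[piv], M[c]
        inv = pow(M[c][c], -1, p)
        M[c] = [v * inv % p for v in M[c]]
        for r in range(m):
            if r != c and M[r][c]:
                f = M[r][c]
                M[r] = [(v - f * w) % p for v, w in zip(M[r], M[c])]
    return [M[r][m] for r in range(m)]

cts = [enc_zero() for _ in range(n)]
rhs = [(b - learn_noise(a, b)) % q for a, b in cts]
assert solve_mod_prime([a for a, _ in cts], rhs, q) == s
print("recovered the full secret key")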
Last updated:  2024-03-27
Accelerating BGV Bootstrapping for Large $p$ Using Null Polynomials Over $\mathbb{Z}_{p^e}$
Shihe Ma, Tairong Huang, Anyu Wang, and Xiaoyun Wang
The BGV scheme is one of the most popular FHE schemes for computing homomorphic integer arithmetic. The bootstrapping technique of BGV is necessary to evaluate arbitrarily deep circuits homomorphically. However, the BGV bootstrapping performs poorly for large plaintext prime $p$ due to its digit removal procedure exhibiting a computational complexity of at least $O(\sqrt{p})$. In this paper, we propose optimizations for the digit removal procedure with large $p$ by leveraging the properties of null polynomials over the ring $\mathbb{Z}_{p^e}$. Specifically, we demonstrate that it is possible to construct low-degree null polynomials based on two observations of the input to the digit removal procedure: 1) the support size of the input can be upper-bounded by $(2B+1)^2$; 2) the size of the lower digits to be removed can be upper-bounded by $B$. Here $B$ can be controlled within a narrow interval $[22,23]$ in our parameter selection, making the degree of these null polynomials much smaller than $p$ for large values of $p$. These low-degree null polynomials can significantly reduce the polynomial degrees during homomorphic digit removal, thereby decreasing both running time and capacity consumption. Theoretically, our optimizations reduce the computational cost of extracting a single digit from $O(\sqrt{pe})$ (by Chen and Han) or $O(\sqrt{p}\sqrt[4]{e})$ (by Geelen et al.) to $\min(2B+1,\sqrt{\lceil e/t\rceil(2B+1)})$ for some $t\ge 1$. We implement and benchmark our method on HElib with $p=17,127,257,8191$ and $65537$. With our optimized digit removal, we achieve a bootstrapping throughput $1.38\sim151$ times that in HElib, with the speedup increasing with the value of $p$. For $p=65537$, we accelerate the digit removal step by 80 times and reduce the bootstrapping time from more than 12 hours to less than 14 minutes.
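Treating the asymptotic expressions above as if their hidden constants were 1 (an assumption made only for illustration), a few lines of Python show how the new per-digit cost compares with the earlier bounds for the benchmarked plaintext primes:

import math

def chen_han(p, e):        return math.sqrt(p * e)
def geelen_et_al(p, e):    return math.sqrt(p) * e ** 0.25
def null_poly(B, e, t):    return min(2 * B + 1, math.sqrt(math.ceil(e / t) * (2 * B + 1)))

B, e, t = 23, 2, 1          # B in [22, 23] per the paper; e and t illustrative
for p in (257, 8191, 65537):
    print(p, round(chen_han(p, e)), round(geelen_et_al(p, e)), round(null_poly(B, e, t)))

# The new cost is independent of p, which is why the speedup grows with p.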
Last updated:  2024-01-26
Mask Conversions for d+1 shares in Hardware, with Application to Lattice-based PQC
Quinten Norga, Jan-Pieter D'Anvers, Suparna Kundu, and Ingrid Verbauwhede
The conversion between arithmetic and Boolean mask representations (A2B & B2A) is a crucial component for side-channel resistant implementations of lattice-based cryptography. In this paper, we present a first- and high-order masked, unified hardware implementation which can perform both A2B & B2A conversions. We optimize the operation on several layers of abstraction, applicable to any protection order. First, we propose novel higher-order algorithms for the secure addition and B2A operation. This is achieved through, among others, an improved method for repeated masked modular reduction and through the X2B operation, which can be viewed as a conversion from any type of additive masking to its Boolean representation. This allows for the removal of a full secure addition during B2A post-processing. Compared to prior work, our $B2A_q$ requires 51/46/45 % less fresh randomness at first through third protection order when implemented in software or hardware. Secondly, on the circuit level, we successfully introduce half-cycle data paths and demonstrate how careful, manual masking is a superior approach for masking highly non-linear operations and providing first- and high-order security. Our techniques significantly reduce the high latency and fresh randomness overhead, typically introduced by glitch-resistant masking schemes and universally composable gadgets, including HPC3 by Knichel et al. presented at CCS 2022. Compared to state-of-the-art algorithms and masking techniques, our unified and high-throughput hardware implementation requires up to 89/84/86 % fewer clock cycles and 78/71/55 % fewer fresh random bits. We show detailed performance results for first-, second- and third-order protected implementations on FPGA. Our proposed algorithms are proven secure in the glitch extended probing model and their implementations are validated via practical lab analysis using the TVLA methodology. We experimentally show that both our first- and second-order masked implementation is hardened against univariate and multivariate attacks using 100 million traces, for each mode of operation.
Last updated:  2024-01-26
Improved Linear Key Recovery Attacks on PRESENT
Wenhui Wu, Muzhou Li, and Meiqin Wang
PRESENT is an ultra-lightweight block cipher designed by Bogdanov et al., and has been widely studied since its proposal. It supports 80-bit and 128-bit keys, which are referred to as PRESENT-80 and PRESENT-128, respectively. Up to now, linear cryptanalysis has been the most effective method for attacking this cipher, especially when accelerated with the pruned Walsh transform. Combining the pruned Walsh transform with multiple linear attacks, one can recover the right key for 28-round PRESENT-80 and -128. Later, this method was further improved with the affine pruned Walsh transform by adding more zeros in the Walsh spectrum through rejecting some data, leading to a 29-round attack on PRESENT-128 with the full codebook. In this paper, we follow the affine pruned Walsh transform accelerated linear method and propose 29-round attacks on both PRESENT-80 and PRESENT-128 without using the full codebook. Both attacks rely on a statistical model depicting distributions of the experimental correlation when some data are artificially rejected in its computation. Besides, a detailed analysis of the complexity reduction for each linear hull used in attacking PRESENT is also provided and supported by an automatic tool. Our 29-round attack on PRESENT-80 mainly benefits from this tool. To the best of our knowledge, both attacks are the best ones on PRESENT so far.
Last updated:  2024-01-25
pqm4: Benchmarking NIST Additional Post-Quantum Signature Schemes on Microcontrollers
Matthias J. Kannwischer, Markus Krausz, Richard Petri, and Shang-Yi Yang
In July 2022, the US National Institute for Standards and Technology (NIST) announced the first set of Post-Quantum Cryptography standards: Kyber, Dilithium, Falcon, and SPHINCS+. Shortly after, NIST published a call for proposals for additional post-quantum signature schemes to complement their initial portfolio. In 2023, 50 submissions were received, and 40 were accepted as round-1 candidates for future standardization. In this paper, we study the suitability and performance of said candidates on the popular Arm Cortex-M4 microcontroller. We integrate the suitable implementations into the benchmarking framework pqm4 and provide benchmarking results on the STM32L4R5ZI featuring 640 KB of RAM. pqm4 currently includes reference implementations for 15 submissions and M4-optimized implementations for five submissions. For the remaining candidates, we describe the reasons that hinder integration - the predominant reason being large key size or excessive memory consumption. While the performance of reference implementations is rather meaningless and often does not correlate with the performance of well-optimized implementations, this work provides some first indication of which schemes are most promising on microcontrollers. The publicly available implementations in pqm4 also provide a good starting point for future optimization efforts. Initially, we were hoping for a much higher code quality than for initial submissions to NIST's previous PQC project. However, we were grossly disappointed: half of the submissions make use of dynamic memory allocations, often completely without reason; many implementations have compiler warnings, sometimes hinting at more serious issues; many implementations do not pass simple sanitizer tests such as using valgrind; multiple implementations make use of static memory.
Last updated:  2024-01-25
A Novel Power Analysis Attack against CRYSTALS-Dilithium Implementation
Yong Liu, Yuejun Liu, Yongbin Zhou, Yiwen Gao, Zehua Qiao, and Huaxin Wang
Post-Quantum Cryptography (PQC) was proposed due to the potential threat of quantum computer attacks against conventional public-key cryptosystems, and four PQC algorithms, including CRYSTALS-Dilithium (Dilithium for short), have so far been selected for NIST standardization. However, the selected algorithms are still vulnerable to side-channel attacks in practice, and their physical security needs to be further evaluated. This study introduces two efficient power analysis attacks, the optimized fast two-stage approach and the single-bit approach, aimed at reducing the key guess space in NTT polynomial multiplication on an STM32F405 device (ARM Cortex-M4 core). Our findings reveal that the optimized approach outperforms the conservative approach and the fast two-stage approach proposed at ICCD 2021 by factors of 519 and 88, respectively. Similarly, the single-bit approach demonstrates speedups of 365 and 62 times compared to these two approaches, respectively.
Last updated:  2024-01-25
Cryptanalysis of the SNOVA signature scheme
Peigen Li and Jintai Ding
SNOVA is a variant of a UOV-type signature scheme over a noncommutative ring. In this article, we demonstrate that certain parameters provided by authors in SNOVA fail to meet the NIST security level, and the complexities are lower than those claimed by SNOVA.
Last updated:  2024-01-26
Simpler and Faster BFV Bootstrapping for Arbitrary Plaintext Modulus from CKKS
Jaehyung Kim, Jinyeong Seo, and Yongsoo Song
Bootstrapping is a key operation in fully homomorphic encryption schemes that enables the evaluation of arbitrary multiplicative depth circuits. In the BFV scheme, bootstrapping corresponds to reducing the size of accumulated noise in lower bits while preserving the plaintext in the upper bits. The previous instantiation of BFV bootstrapping is achieved through the digit extraction procedure. However, its performance is highly dependent on the plaintext modulus, so only a limited form of the plaintext modulus, a power of a small prime number, was used for the efficiency of bootstrapping. In this paper, we present a novel approach to instantiate BFV bootstrapping, distinct from the previous digit extraction-based method. The core idea of our bootstrapping is to utilize CKKS bootstrapping as a subroutine, so the performance of our method mainly depends on the underlying CKKS bootstrapping rather than the plaintext modulus. We implement our method at a proof-of-concept level to provide concrete benchmark results. When performing the bootstrapping operation for a 51-bit plaintext modulus, our method improves on the previous digit extraction-based method by a factor of 37.9 in latency and 29.4 in throughput. Additionally, we achieve viable bootstrapping performance for large plaintext moduli, such as 144 and 234 bits, which has never been measured before.
Last updated:  2024-01-24
Some Improvements for the PIOP for ZeroCheck
Angus Gruen
Most multivariate proof systems require, at some point, an algebraic check against the rows of the trace. One popular protocol for this is zerocheck, a sumcheck-based protocol that proves a constraint function is zero over the $n$-dimensional Boolean hypercube. One of the drawbacks of running zerocheck over a small field is that it usually involves a large number of evaluations of the constraint polynomial over a cryptographically large extension field $\mathbb{G}$. We describe several improvements to the zerocheck protocol over a small field $\mathbb{F}$ which both reduce the number of constraint evaluations the prover needs to perform and shift some computation from the extension field back into the base field $\mathbb{F}$. Specifically, for a table of length $2^n$, integer parameter $1\leq k \leq n$ and constraint function $C$ of degree $d$ with evaluation costs $C_{\mathbb{F}}, C_{\mathbb{G}}$, we show the protocol can be performed with prover cost roughly \[ 2^n\left(1 + \frac{C_{\mathbb{G}}}{2^k C_{\mathbb{F}}}\right)(d - 1)C_{\mathbb{F}}. \] To check the proof, the verifier needs to perform a single interpolation of degree $2^kd$ and $n$ interpolations of degree $d$. Thus varying $k$ allows for a tradeoff between prover and verifier cost.
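As a quick sanity check of the trade-off, one can plug illustrative numbers into the prover-cost formula; the evaluation costs $C_{\mathbb{F}} = 1$ and $C_{\mathbb{G}} = 50$ below are assumptions for the sketch, not values from the paper:

def prover_cost(n, k, d, c_f, c_g):
    # 2^n * (1 + C_G / (2^k * C_F)) * (d - 1) * C_F, as in the abstract.
    return 2 ** n * (1 + c_g / (2 ** k * c_f)) * (d - 1) * c_f

n, d, c_f, c_g = 20, 3, 1, 50
for k in (1, 4, 8):
    print(f"k={k}: prover ~{prover_cost(n, k, d, c_f, c_g):.3g} base-field ops, "
          f"verifier interpolates degree {2 ** k * d} once plus {n} of degree {d}")

# Raising k pushes prover work toward the base field at the price of one
# higher-degree interpolation for the verifier.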
Last updated:  2024-01-24
ELEKTRA: Efficient Lightweight multi-dEvice Key TRAnsparency
Julia Len, Melissa Chase, Esha Ghosh, Daniel Jost, Balachandar Kesavan, and Antonio Marcedone
Key Transparency (KT) systems enable service providers of end-to-end encrypted communication (E2EE) platforms to maintain a Verifiable Key Directory (VKD) that maps each user's identifier, such as a username or email address, to their identity public key(s). Users periodically monitor the directory to ensure their own identifier maps to the correct keys, thus detecting any attempt to register a fake key on their behalf to Meddler-in-the-Middle (MitM) their communications. We introduce and formalize a new primitive called Multi-Device Verifiable Key Directory (MVKD), which strengthens the security, privacy, and usability guarantees of VKD by leveraging the multi-device setting. We formalize three properties for a MVKD (completeness, extraction-based soundness, and privacy), striking a non-trivial balance between strong guarantees and the limitations imposed by a truly practical system. We then present a new MVKD system called ELEKTRA. This system combines the core of the Keybase KT system (running in production since 2014) with ideas from SEEMless (Chase et al., 2019) and RZKS (Chen et al., 2022). Our construction is the first to achieve the above multi-device guarantees while having formal security and privacy proofs. Finally, we implement ELEKTRA and present benchmarks demonstrating its practicality.
Last updated:  2024-01-24
A Trust-based Recommender System over Arbitrarily Partitioned Data with Privacy
Ibrahim Yakut and Huseyin Polat
Recommender systems are effective mechanisms for recommendations about what to watch, read, or taste based on user ratings about experienced products or services. To achieve higher quality recommendations, e-commerce parties may prefer to collaborate over partitioned data. Due to privacy issues, they might hesitate to work in pairs and some solutions motivate them to collaborate. This study examines how to estimate trust-based predictions on arbitrarily partitioned data in which two parties have ratings for similar sets of customers and items. A privacy-preserving scheme is proposed, and it is justified that it efficiently offers trust-based predictions on partitioned data while preserving privacy.
Last updated:  2024-01-24
Differential cryptanalysis with SAT, SMT, MILP, and CP: a detailed comparison for bit-oriented primitives
Emanuele Bellini, Alessandro De Piccoli, Mattia Formenti, David Gerault, Paul Huynh, Simone Pelizzola, Sergio Polese, and Andrea Visconti
SAT, SMT, MILP, and CP, have become prominent in the differential cryptanalysis of cryptographic primitives. In this paper, we review the techniques for constructing differential characteristic search models in these four formalisms. Additionally, we perform a systematic comparison encompassing over 20 cryptographic primitives and 16 solvers, on both easy and hard instances of optimisation, enumeration and differential probability estimation problems.
Last updated:  2024-01-23
AnonPSI: An Anonymity Assessment Framework for PSI
Bo Jiang, Jian Du, and Qiang Yan
Private Set Intersection (PSI) is a widely used protocol that enables two parties to securely compute a function over the intersected part of their shared datasets and has been a significant research focus over the years. However, recent studies have highlighted its vulnerability to Set Membership Inference Attacks (SMIA), where an adversary might deduce an individual's membership by invoking multiple PSI protocols. This presents a considerable risk, even in the most stringent versions of PSI, which only return the cardinality of the intersection. This paper explores the evaluation of anonymity within the PSI context. Initially, we highlight the reasons why existing works fall short in measuring privacy leakage, and subsequently propose two attack strategies that address these deficiencies. Furthermore, we provide theoretical guarantees on the performance of our proposed methods. In addition, we illustrate how the integration of auxiliary information, such as the sum of payloads associated with members of the intersection (PSI-SUM), can enhance attack efficiency. We conducted a comprehensive performance evaluation of the proposed attack strategies using two real datasets. Our findings indicate that our methods markedly enhance attack efficiency compared with previous work. The effectiveness of these attacks implies that relying solely on existing PSI protocols may not provide an adequate level of privacy assurance. It is recommended to combine privacy-enhancing technologies synergistically to enhance privacy protection even further.
Last updated:  2024-02-06
ChaCha related 64 bit oriented ARX cipher
Daniel Nager
We present a cipher related to ChaCha [Ber] that uses 64-bit operations instead of 32-bit ones while keeping the same 512-bit state size. We provide strong arguments that the security of ChaCha can be matched with fewer instructions, using 24 rounds instead of ChaCha's 20. We also present a strategy for implementing this cipher on SIMD extensions, with a maximal throughput of about 4 bytes per cycle on a 256-bit SIMD extension with at least 11 vector registers.
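For concreteness, a quarter-round with ChaCha's add-rotate-xor skeleton lifted to 64-bit words might look as follows in Python; the rotation amounts here are placeholders for illustration, not the constants chosen in the paper:

MASK64 = (1 << 64) - 1

def rotl64(x, r):
    return ((x << r) | (x >> (64 - r))) & MASK64

def quarter_round(a, b, c, d):
    # Same ARX skeleton as ChaCha, on 64-bit words; rotation amounts are
    # placeholders, as the paper's actual constants may differ.
    a = (a + b) & MASK64; d = rotl64(d ^ a, 32)
    c = (c + d) & MASK64; b = rotl64(b ^ c, 24)
    a = (a + b) & MASK64; d = rotl64(d ^ a, 16)
    c = (c + d) & MASK64; b = rotl64(b ^ c, 63)
    return a, b, c, d

# With 64-bit words, the 512-bit state is 8 words instead of ChaCha's 16.
state = list(range(8))
state[0], state[2], state[4], state[6] = quarter_round(state[0], state[2], state[4], state[6])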
Last updated:  2024-01-23
Laconic Branching Programs from the Diffie-Hellman Assumption
Sanjam Garg, Mohammad Hajiabadi, Peihan Miao, and Alice Murphy
Laconic cryptography enables secure two-party computation (2PC) on unbalanced inputs with asymptotically-optimal communication in just two rounds of communication. In particular, the receiver (who sends the first-round message) holds a long input and the sender (who sends the second-round message) holds a short input, and the size of their communication to securely compute a function on their joint inputs only grows with the size of the sender's input and is independent of the receiver's input size. The work on laconic oblivious transfer (OT) [Cho et al. CRYPTO 2017] and laconic private set intersection (PSI) [Alamati et al. TCC 2021] shows how to achieve secure laconic computation for OT and PSI from the Diffie-Hellman assumption. In this work, we push the limits further and achieve laconic branching programs from the Diffie-Hellman assumption. In particular, the receiver holds a large branching program $P$ and the sender holds a short input $x$. We present a two-round 2PC protocol that allows the receiver to learn $x$ iff $P(x) =1$, and nothing else. The communication only grows with the size of $x$ and the depth of $P$, and does not further depend on the size of $P$.
Last updated:  2024-01-29
Unconditional Security using (Random) Anonymous Bulletin Board
Albert Yu, Hai H. Nguyen, Aniket Kate, and Hemanta K. Maji
In a seminal work, Ishai et al. (FOCS–2006) studied the viability of designing unconditionally secure protocols for key agreement and secure multi-party computation (MPC) using an anonymous bulletin board (ABB) as a building block. While their results establish the feasibility of key agreement and honest-majority MPC in the ABB model, the optimality of protocols with respect to their round and communication complexity is not studied. This paper enriches this study of unconditional security in the ABB model in multiple ways. - We present a key agreement protocol with a novel combinatorial insight that doubles the throughput of the (FOCS–2006) study; i.e., using the same number of messages, we can (almost) double the bit-length of the agreed key. We also prove the near optimality of our approach. - We offer unconditionally secure protocols for the (random) string oblivious transfer functionalities. We present a $1$-round chosen message random string oblivious transfer and show how to extend it to a non-interactive (random) string oblivious transfer protocol and a $2$-round chosen message string oblivious transfer. - We prove a $1$-round lower bound for BEC under certain conditions. Central to our technical contributions is the abstraction of a distributional variant of the random ABB functionality. Investigating the concrete efficiency of founding MPC from this primitive leads to fascinating new mathematical challenges in well-established MPC models, which will be of broader interest to the community.
Last updated:  2024-04-30
FiveEyes: Cryptographic Biometric Authentication from the Iris
Luke Demarest, Sohaib Ahmad, Sixia Chen, Benjamin Fuller, and Alexander Russell
Despite decades of effort, a stubborn chasm exists between the theory and practice of device-level biometric authentication. Deployed authentication algorithms rely on data that overtly leaks private information about the biometric; thus systems rely on externalized security measures such as trusted execution environments. The authentication algorithms have no cryptographic guarantees. This is particularly frustrating given the long line of research that has developed theoretical tools—known as fuzzy extractors—that enable secure, privacy preserving biometric authentication with public enrollment data (Dodis et al., SIAM Journal of Computing 2008). Unfortunately, the best known constructions either: 1. Assume that bits of biometrics are i.i.d. (or that all correlation is captured in pairs of features (Hine et al., TIFS 2023)), which is not true for the biometrics themselves or for features extracted using modern learning techniques, or 2. Only provide substantial true accept rates with an estimated security of $32$ bits for the iris (Simhadri et al., ISC 2019) and $45$ bits for the face (Zhang, Cui, and Yu, ePrint 2021/1559). This work introduces FiveEyes, an iris key derivation system powered by technical advances in both 1) feature extraction from the iris and 2) the fuzzy extractor used to secure authentication keys. FiveEyes’ feature extractor’s loss focuses on quality for key derivation. The fuzzy extractor builds on sample-then-lock (Canetti et al., Journal of Cryptology 2021). FiveEyes’ fuzzy extractor uses statistics of the produced features to sample non-uniformly, which significantly improves the security vs. true accept rate (TAR) tradeoff. Irises used to evaluate TAR and security are class disjoint from those used for training and collecting statistics. We state assumptions sufficient for security. We present various parameter regimes to highlight different TARs: 1. $65$ bits of security (equivalent to $87$ bits with a password) at $12$% TAR, and 2. $50$ bits of security (equivalent to $72$ bits with a password) at $45$% TAR. Applying known TAR (Davida et al., IEEE S&P 1998) amplification techniques additively boosts TAR by $30$% for the above settings.
Last updated:  2024-01-22
Snarktor: A Decentralized Protocol for Scaling SNARKs Verification in Blockchains
Alberto Garoffolo, Dmytro Kaidalov, and Roman Oliynykov
The use of zero-knowledge Succinct Non-Interactive Arguments of Knowledge (zk-SNARK) and similar types of proofs has become increasingly popular as a solution for improving scalability, privacy, and interoperability of blockchain systems. However, even with the most advanced proving systems, verifying a single SNARK proof can require a significant amount of computational resources, making it expensive to perform on-chain. This becomes a noticeable bottleneck in scaling SNARK-based applications. A further efficiency improvement that avoids this bottleneck lies in utilizing distributed recursive proof composition to aggregate multiple existing proofs into one that verifies all underlying proofs. Building upon this concept, we present a new protocol for decentralized recursive proof aggregation allowing one unique proof to aggregate many input proofs to be efficiently verified on-chain, increasing the throughput and cost efficiency of SNARK-based blockchains. The protocol is designed for decentralized environments where independent actors (provers) can join and contribute to the proof generation process. We also present an incentive scheme for such actors. The protocol is abstract enough to be used with a variety of proving systems that support recursive aggregation.
Last updated:  2024-01-22
Theoretical differential fault attacks on FLIP and FiLIP
Pierrick Méaux and Dibyendu Roy
In this article, we examine Differential Fault Attacks (DFA) targeting two stream ciphers, FLIP and FiLIP. We explore the fault model where an adversary flips a single bit of the key at an unknown position. Our analysis involves establishing complexity bounds for these attacks, contingent upon the cryptographic parameters of the Boolean functions employed as filters and the key size. Initially, we demonstrate how the concept of sensitivity enables the detection of the fault position using only a few keystream bits. This represents an enhancement over previous DFA methodologies applied to these ciphers. Subsequently, we leverage the properties of the filter's derivatives to execute attacks. This approach is universally applicable to any filter, and we delineate specific attack strategies for the two function families previously implemented in these ciphers.
Last updated:  2024-01-22
Improved All-but-One Vector Commitment with Applications to Post-Quantum Signatures
Dung Bui, Kelong Cong, and Cyprien Delpech de Saint Guilhem
Post-quantum digital signature schemes have recently received increased attention due to the NIST standardization project for additional signatures. MPC-in-the-Head and VOLE-in-the-Head are general techniques for constructing such signatures from zero-knowledge proof systems. A common theme between the two is an all-but-one vector commitment scheme which internally uses GGM trees. This primitive is responsible for a significant part of the computational time during signing and verification. A more efficient technique for constructing GGM trees is the half-tree technique, introduced by Guo et al. (Eurocrypt 2023). Our work builds an all-but-one vector commitment scheme from the half-tree technique, and further generalizes it to an all-but-\(\tau\) vector commitment scheme. Crucially, our work avoids the use of the random oracle assumption in an important step, which means our binding proof is non-trivial and instead relies on the random permutation oracle. Since this oracle can be instantiated using fixed-key AES which has hardware support, we achieve faster signing and verification times. We integrate our vector commitment scheme into FAEST (faest.info), a round-one candidate in the NIST standardization process, and demonstrate its performance with a prototype implementation. For \(\lambda = 128\), our experimental results show a nearly \(3.5\)-fold improvement in signing and verification times.
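The GGM-tree mechanics underlying such all-but-one vector commitments can be sketched in a few lines of Python; a hash is used as the length-doubling PRG purely for readability, whereas the paper's half-tree construction instead builds on fixed-key AES:

import hashlib

def prg(seed: bytes):
    # Length-doubling PRG for the sketch (hash-based for readability only).
    return hashlib.sha256(seed + b"L").digest(), hashlib.sha256(seed + b"R").digest()

def expand(seed: bytes, depth: int):
    # Expand a seed into the 2^depth leaves of its GGM subtree.
    layer = [seed]
    for _ in range(depth):
        layer = [child for s in layer for child in prg(s)]
    return layer

def open_all_but_one(root: bytes, depth: int, hidden: int):
    # Reveal the sibling seed at every level of the path to `hidden`.
    opening, node = [], root
    for lvl in range(depth):
        bit = (hidden >> (depth - 1 - lvl)) & 1
        left, right = prg(node)
        opening.append(right if bit == 0 else left)
        node = left if bit == 0 else right
    return opening

def reconstruct(opening, depth: int, hidden: int):
    # Recompute every leaf except `hidden` from the log(n) sibling seeds.
    leaves = [None] * (1 << depth)
    for lvl, sib in enumerate(opening):
        sib_prefix = (hidden >> (depth - 1 - lvl)) ^ 1
        width = 1 << (depth - 1 - lvl)
        for i, leaf in enumerate(expand(sib, depth - 1 - lvl)):
            leaves[sib_prefix * width + i] = leaf
    return leaves

root, depth, hidden = b"\x00" * 32, 4, 5
leaves = expand(root, depth)
rec = reconstruct(open_all_but_one(root, depth, hidden), depth, hidden)
assert rec[hidden] is None and all(rec[i] == leaves[i] for i in range(16) if i != hidden)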
Last updated:  2024-01-22
Revisiting the security analysis of SNOVA
Yasuhiko Ikematsu and Rika Akiyama
SNOVA is a multivariate signature scheme submitted to the additional NIST PQC standardization project started in 2022. SNOVA is constructed by incorporating the structure of the matrix ring over a finite field into the UOV signature scheme, and the core part of its public key is the UOV public key whose coefficients consist of matrices. As a result, SNOVA dramatically reduces the public key size compared to UOV. In this paper, we recall the construction of SNOVA, and reconsider its security analysis. In particular, we investigate key recovery attacks applied to the core part of the public key of SNOVA in detail. Due to our analysis, we show that some parameters of SNOVA submitted in the additional NIST PQC standardization do not satisfy the claimed security levels.
Last updated:  2024-01-22
ConvKyber: Unleashing the Power of AI Accelerators for Faster Kyber with Novel Iteration-based Approaches
Tian Zhou, Fangyu Zheng, Guang Fan, Lipeng Wan, Wenxu Tang, Yixuan Song, Yi Bian, and Jingqiang Lin
The remarkable performance capabilities of AI accelerators offer promising opportunities for accelerating cryptographic algorithms, particularly in the context of lattice-based cryptography. However, current approaches to leveraging AI accelerators often remain at a rudimentary level of implementation, overlooking the intricate internal mechanisms of these devices. Consequently, a significant portion of computational resources is left underutilized. In this paper, we present a comprehensive exploration of NVIDIA Tensor Cores and introduce a novel framework tailored specifically for Kyber. Firstly, we propose two innovative approaches that efficiently break down Kyber's NTT into iterative matrix multiplications, resulting in approximately a 75% reduction in costs compared to the state-of-the-art scanning-based methods. Secondly, by reversing the internal mechanisms, we precisely manipulate the internal resources of Tensor Cores using assembly-level code instead of inefficient standard interfaces, eliminating memory accesses and redundant function calls. Finally, building upon our highly optimized NTT, we provide a complete implementation for all parameter sets of Kyber. Our implementation surpasses the state-of-the-art Tensor Core based work, achieving remarkable speed-ups of 1.93x, 1.65x, 1.22x and 3.55x for polyvec_ntt, KeyGen, Enc and Dec in Kyber-1024, respectively. Even when considering execution latency, our throughput-oriented full Kyber implementation maintains an acceptable execution latency. For instance, the execution latency ranges from 1.02 to 5.68 milliseconds for Kyber-1024 on R3080 when achieving the peak throughput.
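The starting observation, that an NTT is just a matrix-vector product and can therefore be fed to matrix engines, is easy to see in plain Python; the toy modulus and root of unity below are illustrative, not Kyber's actual parameters, and the paper's iteration-based decomposition into Tensor-Core-sized products is considerably more elaborate:

def ntt_matrix(n, q, zeta):
    # The n x n matrix [zeta^(i*j) mod q]; applying it computes the NTT.
    return [[pow(zeta, i * j, q) for j in range(n)] for i in range(n)]

def matvec(M, v, q):
    return [sum(m * x for m, x in zip(row, v)) % q for row in M]

q, n, zeta = 257, 8, 64            # 64 has multiplicative order 8 mod 257
out = matvec(ntt_matrix(n, q, zeta), [1, 2, 3, 4, 5, 6, 7, 8], q)
assert out[0] == sum(range(1, 9)) % q   # row 0 evaluates the polynomial at 1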
Last updated:  2024-01-21
Chosen-Ciphertext Secure Dual-Receiver Encryption in the Standard Model Based on Post-Quantum Assumptions
Laurin Benz, Wasilij Beskorovajnov, Sarai Eilebrecht, Roland Gröll, Maximilian Müller, and Jörn Müller-Quade
Dual-receiver encryption (DRE) is a special form of public key encryption (PKE) that allows a sender to encrypt a message for two recipients. Without further properties, the difference between DRE and PKE is only syntactical. One such important property is soundness, which requires that no ciphertext can be constructed such that the recipients decrypt to different plaintexts. Many applications rely on this property in order to realize more complex protocols or primitives. In addition, many of these applications explicitly avoid the usage of the random oracle, which poses an additional requirement on a DRE construction. We show that all of the IND-CCA2 secure standard model DRE constructions based on post-quantum assumptions fall short of augmenting the constructions with soundness and describe attacks thereon. We then give an overview over all applications of IND-CCA2 secure DRE, group them into generic (i.e., applications using DRE as a black box) and non-generic applications and demonstrate that all generic ones require either soundness or public verifiability. Conclusively, we identify the gap of sound and IND-CCA2 secure DRE constructions based on post-quantum assumptions in the standard model. In order to fill this gap, we provide two IND-CCA2 secure DRE constructions based on the standard post-quantum assumptions, Normal Form Learning With Errors (NLWE) and Learning Parity with Noise (LPN).
Last updated:  2024-01-21
Short Code-based One-out-of-Many Proofs and Applications
Xindong Liu and Li-Ping Wang
In this work, we propose two novel succinct one-out-of-many proofs from coding theory, which can be seen as extensions of Stern's and Veron's frameworks, respectively, from proving knowledge of a preimage to proving knowledge of a preimage for one element in a set. The size of each proof is short and scales better with the size of the public set than the code-based accumulator in \cite{nguyen2019new}. Based on our new constructions, we further present a logarithmic-size ring signature scheme and a logarithmic-size group signature scheme. Our schemes feature a short signature size, especially our group signature. To the best of our knowledge, it is the most compact code-based group signature scheme so far. At the 128-bit security level, our group signature size is about 144 KB for a group with $2^{20}$ members, while the group signature size of the previously most compact code-based group signature constructed by the above accumulator exceeds 3200 KB.
Last updated:  2024-02-08
Call Me By My Name: Simple, Practical Private Information Retrieval for Keyword Queries
Sofía Celi and Alex Davidson
We introduce $\mathsf{ChalametPIR}$: a single-server Private Information Retrieval (PIR) scheme supporting fast, low-bandwidth keyword queries, with a conceptually very simple design. In particular, we develop a generic framework for converting PIR schemes for index queries over flat arrays (based on the Learning With Errors problem) into keyword PIR. This involves representing a key-value map using any probabilistic filter that permits reconstruction of elements from inclusion queries (e.g. Cuckoo filters). In particular, we make use of recently developed Binary Fuse filters to construct $\mathsf{ChalametPIR}$, with minimal efficiency blow-up compared with state-of-the-art index-based schemes (all costs bounded by a factor of $\leq 1.08$). Furthermore, we show that $\mathsf{ChalametPIR}$ achieves runtimes and financial costs that are factors of between $6$-$11\times$ and $3.75$-$11.4\times$ more efficient, respectively, than state-of-the-art keyword PIR approaches, for varying database configurations. Bandwidth costs are additionally reduced or remain competitive, depending on the configuration. Finally, we believe that our application of Binary Fuse filters can have independent value towards developing efficient variants of related cryptographic primitives (e.g. private set intersection), that already benefit from using less efficient filter constructions.
Last updated:  2024-01-20
On historical Multivariate Cryptosystems and their restorations as instruments of Post-Quantum Cryptography
Vasyl Ustimenko
The paper presents a short survey of the history of multivariate cryptography together with the usage of old broken multivariate digital signatures in new protocol-based cryptosystems constructed in terms of Noncommutative Cryptography. The general scheme of the new cryptosystems is a combination of Eulerian maps and quadratic maps with their trapdoor accelerators, i.e., pieces of information whose knowledge allows one to compute preimages in polynomial time. These schemes are illustrated by the historical examples of the Imai-Matsumoto multivariate digital signature schemes and Unbalanced Oil and Vinegar cryptosystems.
Last updated:  2024-01-22
Starlit: Privacy-Preserving Federated Learning to Enhance Financial Fraud Detection
Aydin Abadi, Bradley Doyle, Francesco Gini, Kieron Guinamard, Sasi Kumar Murakonda, Jack Liddell, Paul Mellor, Steven J. Murdoch, Mohammad Naseri, Hector Page, George Theodorakopoulos, and Suzanne Weller
Federated Learning (FL) is a data-minimization approach enabling collaborative model training across diverse clients with local data, avoiding direct data exchange. However, state-of-the-art FL solutions to identify fraudulent financial transactions exhibit a subset of the following limitations. They (1) lack a formal security definition and proof, (2) assume prior freezing of suspicious customers’ accounts by financial institutions (limiting the solutions’ adoption), (3) scale poorly, involving either $O(n^2)$ computationally expensive modular exponentiation (where $n$ is the total number of financial institutions) or highly inefficient fully homomorphic encryption, (4) assume the parties have already completed the identity alignment phase, hence excluding it from the implementation, performance evaluation, and security analysis, and (5) struggle to resist clients’ dropouts. This work introduces Starlit, a novel scalable privacy-preserving FL mechanism that overcomes these limitations. It has various applications, such as enhancing financial fraud detection, mitigating terrorism, and enhancing digital health. We implemented Starlit and conducted a thorough performance analysis using synthetic data from a key player in global financial transactions. The evaluation indicates Starlit’s scalability, efficiency, and accuracy.
Last updated:  2024-01-19
Two-party GOST in two parts: fruitless search and fruitful synthesis
Liliya Akhmetzyanova, Evgeny Alekseev, Alexandra Babueva, Lidiia Nikiforova, and Stanislav Smyshlyaev
In the current paper we investigate the possibility of designing a secure two-party signature scheme with the same verification algorithm as in the Russian standardized scheme (GOST scheme). We solve this problem in two parts. The first part is a (fruitless) search for an appropriate scheme in the literature. It turned out that all existing schemes are insecure in strong security models. The second part, the synthesis of a new signature scheme, ends fruitfully. We synthesize a new two-party GOST signature scheme, additionally using a commitment scheme, guided by the features of the GOST signature scheme as well as the known attacks on existing schemes. We prove that this scheme is secure in a bijective random oracle model in the case when one of the parties is malicious, under the assumption that the classical GOST scheme is unforgeable in a bijective random oracle model and the commitment scheme is modelled as a random oracle.
Last updated:  2024-01-19
Enabling PERK on Resource-Constrained Devices
Slim Bettaieb, Loïc Bidoux, Alessandro Budroni, Marco Palumbi, and Lucas Pandolfo Perin
PERK is a digital signature scheme submitted to the recent NIST Post-Quantum Cryptography Standardization Process for Additional Digital Signature Schemes. For NIST security level I, PERK features sizes ranging from 6kB to 8.5kB, encompassing both the signature and public key, depending on the parameter set. Given its inherent characteristics, PERK's signing and verification algorithms involve the computation of numerous large objects, resulting in substantial stack-memory consumption ranging from 300kB to 1.5MB for NIST security level I and from 1.1MB to 5.7MB for NIST security level V. In this paper, we present a memory-versus-performance trade-off strategy that significantly reduces PERK's memory consumption to a maximum of approximately 82kB for any security level, enabling PERK to be executed on resource-constrained devices. Additionally, we explore various optimizations tailored to the Cortex M4 and introduce the first implementation of PERK designed for this platform.
Last updated:  2024-01-23
Tree-based Lookup Table on Batched Encrypted Queries using Homomorphic Encryption
Jung Hee Cheon, Hyeongmin Choe, and Jai Hyun Park
Homomorphic encryption (HE) is in the spotlight as a solution for privacy-related issues in various real-world scenarios. However, the limited types of operations supported by each HE scheme have been a major drawback in applications. While HE schemes based on the learning-with-errors (LWE) problem provide efficient lookup table (LUT) evaluation in terms of latency, they have downsides in arithmetic operations and low throughput compared to HE schemes based on the ring LWE (RLWE) problem. The use of HE on circuits containing LUTs has been partly limited if they contain arithmetic operations or their computational width is large. In this paper, we propose homomorphic algorithms for batched queries on LUTs by using RLWE-based HE schemes. To look up encrypted LUTs of size $n$ on encrypted queries, our algorithms use $O(\log{n})$ homomorphic comparisons and $O(n)$ multiplications. For unencrypted LUTs, our algorithms use $O(\log{n})$ comparisons, $O(\sqrt{n})$ ciphertext multiplications, and $O(n)$ scalar multiplications. We provide a proof-of-concept implementation based on the CKKS scheme (Asiacrypt 2017). The amortized running time for an encrypted (resp. unencrypted) LUT of size $512$ is $0.041$ (resp. $0.025$) seconds. Our implementation reported roughly $2.4$-$6.0$x higher throughput than the current implementation of LWE-based schemes, with more flexibility on the structure of the LUTs.
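A plaintext Python analogue conveys the access pattern: $\log_2 n$ comparisons recover the bits of the query index, and the entries are then combined using only additions and multiplications, the operation mix an RLWE scheme evaluates homomorphically. Note that this naive combination step uses $O(n\log n)$ scalar multiplications, whereas the paper's algorithms bring the cost down as stated above:

def tree_lut(x, table):
    # Look up table[x] for x in [0, n) using log2(n) comparisons.
    n, depth = len(table), len(table).bit_length() - 1
    bits, rem = [], x
    for lvl in range(depth - 1, -1, -1):
        half = 1 << lvl
        b = int(rem >= half)          # one comparison per tree level
        bits.append(b)
        rem -= b * half               # arithmetic on the (encrypted) bit
    out = 0
    for i, v in enumerate(table):     # select via products of bit-indicators
        ind = 1
        for j, b in enumerate(bits):
            ind *= b if (i >> (depth - 1 - j)) & 1 else 1 - b
        out += ind * v
    return out

assert tree_lut(5, list(range(100, 108))) == 105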
Last updated:  2024-03-03
On Hilbert-Poincaré series of affine semi-regular polynomial sequences and related Gröbner bases
Momonari Kudo and Kazuhiro Yokoyama
Gröbner bases are nowadays central tools for solving various problems in commutative algebra and algebraic geometry. A typical use of Gröbner bases is multivariate polynomial system solving, which enables us to construct algebraic attacks against post-quantum cryptographic protocols. Therefore, the determination of the complexity of computing Gröbner bases is very important both in theory and in practice: one of the most important cases is the case where the input polynomials compose an (overdetermined) affine semi-regular sequence. The first part of this paper aims to present a survey on Gröbner basis computation and its complexity. In the second part, we give an explicit formula for the (truncated) Hilbert-Poincaré series associated to the homogenization of an affine semi-regular sequence. Based on the formula, we also study (reduced) Gröbner bases of the ideals generated by an affine semi-regular sequence and its homogenization. Some of our results can be seen as mathematically rigorous proofs of the correctness of methods for computing Gröbner bases of the ideal generated by an affine semi-regular sequence.
Last updated:  2024-01-29
Simultaneously simple universal and indifferentiable hashing to elliptic curves
Dmitrii Koshelev
The present article explains how to generalize the hash function SwiftEC (in an elementary quasi-unified way) to any elliptic curve $E$ over any finite field $\mathbb{F}_{\!q}$ of characteristic $> 3$. The new result apparently brings the theory of hash functions onto elliptic curves to its logical conclusion. To be more precise, this article provides compact formulas that define a hash function $\{0,1\}^* \to E(\mathbb{F}_{\!q})$ (deterministic and indifferentiable from a random oracle) with the same working principle as SwiftEC. In particular, both of them equally compute only one square root in $\mathbb{F}_{\!q}$ (in addition to two cheap Legendre symbols). However, the new hash function is valid under much more liberal conditions than SwiftEC, namely whenever $3 \mid q-1$. Since in the opposite case $3 \mid q-2$ there are already indifferentiable constant-time hash functions to $E$ with the cost of one root in $\mathbb{F}_{\!q}$, that case is not treated in this article. If desired, the approach nonetheless allows one to do that easily, mutatis mutandis.
Last updated:  2024-02-16
Efficient Instances of Docked Double Decker With AES, and Application to Authenticated Encryption
Christoph Dobraunig, Krystian Matusiewicz, Bart Mennink, and Alexander Tereschenko
A tweakable wide blockcipher is a construction which behaves in the same way as a tweakable blockcipher, with the difference that the actual block size is flexible. Due to this feature, a tweakable wide blockcipher can be directly used as a strong encryption scheme that provides full diffusion when encrypting plaintexts to ciphertexts and vice versa. Furthermore, it can be the basis of authenticated encryption schemes fulfilling the strongest security notions. In this paper, we present two instantiations of the docked double decker tweakable wide blockcipher: $\mathit{ddd}\text{-}\mathit{AES}$ and $\mathit{bbb}\text{-}\mathit{ddd}\text{-}\mathit{AES}$. Both instances exclusively use similar building blocks as AES-GCM (AES and finite field multiplication), are designed for maximal parallelism, and hence, can make efficient use of existing hardware accelerators. Moreover, $\mathit{bbb}\text{-}\mathit{ddd}\text{-}\mathit{AES}$ builds upon a novel beyond-birthday-bound secure pseudorandom function, a tweakable variant of the XOR of permutations, accommodating the need to include a tweak in the AES evaluations without sacrificing flexibility in docked double decker. We furthermore introduce an authenticated encryption mode $\mathit{aaa}$ specifically tailored to be instantiated with $\mathit{ddd}\text{-}\mathit{AES}$ and $\mathit{bbb}\text{-}\mathit{ddd}\text{-}\mathit{AES}$, where special attention is given to how the nonce and associated data can be processed. We prove that this mode is secure both in the nonce-respecting setting as well as in the setting where random nonces are used.
Last updated:  2024-01-18
Layout Graphs, Random Walks and the t-wise Independence of SPN Block Ciphers
Tianren Liu, Angelos Pelecanos, Stefano Tessaro, and Vinod Vaikuntanathan
We continue the study of $t$-wise independence of substitution-permutation networks (SPNs) initiated by the recent work of Liu, Tessaro, and Vaikuntanathan (CRYPTO 2021). Our key technical result shows that when the S-boxes are randomly and independently chosen and kept secret, an $r$-round SPN with input length $n = b \cdot k$ is $2^{-\Theta(n)}$-close to $t$-wise independent within $r = O(\min\{k, \log t\})$ rounds for any $t$ almost as large as $2^{b/2}$. Here, $b$ is the input length of the S-box and we assume that the underlying mixing achieves maximum branch number. We also analyze the special case of AES parameters (with random S-boxes), and show it is $2^{-128}$-close to pairwise independent in $7$ rounds. Central to our result is the analysis of a random walk on what we call the layout graph, a combinatorial abstraction that captures equality and inequality constraints among multiple SPN evaluations. We use our technical result to show concrete security bounds for SPNs with actual block cipher parameters and small-input $S$-boxes. (This is in contrast to the large body of results on ideal-model analyses of SPNs.) For example, for the censored-AES block cipher, namely AES with most of the mixing layers removed, we show that 192 rounds suffice to attain $2^{-128}$-closeness to pairwise independence. The prior such result for AES (Liu, Tessaro and Vaikuntanathan, CRYPTO 2021) required more than 9000 rounds.
Last updated:  2024-01-18
Quantum State Obfuscation from Classical Oracles
James Bartusek, Zvika Brakerski, and Vinod Vaikuntanathan
A major unresolved question in quantum cryptography is whether it is possible to obfuscate arbitrary quantum computation. Indeed, there is much yet to understand about the feasibility of quantum obfuscation even in the classical oracle model, where one is given for free the ability to obfuscate any classical circuit. In this work, we develop a new array of techniques that we use to construct a quantum state obfuscator, a powerful notion formalized recently by Coladangelo and Gunn (arXiv:2311.07794) in their pursuit of better software copy-protection schemes. Quantum state obfuscation refers to the task of compiling a quantum program, consisting of a quantum circuit $C$ with a classical description and an auxiliary quantum state $\ket{\psi}$, into a functionally-equivalent obfuscated quantum program that hides as much as possible about $C$ and $\ket{\psi}$. We prove the security of our obfuscator when applied to any pseudo-deterministic quantum program, i.e. one that computes a (nearly) deterministic classical input / classical output functionality. Our security proof is with respect to an efficient classical oracle, which may be heuristically instantiated using quantum-secure indistinguishability obfuscation for classical circuits. Our result improves upon the recent work of Bartusek, Kitagawa, Nishimaki and Yamakawa (STOC 2023) who also showed how to obfuscate pseudo-deterministic quantum circuits in the classical oracle model, but only ones with a completely classical description. Furthermore, our result answers a question of Coladangelo and Gunn, who provide a construction of quantum state indistinguishability obfuscation with respect to a quantum oracle, but leave the existence of a concrete real-world candidate as an open problem. Indeed, our quantum state obfuscator together with Coladangelo-Gunn gives the first candidate realization of a ``best-possible'' copy-protection scheme for all polynomial-time functionalities. Our techniques deviate significantly from previous works on quantum obfuscation. We develop several novel technical tools which we expect to be broadly useful in quantum cryptography. These tools include a publicly-verifiable, linearly-homomorphic quantum authentication scheme with classically-decodable ZX measurements (which we build from coset states), and a method for compiling any quantum circuit into a "linear + measurement" ($\mathsf{LM}$) quantum program: an alternating sequence of CNOT operations and partial ZX measurements.
Last updated:  2024-01-18
SuperFL: Privacy-Preserving Federated Learning with Efficiency and Robustness
Yulin Zhao, Hualin Zhou, and Zhiguo Wan
Federated Learning (FL) accomplishes collaborative model training without the need to share local training data. However, existing FL aggregation approaches suffer from inefficiency, privacy vulnerabilities, and neglect of poisoning attacks, severely impacting the overall performance and reliability of model training. In order to address these challenges, we propose SuperFL, an efficient two-server aggregation scheme that is both privacy preserving and secure against poisoning attacks. The two semi-honest servers $\mathcal{S}_0$ and $\mathcal{S}_1$ collaborate with each other, with a shuffle server $\mathcal{S}_0$ in charge of privacy-preserving random clustering, while an analysis server $\mathcal{S}_1$ responsible for robustness detection, identifying and filtering malicious model updates. Our scheme employs a novel combination of homomorphic encryption and proxy re-encryption to realize secure server-to-server collaboration. We also utilize a novel sparse matrix projection compression technique to enhance communication efficiency and significantly reduce communication overhead. To resist poisoning attacks, we introduce a dual-filter algorithm based on trusted root, combine dimensionality reduction and norm calculation to identify malicious model updates. Extensive experiments validate the efficiency and robustness of our scheme. SuperFL achieves impressive compression ratios, ranging from $5\text{-}40$x, under different models while maintaining comparable model accuracy as the baseline. Notably, our solution demonstrates a maximal model accuracy decrease of no more than $2\%$ and $6\%$ on the MNIST and CIFAR-10 datasets respectively, under specific compression ratios and the presence of malicious clients.
Last updated:  2024-04-25
Memory adds no cost to lattice sieving for computers in 3 or more spatial dimensions
Samuel Jaques
The security of lattice-based cryptography (LWE, NTRU, and FHE) depends on the hardness of the shortest-vector problem (SVP). Sieving algorithms give the lowest asymptotic runtime to solve SVP, but depend on exponential memory. Memory access costs much more in reality than in the RAM model, so we consider a computational model where processors, memory, and meters of wire are in constant proportions to each other. While this adds substantial costs to route data during lattice sieving, we modify existing algorithms to amortize these costs and find that, asymptotically, a classical computer can achieve the previous RAM model cost of $2^{0.2925d+o(d)}$ to sieve a $d$-dimensional lattice for a computer existing in 3 or more spatial dimensions, and can reach $2^{0.3113d+o(d)}$ in 2 spatial dimensions, where "spatial dimensions" are the dimensions of the physical geometry in which the computer exists. Under some assumptions about the constant terms of memory access, we estimate increases in bit security between $7$ and $23$ bits for different Kyber parameter sets and $8$ and $22$ bits for Dilithium.
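To make the exponents above concrete, a toy calculation follows; the dimensions are illustrative placeholders, not the paper's actual Kyber/Dilithium sieving dimensions:

# Toy comparison of the sieving cost exponents from the abstract,
# ignoring the o(d) terms. Dimensions are illustrative only.
for d in (380, 500, 620):
    cost_3d = 0.2925 * d   # log2 cost, 3+ spatial dimensions (matches RAM model)
    cost_2d = 0.3113 * d   # log2 cost, 2 spatial dimensions
    print(f"d={d}: 3D ~2^{cost_3d:.0f}, 2D ~2^{cost_2d:.0f}, "
          f"gap {cost_2d - cost_3d:.0f} bits")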
Last updated:  2024-01-22
On Modular Algorithms and Butterfly Operations in Number Theoretic Transform
Yanze Yang, Yiran Jia, and Guangwu Xu
Number theoretic transform (NTT) has been a very useful tool in computations for number theory, algebra and cryptography. Its performance affects some post-quantum cryptosystems. In this paper, we discuss the butterfly operation of NTT. This basic module of NTT requires heavy modular arithmetic. Montgomery reduction is commonly used in this setting. Recently, several variants of Montgomery reduction have been proposed for the purpose of speeding up NTT. We observe that the Chinese remainder theorem (CRT) can be involved in this type of algorithm in natural and transparent ways. In this paper, a framework of using CRT to model Montgomery-type algorithms is described. The derivation of these algorithms as well as their correctness are all treated in the CRT framework. Under our approach, some problems of a modular reduction algorithm (published in IACR Transactions on Cryptographic Hardware and Embedded Systems, doi:10.46586/tches.v2022.i4.614-636) are identified, and a counterexample is generated to show that the algorithm is incorrect.
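For readers unfamiliar with the butterfly/Montgomery combination under discussion, a minimal Python sketch of a textbook Cooley-Tukey butterfly using classic Montgomery reduction follows; the modulus q = 3329 (Kyber's) and the twiddle factor are illustrative choices, and this is not the paper's CRT-based derivation:

Q = 3329                 # Kyber's modulus, used here as one concrete example
R = 1 << 16              # Montgomery radix
QINV = pow(-Q, -1, R)    # (-Q)^{-1} mod R

def montgomery_reduce(a):
    """REDC: for 0 <= a < Q*R, return a * R^{-1} mod Q."""
    m = (a * QINV) % R           # low half only, cheap in hardware/asm
    t = (a + m * Q) >> 16        # exact division by R
    return t - Q if t >= Q else t

def ct_butterfly(lo, hi, zeta_mont):
    """Cooley-Tukey butterfly, the core operation repeated throughout an NTT.
    zeta_mont is the twiddle factor pre-scaled by R (Montgomery form);
    real implementations use lazy reduction instead of '% Q' here."""
    t = montgomery_reduce(hi * zeta_mont)   # = hi * zeta mod Q
    return (lo + t) % Q, (lo - t) % Q

# quick self-check against schoolbook arithmetic
zeta = 17                                   # an illustrative twiddle factor
zeta_mont = (zeta * R) % Q
a, b = 1234, 2345
assert ct_butterfly(a, b, zeta_mont) == ((a + zeta*b) % Q, (a - zeta*b) % Q)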
Last updated:  2024-01-17
Formal Security Analysis of the OpenID FAPI 2.0: Accompanying a Standardization Process
Pedram Hosseyni, Ralf Kuesters, and Tim Würtele
In recent years, the number of third-party services that can access highly-sensitive data has increased steadily, e.g., in the financial sector, in eGovernment applications, or in high-assurance identity services. Protocols that enable this access must provide strong security guarantees. A prominent and widely employed protocol for this purpose is the OpenID Foundation's FAPI protocol. The FAPI protocol is already in widespread use, e.g., as part of the UK's Open Banking standards and Brazil's Open Banking Initiative, as well as outside of the financial sector, for instance, as part of the Australian government's Consumer Data Rights standards. Based on lessons learned from FAPI 1.0, the OpenID Foundation has developed a completely new protocol, called FAPI 2.0. The specifications of FAPI 2.0 include a concrete set of security goals and attacker models under which the protocol aims to be secure. Following an invitation from the OpenID Foundation's FAPI Working Group (FAPI WG), we have accompanied the standardization process of the FAPI 2.0 protocol with an in-depth formal security analysis. In this paper, we report on our analysis and findings. Our analysis incorporates the first formal model of the FAPI 2.0 protocol and is based on a detailed model of the web infrastructure, the Web Infrastructure Model, originally proposed by Fett, Küsters, and Schmitz. Our analysis has uncovered several types of attacks on the protocol, violating the aforementioned security goals set by the FAPI WG. We subsequently worked with the FAPI WG to fix the protocol, resulting in several changes to the specifications. After adapting our model to the changed specifications, we have proved that the security properties hold under the strong attacker model defined by the FAPI WG.
Last updated:  2024-01-17
OBSCURE: Versatile Software Obfuscation from a Lightweight Secure Element
Darius Mercadier, Viet Sang Nguyen, Matthieu Rivain, and Aleksei Udovenko
Software obfuscation is a powerful tool to protect the intellectual property or secret keys inside programs. Strong software obfuscation is crucial in the context of untrusted execution environments (e.g., subject to malware infection) or to face potentially malicious users trying to reverse-engineer a sensitive program. Unfortunately, the state-of-the-art of pure software-based obfuscation (including white-box cryptography) is either insecure or infeasible in practice. This work introduces OBSCURE, a versatile framework for practical and cryptographically strong software obfuscation relying on a simple stateless secure element (to be embedded, for example, in a protected hardware chip or a token). Based on the foundational result by Goyal et al. from TCC 2010, our scheme enjoys provable security guarantees, and further focuses on practical aspects, such as efficient execution of the obfuscated programs, while maintaining simplicity of the secure element. In particular, we propose a new rectangular universalization technique, which is also of independent interest. We provide an implementation of OBSCURE taking as input a program source code written in a subset of the C programming language. This ensures usability and a broad range of applications of our framework. We benchmark the obfuscation on simple software programs as well as on cryptographic primitives, hence highlighting the possible use cases of the framework as an alternative to pure software-based white-box implementations.
Last updated:  2024-04-24
A provably masked implementation of BIKE Key Encapsulation Mechanism
Loïc Demange and Mélissa Rossi
BIKE is a post-quantum key encapsulation mechanism (KEM) selected for the 4th round of the NIST's standardization campaign. It relies on the hardness of the syndrome decoding problem for quasi-cyclic codes and on the indistinguishability of the public key from a random element, and provides the most competitive performance among round 4 candidates, which makes it relevant for future real-world use cases. Analyzing its side-channel resistance has been highly encouraged by the community, and several works have already outlined various side-channel weaknesses and proposed ad-hoc countermeasures. However, in contrast to the well-documented research line on masking lattice-based algorithms, the possibility of generically protecting code-based algorithms by masking has only been marginally studied, in a 2016 paper by Cong Chen et al. At this stage of the standardization campaign, it is important to assess the possibility of fully masking the BIKE scheme and the resulting cost in terms of performance. In this work, we provide the first high-order masked implementation of a code-based algorithm. We had to tackle many issues, such as finding proper ways to handle large sparse polynomials, masking the key-generation algorithm, and keeping the benefit of bitslicing. In this paper, we present all the gadgets necessary to provide a fully masked implementation of BIKE, discuss our different implementation choices, and propose a full proof of masking in the Ishai, Sahai and Wagner (Crypto 2003) model. More practically, we also provide an open-source C masked implementation of the key-generation, encapsulation and decapsulation algorithms with extensive benchmarks. While the obtained performance is slower than existing masked lattice-based algorithms, the scaling in the masking order is still encouraging and no Boolean-to-arithmetic conversion has been used. We hope that this work can be a starting point for future analysis and optimization.
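For context, a toy sketch of the Boolean (XOR) masking idea that such gadgets build on; this is the generic textbook notion, not the paper's gadgets for large sparse polynomials:

import secrets

def share(x, order, bits=32):
    """Split secret x into (order+1) Boolean (XOR) shares."""
    shares = [secrets.randbits(bits) for _ in range(order)]
    last = x
    for s in shares:
        last ^= s
    return shares + [last]

def unshare(shares):
    out = 0
    for s in shares:
        out ^= s
    return out

x = 0xDEADBEEF
sh = share(x, order=3)   # 4 shares: any 3 of them reveal nothing about x
assert unshare(sh) == x
# Linear (XOR) operations can be computed share-wise; nonlinear operations
# need dedicated gadgets with fresh randomness -- the expensive part that is
# proven secure in the Ishai-Sahai-Wagner probing model.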
Last updated:  2024-02-08
Succinct Verification of Compressed Sigma Protocols in the Updatable SRS setting
Moumita Dutta, Chaya Ganesh, and Neha Jawalkar
We propose protocols in the Compressed Sigma Protocol framework that achieve a succinct verifier. Towards this, we construct a new inner product argument and cast it in the Compressed Sigma Protocol (CSP) framework as a protocol for opening a committed linear form, achieving logarithmic verification. We then use our succinct-verifier CSP to construct a zero-knowledge argument for circuit satisfiability (under the discrete logarithm assumption in bilinear groups) in the updatable Structured Reference String (SRS) setting that achieves $O(\log n)$ proof size and $O(\log n)$ verification complexity. Our circuit zero-knowledge protocol has concretely better proof/prover/verifier complexity compared to the state-of-the-art protocol in the updatable setting under the same assumption. Our technique for achieving verifier-succinctness in the compression framework is of independent interest. We then show a commitment scheme for committing to group elements using a structured commitment key. We construct protocols to open a committed homomorphism on a committed vector with verifier succinctness in the designated-verifier setting. This has applications in making the verifier in compressed sigma protocols for bilinear group arithmetic circuits succinct.
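As background, the halving recursion that gives compressed sigma protocols and inner-product arguments their logarithmic size can be illustrated over a plain prime field; this sketch shows only the arithmetic folding, not the committed, zero-knowledge protocol of the paper:

# Standard halving recursion behind inner-product arguments, over F_p.
# Plain-arithmetic illustration of why proofs/verification can be logarithmic.
import random

P = 2**61 - 1  # a prime field for the toy example

def inner(a, b):
    return sum(x * y for x, y in zip(a, b)) % P

def fold(a, b, t, transcript):
    """One round: halve the vectors, patching claim t with two cross terms."""
    if len(a) == 1:
        return a[0], b[0], t
    h = len(a) // 2
    aL, aR, bL, bR = a[:h], a[h:], b[:h], b[h:]
    L, R = inner(aL, bR), inner(aR, bL)
    c = random.randrange(1, P)              # verifier challenge
    cinv = pow(c, -1, P)
    a2 = [(x + c * y) % P for x, y in zip(aL, aR)]
    b2 = [(x + cinv * y) % P for x, y in zip(bL, bR)]
    t2 = (t + c * R + cinv * L) % P         # inner(a2, b2) == t2 by expansion
    transcript.append((L, R))               # 2 elements per round => O(log n)
    return fold(a2, b2, t2, transcript)

n = 16
a = [random.randrange(P) for _ in range(n)]
b = [random.randrange(P) for _ in range(n)]
transcript = []
x, y, t = fold(a, b, inner(a, b), transcript)
assert (x * y) % P == t                      # final length-1 claim holds
print(f"{len(transcript)} rounds for n={n}")  # log2(16) = 4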
Last updated:  2024-01-17
PRIDA: PRIvacy-preserving Data Aggregation with multiple data customers
Beyza Bozdemir, Betül Aşkın Özdemir, and Melek Önen
We propose a solution for user-privacy-oriented privacy-preserving data aggregation with multiple data customers. Most existing state-of-the-art approaches place too much importance on performance efficiency and tend to ignore privacy properties other than input privacy. Most solutions for data aggregation do not generally discuss users' control over their own data or their anonymity when they search for something in a browser or volunteer to participate in a survey, yet are keen to secure the rights of data customers (which should come second). They focus on efficiency-oriented data aggregation that enables input privacy only. We aim to give priority to user privacy, and we have designed a solution for data aggregation that keeps efficiency in balance. We show that PRIDA provides a good level of computational and communication complexity and even performs better in timing evaluations than recently published studies (i.e., Bonawitz et al. (CCS’17), Corrigan-Gibbs et al. (NSDI’17), Bell et al. (CCS’20), Addanki et al. (SCN’22)). We employ threshold homomorphic encryption and secure two-party computation to ensure the privacy properties. We balance the trade-off between a proper design for users and the desired privacy and efficiency.
Last updated:  2024-01-17
A Comparative Examination of Network and Contract-Based Blockchain Storage Solutions for Decentralized Applications
Lipeng He
Decentralized applications (DApps), which are innovative blockchain-powered software systems designed to serve as the fundamental building blocks for the next generation of Internet services, have witnessed exponential growth in recent years. This paper thoroughly compares and analyzes two blockchain-based decentralized storage networks (DSNs), which are crucial foundations for DApp and blockchain ecosystems. The study examines their respective mechanisms for data persistence, strategies for enforcing data retention, and token economics. In addition to delving into technical details, the suitability of each storage solution for decentralized application development is assessed, taking into consideration network performance, storage costs, and existing use cases. By evaluating these factors, the paper aims to provide insights into the effectiveness of these technologies in supporting the desirable properties of truly decentralized blockchain applications. In conclusion, the findings of this research are discussed and synthesized, offering valuable perspectives on the capabilities of these technologies. It sheds light on their potential to facilitate the development of DApps and provides an understanding of the ongoing trends in blockchain development.
Last updated:  2024-04-17
1/0 Shades of UC: Photonic Side-Channel Analysis of Universal Circuits
Dev M. Mehta, Mohammad Hashemi, Domenic Forte, Shahin Tajik, and Fatemeh Ganji
A universal circuit (UC) can be thought of as a programmable circuit that can simulate any circuit up to a certain size by specifying its secret configuration bits. UCs have been incorporated into various applications, such as private function evaluation (PFE). Recently, studies have attempted to formalize the concept of semiconductor intellectual property (IP) protection in the context of UCs. This is despite the observations made in theory and practice that, in reality, the adversary may obtain additional information about the secret when executing cryptographic protocols. This paper aims to answer the question of whether UCs leak information unintentionally, which can be leveraged by the adversary to disclose the configuration bits. In this regard, we propose the first photon emission analysis against UCs relying on computer vision-based approaches. We demonstrate that the adversary can utilize a cost-effective solution to take images to be processed by off-the-shelf algorithms to extract configuration bits. We examine the efficacy of our method in two scenarios: (1) the design is small enough to be captured in a single image during the attack phase, and (2) multiple images should be captured to launch the attack by deploying a divide-and-conquer strategy. To evaluate the effectiveness of our attack, we use metrics commonly applied in side-channel analysis, namely rank and success rate. By doing so, we show that our profiled photon emission analysis achieves a success rate of 1 by employing a few templates (concretely, only 18 images were used as templates).
Last updated:  2024-01-17
Too Hot To Be True: Temperature Calibration for Higher Confidence in NN-assisted Side-channel Analysis
Seyedmohammad Nouraniboosjin and Fatemeh Ganji
The past years have witnessed a considerable increase in research efforts put into neural network-assisted profiled side-channel analysis (SCA). Studies have also identified challenges, e.g., closing the gap between metrics for machine learning (ML) classification and side-channel attack evaluation. In fact, in the context of NN-assisted SCA, the NN’s output distribution forms the basis for successful key recovery. In this respect, related work has covered various aspects of integrating neural networks (NNs) into SCA, including applying a diverse set of NN models, model selection and training, hyperparameter tuning, etc. Nevertheless, one well-known fact has been overlooked in the SCA-related literature, namely NNs’ tendency to become “over-confident,” i.e., to assign the predicted class (the secret key in the sense of SCA) a higher probability than is warranted by its true likelihood of being correct. Temperature scaling is among the powerful and effective techniques devised as a remedy for this. From the principles of deep learning, it is known that temperature scaling does not affect the NN’s accuracy; however, its impact on metrics for secret key recovery, mainly guessing entropy, is worth investigating. This paper reintroduces temperature scaling into SCA and demonstrates that key recovery can become more effective through it. Interestingly, temperature scaling can be easily integrated into SCA, and no re-tuning of the network is needed. In doing so, temperature can be seen as a metric to assess the NN’s performance before launching the attack. In this regard, the impact of hyperparameter tuning, network variance, and capacity has been studied. This leads to recommendations on how network miscalibration and overconfidence can be prevented.
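A minimal sketch of temperature scaling itself, assuming generic logits over 256 key-byte hypotheses; it illustrates the well-known fact used above, namely that dividing logits by T > 1 softens confidence without changing the argmax (and hence the accuracy):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
logits = rng.normal(scale=4.0, size=(5, 256))   # scores over 256 key-byte hypotheses

for T in (1.0, 2.5):                            # T > 1 softens over-confident outputs
    p = softmax(logits / T)
    # The argmax (hence classification accuracy) is unchanged; only the
    # probability mass feeding key-rank computations is recalibrated.
    print(f"T={T}: mean top prob {p.max(axis=-1).mean():.3f}, "
          f"argmax {p.argmax(axis=-1)}")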
Last updated:  2024-01-17
Hints from Hertz: Dynamic Frequency Scaling Side-Channel Analysis of Number Theoretic Transform in Lattice-Based KEMs
Tianrun Yu, Chi Cheng, Zilong Yang, Yingchen Wang, Yanbin Pan, and Jian Weng
Number Theoretic Transform (NTT) has been widely used in accelerating computations in lattice-based cryptography. However, attackers can potentially launch power analysis targeting NTT because it is usually the most time-consuming part of the implementation. This extended time frame provides a natural window of opportunity for attackers. In this paper, we investigate the first CPU frequency leakage (Hertzbleed-like) attacks against NTT in lattice-based KEMs. Our key observation is that different inputs to NTT incur different Hamming weights in its output and intermediate layers. By measuring the CPU frequency during the execution of NTT, we propose a simple yet effective attack idea to find the input to NTT that triggers NTT processing data with significantly low Hamming weight. We further apply our attack idea to real-world applications that are built upon NTT: CPA-secure Kyber without Compression and Decompression functions, and CCA-secure NTTRU. This leads us to extract information, or frequency "hints", about the secret key. Integrating these hints into the LWE-estimator framework, we estimate a minimum of $35\%$ security loss caused by the leakage. The frequency and timing measurements on the Reference and AVX2 implementations of NTT in both Kyber and NTTRU align well with our theoretical analysis, confirming the existence of frequency side-channel leakage in NTT. It is important to emphasize that our observation is not limited to a specific implementation but rather applies to the NTT algorithm itself. Therefore, our results call for more attention to the analysis of power leakage against NTT in lattice-based cryptography.
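A toy version of the search described above: scan random candidate inputs and keep those whose (naive, quadratic-time) NTT output has the lowest Hamming weight. The parameters q = 3329 and n = 8 are illustrative stand-ins; real Kyber uses n = 256 and an optimized NTT:

import random

Q, N = 3329, 8                 # toy parameters, not Kyber's actual n

def find_root(q, n):
    """Find a primitive n-th root of unity mod q (needs n | q-1, n a power of 2)."""
    assert (q - 1) % n == 0
    for x in range(2, q):
        w = pow(x, (q - 1) // n, q)
        if pow(w, n // 2, q) != 1:   # order is exactly n, not a proper divisor
            return w
    raise ValueError("no root found")

W = find_root(Q, N)

def naive_ntt(a):
    return [sum(a[i] * pow(W, i * j, Q) for i in range(N)) % Q for j in range(N)]

def hw(xs):
    return sum(bin(x).count("1") for x in xs)

random.seed(0)
candidates = [[random.randrange(Q) for _ in range(N)] for _ in range(2000)]
scored = sorted(candidates, key=lambda a: hw(naive_ntt(a)))
# Low-Hamming-weight NTT data is what makes the data-dependent frequency
# signal (Hertzbleed-style) easiest to distinguish.
print("lowest output HW:", hw(naive_ntt(scored[0])),
      "highest:", hw(naive_ntt(scored[-1])))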
Last updated:  2024-01-16
SDitH in Hardware
Sanjay Deshpande, James Howe, Jakub Szefer, and Dongze Yue
This work presents the first hardware realisation of the Syndrome-Decoding-in-the-Head (SDitH) signature scheme, which is a candidate in the NIST PQC process for standardising post-quantum secure digital signature schemes. SDitH's hardness is based on conservative code-based assumptions, and it uses the Multi-Party-Computation-in-the-Head (MPCitH) construction. This is the first hardware design of a code-based signature scheme based on traditional decoding problems and only the second for MPCitH constructions, after Picnic. This work presents optimised designs to achieve the best area efficiency, which we evaluate using the Time-Area Product (TAP) metric. It also proposes a novel hardware architecture that divides the signature generation algorithm into two phases, namely offline and online phases, to optimise the overall clock cycle count. The hardware designs for key generation, signature generation, and signature verification are parameterised for all SDitH parameters, including the NIST security levels and both syndrome decoding base fields (GF256 and GF251), and thus conform to the SDitH specifications. The hardware design further supports secret share splitting and the hypercube optimisation, which can be applied in this and multiple other NIST PQC candidates. This work results in a hardware design with a drastic reduction in clock cycles compared to the optimised AVX2 software implementation, in the range of 2-4x for most operations. Our key generation outperforms software drastically, giving an 11-17x reduction in runtime, despite software's significantly faster clock speed. On Artix 7 FPGAs we can perform key generation in 55.1 Kcycles, signature generation in 6.7 Mcycles, and signature verification in 8.6 Mcycles for NIST L1 parameters; these increase for GF251, and for L3 and L5 parameters.
Last updated:  2024-01-16
Laconic Function Evaluation, Functional Encryption and Obfuscation for RAMs with Sublinear Computation
Fangqi Dong, Zihan Hao, Ethan Mook, and Daniel Wichs
Laconic function evaluation (LFE) is a "flipped" version of fully homomorphic encryption, where the server performing the computation gets the output. The server commits itself to a function $f$ by outputting a small digest. Clients can later efficiently encrypt inputs $x$ with respect to the digest in much less time than computing $f$, and ensure that the server only decrypts $f(x)$, but does not learn anything else about $x$. Prior works constructed LFE for circuits under LWE, and for Turing Machines (TMs) from indistinguishability obfuscation (iO). In this work we introduce LFE for Random-Access Machines (RAM-LFE). The server commits itself to a potentially huge database $y$ via a short digest. Clients can later efficiently encrypt inputs $x$ with respect to the digest and the server decrypts $f(x,y)$ for some specified RAM program $f$ (e.g., a universal RAM), without learning anything else about $x$. The main advantage of RAM-LFE is that the server's decryption run-time only scales with the RAM run-time $T$ of the computation $f(x,y)$, which can be sublinear in both $|x|$ and $|y|$. We consider a weakly efficient variant, where the client's run-time is also allowed to scale linearly with $T$, but not $|y|$, and a fully efficient variant, where the client's run-time must be sublinear in both $T$ and $|y|$. We construct the former from doubly efficient private information retrieval (DEPIR) and laconic OT (LOT), both of which are known from RingLWE, and the latter from an additional use of iO. We then show how to leverage fully efficient RAM-LFE to also get (many-key) functional encryption for RAMs (RAM-FE) where secret keys are associated with big databases $y$ and the decryption time is sublinear in $|y|$, as well as iO for RAMs where the obfuscated program contains a big database $y$ and the evaluation time is sublinear in $|y|$.
Last updated:  2024-03-09
A Refined Hardness Estimation of LWE in Two-step Mode
Wenwen Xia, Leizhang Wang, Geng Wang, Dawu Gu, and Baocang Wang
Recently, researchers have proposed many LWE estimators, such as lattice-estimator (Albrecht et al., Asiacrypt 2017) and leaky-LWE-Estimator (Dachman-Soled et al., Crypto 2020); the latter has already been used in estimating the security level of Kyber and Dilithium using only BKZ. However, we prove in this paper that solving LWE by combining a lattice reduction step (by LLL or BKZ) and a target vector searching step (by enumeration or sieving), which we call a Two-step mode, is more efficient than using only BKZ. Moreover, we give a refined LWE estimator in Two-step mode by analyzing the relationship between the probability distribution of the target vector and the solving success rate in a Two-step mode LWE solving algorithm. The latest Two-step estimator for LWE, the "primal-bdd" mode in lattice-estimator, does not take into account some up-to-date results and lacks a thorough theoretical analysis. Under the same gate-count model, our estimation for NIST PQC standards drops by 2.1∼3.4 bits (2.2∼4.6 bits while considering a more flexible blocksize and jump strategy) compared with leaky-LWE-Estimator. Furthermore, we also give a conservative estimation for LWE from the Two-step solving algorithm. Compared with the Core-SVP model, which is used in previous conservative estimations, our estimation relies on weaker assumptions and outputs higher evaluation results than the Core-SVP model. For NIST PQC standards, our conservative estimation is 4.17∼8.11 bits higher than the Core-SVP estimation. Hence our estimator can give a closer estimation for both the upper bound and the lower bound of LWE hardness.
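For orientation, the Core-SVP model referenced above scores security as a constant times the required BKZ blocksize; a back-of-the-envelope sketch with placeholder blocksizes (not the paper's figures):

# Core-SVP-style estimate: security ~ c * beta bits, where beta is the
# required BKZ blocksize and c is the sieve exponent. Blocksizes below
# are illustrative placeholders, not the paper's estimates.
C_CLASSICAL = 0.292   # classical sieve exponent (BDGL16)
C_QUANTUM = 0.265     # quantum sieve exponent

for name, beta in [("toy-param-A", 400), ("toy-param-B", 620)]:
    print(f"{name}: classical ~{C_CLASSICAL * beta:.0f} bits, "
          f"quantum ~{C_QUANTUM * beta:.0f} bits")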
Last updated:  2024-01-18
Exploiting the Central Reduction in Lattice-Based Cryptography
Tolun Tosun, Amir Moradi, and Erkay Savas
This paper presents a novel and efficient way of exploiting side-channel leakage of masked implementations of lattice-based cryptography (LBC). The presented attack specifically targets the central reduction technique, which is widely adopted in efficient implementations of LBC. We show that central reduction leads to a vulnerability by creating a strong dependency between the power consumption and the sign of sensitive intermediate variables. We exploit this dependency by introducing a novel hypothetical power model, the range power model, which can be employed in higher-order multi-query side-channel analysis attacks. We particularly show that our approach is valid for the prime moduli employed by Kyber and Dilithium, the lattice-based post-quantum algorithms selected by NIST, and that it generalizes to other primes used in LBC as well. We practically evaluate our approach by performing second-order non-profiled attacks against a masked implementation of Kyber on an Arm Cortex-M4 microprocessor. In our experiments we revealed the full secret key of the aforementioned implementation with only 2100 electromagnetic (EM) traces without profiling, achieving a more than 14-times reduction in the number of traces compared to classical attacks.
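The sign dependency the attack exploits is easy to reproduce: central reduction maps values into the centered interval around zero, so negative representatives have their high bits set in two's complement, and their Hamming weight (a common power model) jumps with the sign. A toy illustration with Kyber's modulus:

Q = 3329   # Kyber's prime, one example covered by the paper

def central_reduce(x):
    """Map x mod Q into the centered interval [-(Q-1)//2, Q//2]."""
    r = x % Q
    return r - Q if r > Q // 2 else r

def hw16(x):
    """Hamming weight of a 16-bit two's-complement representative."""
    return bin(x & 0xFFFF).count("1")

# Negative centered values have their high bits set in two's complement,
# so the Hamming weight jumps with the sign of the sensitive variable:
for v in (3, 117, -3, -117):
    print(f"value {v:+5d} -> HW {hw16(central_reduce(v % Q))}")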
Last updated:  2024-01-16
Privacy-preserving Anti-Money Laundering using Secure Multi-Party Computation
Marie Beth van Egmond, Vincent Dunning, Stefan van den Berg, Thomas Rooijakkers, Alex Sangers, Ton Poppe, and Jan Veldsink
Money laundering is a serious financial crime where criminals aim to conceal the illegal source of their money via a series of transactions. Although banks have an obligation to monitor transactions, it is difficult to track these illicit money flows since they typically span multiple banks, which cannot share this information due to privacy concerns. We present secure risk propagation, a novel efficient algorithm for money laundering detection across banks without violating privacy constraints. In this algorithm, each account is assigned a risk score, which is then propagated through the transaction network. In this article we present two results. Firstly, using data from a large Dutch bank, we show that it is possible to detect unusual activity using this model, with cash ratio as the risk score. With a recall of 20%, the precision improves from 15% to 40% by propagating the risk scores, reducing the number of false positives significantly. Secondly, we present a privacy-preserving solution for securely performing risk propagation over a joint, inter-bank transaction network. To achieve this, we use Secure Multi-Party Computation (MPC) techniques, which are particularly well-suited to the risk propagation algorithm due to its structural simplicity. We also show that the running time of this secure variant scales linearly in the number of accounts and transactions. For 200,000 transactions, two iterations of the secure algorithm between three virtual parties run within three hours on a consumer-grade server.
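A plaintext sketch of risk propagation over a small transaction graph; the paper's contribution is running such an update securely across banks with MPC, and the exact update rule and damping factor here are illustrative assumptions:

# Plaintext risk propagation over a transaction graph. The update rule and
# alpha are illustrative, not the paper's exact formula.
def propagate(risk, transactions, alpha=0.5, iterations=2):
    for _ in range(iterations):
        incoming = {a: 0.0 for a in risk}
        totals = {a: 0.0 for a in risk}
        for src, dst, amount in transactions:
            incoming[dst] += amount * risk[src]   # risk flows with the money
            totals[dst] += amount
        risk = {a: (1 - alpha) * risk[a]
                   + alpha * (incoming[a] / totals[a] if totals[a] else risk[a])
                for a in risk}
    return risk

risk = {"A": 0.9, "B": 0.1, "C": 0.0}          # initial scores, e.g. cash ratio
txs = [("A", "B", 1000.0), ("B", "C", 900.0)]  # (sender, receiver, amount)
print(propagate(risk, txs))                    # risk flows from A towards C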
Last updated:  2024-01-16
Extreme Algebraic Attacks
Pierrick Méaux and Qingju Wang
When designing filter functions in Linear Feedback Shift Register (LFSR) based stream ciphers, algebraic criteria of Boolean functions such as the Algebraic Immunity (AI) become key characteristics because they guarantee the security of ciphers against powerful algebraic attacks. In this article, we investigate a generalization of the algebraic attacks proposed by Courtois and Meier on filtered LFSRs twenty years ago. We consider how the standard algebraic attack can be generalized beyond filtered LFSRs to stream ciphers applying a Boolean filter function to an updated state. Depending on the updating process, we can use different sets of annihilators than the ones used in the standard algebraic attack; this leads to a generalization of the concept of algebraic immunity and to more efficient attacks. To illustrate these strategies, we focus on one of these generalizations and introduce a new notion called Extreme Algebraic Immunity (EAI). We perform a theoretical study of the EAI criterion and explore its relation to other algebraic criteria. We prove an upper bound on the EAI of an n-variable Boolean function and further show that the EAI can be lower bounded by the AI restricted to a subset, as defined by Carlet, Méaux and Rotella at FSE 2017. We also exhibit functions with EAI guaranteed to be lower than the AI; in particular, we highlight a pathological case of functions with optimal algebraic immunity and EAI of only n/4. As applications, we determine the EAI of filter functions of some existing stream ciphers and discuss how extreme algebraic attacks using EAI could apply to some ciphers. Our generalized algebraic attack does not give a better complexity than Courtois and Meier's result on the existing stream ciphers. However, we see this work as a study to avoid weaknesses in the construction of future stream cipher designs.
Last updated:  2024-01-16
A Study of Soft Analytical Side-Channel Attacks on Secure Hash Algorithms
Julien Maillard, Thomas Hiscock, Maxime Lecomte, and Christophe Clavier
Hashing algorithms are one-way functions that are used in cryptographic protocols as Pseudo Random Functions (PRF), to assess data integrity or to create a Hash-based Message Authentication Code (HMAC). In many cryptographic constructions, secret data is processed with hashing functions. In these cases, recovering the input given to the hashing algorithm allows retrieving secret data. In this paper, we investigate the application of Soft Analytical Side-Channel Attacks (SASCA), based on a Belief Propagation (BP) framework, to recover the input of two popular hash function families: SHA-2 and SHA-3. Thanks to a simulation framework, we develop a comprehensive study of the attacker's recovery capacity depending on the hash function variant. We then demonstrate that an attacker can leverage prior knowledge of the hashing function's input to increase the effectiveness of the attacks. As an example, in the context of a bootloader performing a hash-based integrity check on a secret firmware, we show that simple statistics on assembly code injected into BP improve input recovery. Finally, we study the security implications of SASCA on cryptosystems performing multiple invocations of hashing functions with inputs derived from the same secret data. We show that such constructions can be exploited efficiently by an attacker. We support these statements with experiments on SHA-256-based HMAC and on the SHAKE-256-based PRF in Kyber's encryption routine. We also show that increasing Kyber's security parameters implies weaker security against the proposed SASCA targeting the shared key.
Last updated:  2024-01-16
Double Difficulties, Defense in Depth: A succinct authenticated key agreement protocol
WenBin Hsieh
In 2016, NIST announced an open competition with the goal of finding and standardizing a suitable quantum-resistant cryptographic algorithm, with the standard to be drafted in 2023. These algorithms aim to implement post-quantum secure key encapsulation mechanisms (KEM) and digital signatures. However, the proposed algorithms do not consider authentication and are vulnerable to attacks such as man-in-the-middle. In this paper, we propose an authenticated key exchange algorithm to solve the above problems and improve usability. The proposed algorithm combines learning with errors (LWE) and the elliptic curve discrete logarithm problem to provide the required security goals. As forward secrecy is a desirable property in a key exchange protocol, ephemeral key pairs are used so that the compromise of a long-term secret does not affect the security of past session keys. Moreover, the exchange steps required by the algorithm are very streamlined and can be completed with only two handshakes. We also use the random oracle model to prove the correctness and security of the proposed scheme. The performance analysis demonstrates the effectiveness of the proposed scheme. We believe that the novel approach introduced in this algorithm opens several doors for innovative applications of digital signatures in KEMs.
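The combiner at the heart of such hybrid designs is simple to state: derive the session key by hashing both shared secrets together with the transcript, so the session remains secure as long as either underlying problem stays hard. A hedged sketch with placeholder values, not the paper's exact construction:

import hashlib

def combine_secrets(ss_ecdh: bytes, ss_lwe: bytes, transcript: bytes) -> bytes:
    """Hybrid KDF sketch: the session key depends on BOTH shared secrets,
    so breaking only one hardness assumption is not enough."""
    return hashlib.sha256(b"hybrid-kex-v1" + ss_ecdh + ss_lwe + transcript).digest()

# Placeholder values standing in for the real protocol messages:
ss_ecdh = b"\x01" * 32    # ECDH shared secret, encoded
ss_lwe = b"\x02" * 32     # LWE-based encapsulated secret
session_key = combine_secrets(ss_ecdh, ss_lwe, b"client||server||ciphertexts")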
Last updated:  2024-01-16
Partial Key Exposure Attack on Common Prime RSA
Mengce Zheng
In this paper, we focus on the common prime RSA variant and introduce a novel investigation into the partial key exposure attack targeting it. We explore the vulnerability of this RSA variant, which employs two primes $p$ and $q$ defined as $p=2ga+1$ and $q=2gb+1$ for a large common prime $g$. Previous cryptanalysis of common prime RSA has primarily focused on the small private key attack. In our work, we delve deeper into the realm of partial key exposure attacks by categorizing them into three distinct cases. We are able to identify weak private keys that are susceptible to partial key exposure by using the lattice-based method for solving simultaneous modular univariate linear equations. To validate the effectiveness and soundness of our proposed attacks, we conduct experimental evaluations. Through these examinations, we demonstrate the validity and practicality of the proposed partial key exposure attacks on common prime RSA.
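The special key structure is easy to exhibit; a toy search for a common prime RSA modulus of the stated form follows (sizes far below anything secure, and the ranges for a and b are illustrative):

import random

def is_prime(n):
    """Trial division; fine for the toy sizes used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def common_prime_keygen(g):
    """Find p = 2ga+1 and q = 2gb+1 prime, with distinct small a, b."""
    while True:
        a, b = random.randrange(2, 1000), random.randrange(2, 1000)
        p, q = 2 * g * a + 1, 2 * g * b + 1
        if a != b and is_prime(p) and is_prime(q):
            return p, q, a, b

random.seed(1)
g = 1117                        # a toy common prime (must itself be prime)
p, q, a, b = common_prime_keygen(g)
assert (p - 1) % (2 * g) == 0 and (q - 1) % (2 * g) == 0
print(p, q)   # gcd(p-1, q-1) is divisible by 2g -- the structure the attacks exploit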
Last updated:  2024-01-15
The Insecurity of Masked Comparisons: SCAs on ML-KEM’s FO-Transform
Julius Hermelink, Kai-Chun Ning, and Emanuele Strieder
NIST has released the draft standard for ML-KEM, and ML-KEM is actively used in several widely-distributed applications. Thus, the widespread use of ML-KEM in the embedded world has to be expected in the near future. This makes security against side-channel attacks a pressing matter. Several side-channel attacks have previously been proposed, and one line of research has been attacks against the comparison step of the FO-transform. These attacks construct a decryption failure oracle using a side-channel. A recent work published at TCHES 2022 stresses the need for higher-order masked comparisons by presenting a horizontal attack and proposes a t-probing secure comparison operation. A subsequent work by D’Anvers, Van Beirendonck, and Verbauwhede improves upon the performance of several previous proposals. In this work, we show that the latter masked comparison suffers from weaknesses similar to those identified in the former. We first propose an approximate template attack that requires only a very low number of traces for profiling and has an exceptionally high noise tolerance. We then show that the profiling phase is not necessary and can be replaced by a vertical analysis of the distribution of certain points of interest without knowledge of the targeted values. Finally, we explain how a horizontal attack may construct a decryption failure oracle from a single trace. We provide a leakage model of the targeted operations, which is based on the noisy Hamming weight model. Our evaluations are carried out on a physical device to stress the practicality of our attack. In addition, we simulate the attacks to determine the measurement noise levels that can be handled. We discuss the underlying causes of our attack, the difficulty of securing the Fujisaki-Okamoto transform in ML-KEM, and draw conclusions about the (in)sufficiency of t-probing security in this context.
Last updated:  2024-01-15
CrISA-X: Unleashing Performance Excellence in Lightweight Symmetric Cryptography for Extendable and Deeply Embedded Processors
Oren Ganon and Itamar Levi
The selection of a Lightweight Cryptography (LWC) algorithm is crucial for resource-limited applications. The National Institute of Standards and Technology (NIST) leads this process, which involves a thorough evaluation of the algorithms’ cryptanalytic strength. Furthermore, careful consideration is given to factors such as algorithm latency, code size, and hardware implementation area. These factors are critical in determining the overall performance of cryptographic solutions at edge devices. We introduce CrISA-X, a set of Cryptography Instruction Set Architecture extensions designed to improve cryptographic latency on extendable processors. CrISA-X, classified into Generic-Atomic, Block-Specific and Procedure-Specific extensions, leverages RISC processor hardware and a base ISA to effectively execute LWC algorithms. Our study aims to evaluate the execution efficiency of new single-cycle instruction extensions and tightly coupled multicycle instructions on extendable modular RISC processors. CrISA-X enhances the speed of various algorithms simultaneously while optimizing ISA adaptability, a feat yet to be accomplished. The extensions, spanning several computation levels, are tailored both for individual algorithms and for sets of LWC algorithms, depending on performance, frequency, and area trade-offs. By diligently applying the Min-Max optimization technique, we have configured these extensions to achieve a delicate balance between performance, area, code size, etc. Our study presents empirical evidence of the performance enhancement achieved on a real synthesized modular RISC processor. We offer a framework for creating optimized processor hardware and ISA extensions. The CrISA-X framework generally outperforms base-ISA implementations by delivering significant performance boosts of between 3x and 17x, at a relative area cost increase of +12% to +47% in LUTs, depending on the instruction set category. Notably, as one important example, the ASCON algorithm achieves a 10x performance boost in contrast to the base ISA instruction implementation.