Papers updated in last 183 days (Page 15 of 1733 results)
GAPP: Generic Aggregation of Polynomial Protocols
We propose a generic framework called GAPP for aggregation of polynomial protocols. This allows proving multiple instances of a polynomial protocol using a single succinct aggregate proof that also admits succinct verification. The satisfiability of several univariate polynomial identities over a domain is reduced to the satisfiability of a single bivariate polynomial identity over a related domain, where the bivariate polynomials interpolate a batch of univariate polynomials over the domain. We construct an information-theoretic protocol for proving the satisfiability of the bivariate polynomial identity, which is then compiled using any bivariate polynomial commitment scheme (PCS) to yield an argument of knowledge for the aggregation relation. GAPP can be applied in a black-box way to several popular SNARKs over bilinear groups that are modeled as polynomial protocols.
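The packing step at the heart of this reduction admits a small self-contained illustration. The sketch below (plain Python over a toy prime field; the modulus, interpolation points, and polynomial batch are invented for the demo, not taken from the paper) builds a bivariate polynomial P(X, Y) that restricts to the j-th univariate polynomial at Y = y_j, which is exactly the interpolation property the reduction relies on.

```python
# Toy illustration of packing a batch of univariate polynomials p_1, ..., p_k
# into one bivariate polynomial P(X, Y) with P(X, y_j) = p_j(X).
P = 97  # small prime modulus (hypothetical choice, for the demo only)

def lagrange_basis(points, j, y):
    """Evaluate the j-th Lagrange basis polynomial for `points` at y (mod P)."""
    num, den = 1, 1
    for m, ym in enumerate(points):
        if m != j:
            num = num * (y - ym) % P
            den = den * (points[j] - ym) % P
    return num * pow(den, -1, P) % P

def eval_poly(coeffs, x):
    """Evaluate a univariate polynomial given low-to-high coefficients (mod P)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

# A batch of univariate polynomials (coefficient lists, constant term first).
batch = [[3, 1, 4], [1, 5, 9], [2, 6, 5]]
ys = [1, 2, 3]  # interpolation points indexing the batch

def eval_bivariate(x, y):
    """P(x, y) = sum_j p_j(x) * L_j(y): interpolates the batch along Y."""
    return sum(eval_poly(p, x) * lagrange_basis(ys, j, y)
               for j, p in enumerate(batch)) % P

# Sanity check: restricting Y to y_j recovers the j-th univariate polynomial.
for j, p in enumerate(batch):
    assert all(eval_bivariate(x, ys[j]) == eval_poly(p, x) for x in range(P))
print("P(X, y_j) agrees with p_j(X) for every j")
```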
We present a new bivariate polynomial commitment scheme, bPCLB, with succinct verification that yields an efficient instantiation of GAPP. In addition, the prover performs only a sublinear number of cryptographic operations in the evaluation proof. Towards constructing bPCLB, we show a new folding technique that we call Lagrangian folding. The bivariate PCS bPCLB and the Lagrangian folding scheme are of independent interest. We implement bPCLB and experimentally validate the practical efficiency of our GAPP instantiation. For the popular PLONK proof system, we achieve significantly faster proof generation than the naive baseline of generating separate PLONK proofs. Compared to all existing aggregation schemes that incur additional prover overheads on top of the baseline, we achieve significantly more efficient proving, while retaining succinct verification.
We demonstrate the versatility of our GAPP framework by outlining applications of practical interest: tuple lookups that significantly outperform existing lookup arguments in terms of prover overheads; and proofs for non-uniform computation with à la carte prover cost.
Oblivious Turing Machine
In the ever-evolving landscape of Information Technologies, private decentralized computing on an honest yet curious server has emerged as a prominent paradigm. While numerous schemes exist to safeguard data during computation, the focus has primarily been on protecting the confidentiality of the data itself, often overlooking the potential information leakage arising from the function evaluated by the server. Recognizing this gap, this article aims to address the issue by presenting and implementing an innovative solution for ensuring the privacy of both the data and the program. We introduce a novel approach that combines the power of Fully Homomorphic Encryption with the concept of the Turing Machine model, resulting in the first fully secure, practical, non-interactive oblivious Turing Machine. Our Oblivious Turing Machine construction is based on only three hypotheses: the hardness of the Ring Learning With Errors problem, the ability to homomorphically evaluate non-linear functions, and the capacity to blindly rotate elements of a data structure. Based only on these three assumptions, we propose an implementation of an Oblivious Turing Machine relying on the TFHE cryptosystem and present some implementation results.
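To make the oblivious-step idea concrete, here is a cleartext mock-up (not the paper's construction: the transition table, alphabet, and tape are toy values) of one Turing Machine step in which the state and read symbol are one-hot vectors and the transition is selected by multiply-and-sum rather than branching, the role that blind rotation plays homomorphically in the TFHE-based instantiation.

```python
import numpy as np

S, A = 2, 2                                   # number of states, alphabet size
# Toy transition table delta[state, symbol] = (new state, written symbol, move).
new_state = np.array([[1, 0], [0, 1]])
write_sym = np.array([[1, 1], [0, 1]])
move      = np.array([[1, -1], [1, 1]])       # +1 = right, -1 = left

def step(state_1h, sym_1h, head):
    """Branch-free transition: every table entry is weighted by one-hot masks."""
    sel = np.outer(state_1h, sym_1h)          # 1 exactly at (state, symbol)
    ns = int((sel * new_state).sum())         # selected by multiply-and-sum,
    ws = int((sel * write_sym).sum())         # never by an if/else on secrets
    mv = int((sel * move).sum())
    return np.eye(S, dtype=int)[ns], ws, head + mv

state, tape, head = np.eye(S, dtype=int)[0], [0, 1, 1, 0], 1
for _ in range(3):
    sym = np.eye(A, dtype=int)[tape[head]]
    state, tape[head], head = step(state, sym, head)
print("tape:", tape, "head:", head)           # tape: [1, 1, 1, 0] head: 2
```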
High-Throughput Secure Multiparty Computation with an Honest Majority in Various Network Settings
In this work, we present novel protocols over rings for semi-honest secure three-party computation (3PC) and malicious four-party computation (4PC) with one corruption. While most existing works focus on improving total communication complexity, challenges such as network heterogeneity and computational complexity, which impact MPC performance in practice, remain underexplored.
Our protocols address these issues by tolerating multiple arbitrarily weak network links between parties without any substantial decrease in performance. Additionally, they significantly reduce computational complexity by requiring up to half the number of basic instructions per gate compared to related work. These improvements lead to up to twice the throughput of state-of-the-art protocols in homogeneous network settings and up to eight times higher throughput in real-world heterogeneous settings. These advantages come at no additional cost: Our protocols maintain the best-known total communication complexity per multiplication, requiring 3 elements for 3PC and 5 elements for 4PC.
We implemented our protocols alongside several state-of-the-art protocols (Replicated 3PC, ASTRA, Fantastic Four, Tetrad) in a novel open-source C++ framework optimized for high throughput. Five out of six implemented 3PC and 4PC protocols achieve more than one billion 32-bit multiplications or over 32 billion AND gates per second using our implementation in a 25 Gbit/s LAN environment. This represents the highest throughput achieved in 3PC and 4PC so far, outperforming existing frameworks like MP-SPDZ, ABY3, MPyC, and MOTION by two to three orders of magnitude.
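To see why "3 elements per multiplication" is the reference point for 3PC, here is a minimal semi-honest replicated-sharing multiplication over Z_{2^32}, simulated for all three parties in plain Python (a sketch, not the paper's optimized protocol; in practice the zero-sharing comes from pre-shared PRF keys rather than fresh randomness).

```python
import secrets
MOD = 1 << 32  # arithmetic over Z_{2^32}, as in the 32-bit benchmarks above

def share(x):
    """Replicated sharing: x = x1 + x2 + x3, party i holds (x_i, x_{i+1})."""
    x1, x2 = secrets.randbelow(MOD), secrets.randbelow(MOD)
    s = [x1, x2, (x - x1 - x2) % MOD]
    return [(s[i], s[(i + 1) % 3]) for i in range(3)]

def zero_shares():
    """Correlated randomness alpha_1 + alpha_2 + alpha_3 = 0."""
    a1, a2 = secrets.randbelow(MOD), secrets.randbelow(MOD)
    return [a1, a2, (-a1 - a2) % MOD]

def multiply(xs, ys):
    """Semi-honest replicated multiplication: one ring element sent per party."""
    alpha = zero_shares()
    # Local cross terms: z_i = x_i*y_i + x_i*y_{i+1} + x_{i+1}*y_i + alpha_i,
    # which sum to x*y because every cross pair x_i*y_j appears exactly once.
    z = [(xs[i][0] * ys[i][0] + xs[i][0] * ys[i][1]
          + xs[i][1] * ys[i][0] + alpha[i]) % MOD for i in range(3)]
    # Reshare: party i sends z_i to party i-1, restoring the replicated format.
    return [(z[i], z[(i + 1) % 3]) for i in range(3)]

def reconstruct(shares):
    return (shares[0][0] + shares[1][0] + shares[2][0]) % MOD

x, y = 123456789, 987654321
assert reconstruct(multiply(share(x), share(y))) == (x * y) % MOD
print("replicated 3PC product checks out")
```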
Another L makes it better? Lagrange meets LLL and may improve BKZ pre-processing
We present a new variant of the LLL lattice reduction algorithm, inspired by Lagrange's notion of pairwise reduction, called L4. Similar to LLL, our algorithm is polynomial in the dimension of the input lattice, as well as in $\log B$, where $B$ is an upper bound on the norm of the longest vector of the input basis.
We experimentally compared the norm of the first basis vector obtained with LLL and with L4 up to dimension 200; on average, L4 yields noticeably shorter vectors. We also used our algorithm as a pre-processing step for the BKZ lattice reduction algorithm with blocksize 24. In practice, up to dimension 140, this allows us to reduce the norm of the shortest basis vector on average, while the runtime does not increase significantly. In a fraction of our tests, the whole process was even faster.
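For context, Lagrange's pairwise reduction of two lattice vectors, the classical primitive that inspires L4, fits in a few lines. This is a minimal sketch with a made-up two-dimensional example, not the L4 algorithm itself.

```python
# Lagrange (a.k.a. Gauss) reduction of a pair of lattice vectors:
# exact integer arithmetic, pure Python.

def lagrange_reduce(u, v):
    """Reduce (u, v) until ||u|| <= ||v|| and the pair is size-reduced."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    if dot(u, u) > dot(v, v):
        u, v = v, u
    while True:
        # Subtract the integer multiple of u closest to the projection of v.
        m = round(dot(u, v) / dot(u, u))
        v = [vi - m * ui for vi, ui in zip(v, u)]
        if dot(v, v) >= dot(u, u):
            return u, v
        u, v = v, u

u, v = lagrange_reduce([105, 34], [91, 29])
print("reduced pair:", u, v)
```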
Information Set Decoding for Ring-Linear Codes
Information set decoding (ISD) algorithms currently offer the most powerful tool to solve the two archetypal problems of coding theory, namely the Codeword Finding Problem and the Syndrome Decoding Problem. Traditionally, ISD algorithms have primarily been studied for linear codes over finite fields, equipped with the Hamming metric.
However, recently, other possibilities have also been explored. These algorithms have been adapted to different ambient spaces and metrics, such as the rank metric or the Lee metric over integer residue rings.
In this paper, we show that it is possible to leverage the ring structure to construct more efficient decoding algorithms than those obtained by simply adapting ISD. In particular, we describe a framework that can be applied to any additive metric including Hamming and Lee, and that can be adapted to the case of the rank metric, providing algorithms to solve the two aforementioned problems, along with their average computational costs.
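As a baseline for what the framework improves on, plain Prange-style ISD in the Hamming metric can be sketched as follows (a toy numpy implementation with invented parameters; the paper's point is precisely to beat such direct adaptations by exploiting the ring structure).

```python
import numpy as np
rng = np.random.default_rng(1)

def prange_isd(H, s, w, iters=2000):
    """Prange's ISD sketch: find e with H e = s (mod 2) and weight(e) <= w."""
    r, n = H.shape
    for _ in range(iters):
        perm = rng.permutation(n)
        A = np.concatenate([H[:, perm], s.reshape(-1, 1)], axis=1).astype(np.uint8)
        pivots, row = [], 0
        for col in range(n):          # Gaussian elimination over GF(2)
            if row == r:
                break
            hits = np.nonzero(A[row:, col])[0]
            if hits.size == 0:
                continue
            A[[row, row + hits[0]]] = A[[row + hits[0], row]]
            for rr in range(r):
                if rr != row and A[rr, col]:
                    A[rr] ^= A[row]
            pivots.append(col)
            row += 1
        if row < r:
            continue                  # permuted matrix not full rank; resample
        s2 = A[:, n]                  # transformed syndrome
        if s2.sum() <= w:             # hope: error avoids the information set
            e = np.zeros(n, dtype=np.uint8)
            for i, col in enumerate(pivots):
                e[perm[col]] = s2[i]
            return e
    return None

n, r, w = 24, 12, 3                   # toy parameters
H = rng.integers(0, 2, size=(r, n), dtype=np.uint8)
e_true = np.zeros(n, dtype=np.uint8)
e_true[rng.choice(n, w, replace=False)] = 1
s = (H @ e_true % 2).astype(np.uint8)
e = prange_isd(H, s, w)
assert e is not None and ((H @ e) % 2 == s).all() and e.sum() <= w
print("found a weight-", int(e.sum()), "solution")
```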
Willow: Secure Aggregation with One-Shot Clients
A common drawback of secure vector summation protocols in the single-server model is that they impose at least one synchronization point between all clients contributing to the aggregation. This results in clients waiting on each other to advance through the rounds of the protocol, leading to large latency (or failures due to too many dropouts) even if the protocol is computationally efficient. In this paper we propose protocols in the single-server model where clients contributing data to the aggregation (i) send a single message to the server and (ii) can join aggregation sessions dynamically whenever they have resources, i.e., without the need for synchronizing their reporting time with any other clients. Our approach is based on a committee of parties that aid in the computation by running a setup phase before data collection starts, and a verification/decryption phase once it ends. Unlike existing committee-based protocols such as Flamingo (S\&P 2023), the cost for committee members can be made sub-linear in the number of clients, and does not depend on the size of the input client vectors. Our experimental evaluation shows that our protocol, even while allowing dynamic client participation, is competitive with state-of-the-art protocols that do not have that feature, in both computation and communication.
Commutative Cryptanalysis as a Generalization of Differential Cryptanalysis
Recently, Baudrin et al. analyzed a special case of Wagner's commutative diagram cryptanalysis, referred to as commutative cryptanalysis. For a family of permutations $E_k$ on a finite vector space, commutative cryptanalysis exploits the existence of affine permutations $A$ and $B$ such that $E_k \circ A = B \circ E_k$ holds with high probability, taken over inputs $x$, for a significantly large set of weak keys $k$. Several attacks against symmetric cryptographic primitives can be formulated within the framework of commutative cryptanalysis, most importantly differential attacks, as well as rotational and rotational-differential attacks. Besides, the notion of $c$-differentials on S-boxes can be analyzed as a special case within this framework.
We discuss the relations between a general notion of commutative cryptanalysis, with $A$ and $B$ being arbitrary functions over a finite Abelian group, and differential cryptanalysis, both from the view of conducting an attack on a symmetric cryptographic primitive, as well as from the view of a theoretical study of cryptographic S-boxes.
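The special case where $A$ and $B$ are XOR-translations reduces to the classical difference distribution table, which a brute-force check on a 4-bit S-box makes concrete (a toy demo: the S-box is PRESENT's, and the affine maps are restricted to translations for brevity).

```python
# Exhaustively estimating the commutation probability Pr_x[F(A(x)) = F(x) ^ b]
# for a toy 4-bit S-box and translations A: x -> x ^ a, B: y -> y ^ b.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]  # the PRESENT S-box

def commute_prob(F, a, b):
    """Fraction of inputs x with F(x ^ a) == F(x) ^ b."""
    return sum(F[x ^ a] == F[x] ^ b for x in range(16)) / 16

# With a = b = 0 the relation holds trivially; for other (a, b) these
# probabilities are exactly the difference distribution table entries
# divided by 16, i.e., differential cryptanalysis appears as a special case.
best = max(((a, b, commute_prob(SBOX, a, b))
            for a in range(1, 16) for b in range(16)), key=lambda t: t[2])
print("best XOR-commutation (a, b, prob):", best)
```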
Last updated: 2024-10-16
Secure Transformer Inference
Security of model parameters and user data is critical for Transformer-based services, such as ChatGPT.
While recent strides in secure two-party protocols have successfully addressed security concerns in serving Transformer models, their adoption is practically infeasible due to the prohibitive cryptographic overheads involved.
Drawing insights from our hands-on experience in developing two real-world Transformer-based services, we identify the inherent efficiency bottleneck in the two-party assumption.
To overcome this limitation, we propose a novel three-party threat model that consists of a model developer, a model server, and a data owner.
Based on this framework, we design a semi-symmetric permutation-based protection scheme and present STIP, the first secure Transformer inference protocol without any inference accuracy loss.
We analyze STIP's resistance to brute force, known-plaintext, and social engineering attacks and prove the privacy leakage upper bound using distance correlation.
We also propose a method to integrate the trusted execution environment with STIP to make model parameters resistant to model extraction and fine-tuning attacks.
Experiments on six representative series of Transformer models, with up to 70 billion parameters, in real systems show that STIP has strong security and no loss of accuracy.
For auto-regressive token generation, STIP achieves 31.7 ms latency for the LLaMA2-7b model, a significant reduction from the 5-minute overhead of state-of-the-art two-party protocols.
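The basic reason a permutation-based scheme can preserve accuracy exactly is easy to see on a single linear layer. The following is a deliberately simplified toy, not the actual semi-symmetric STIP construction (which covers full Transformer blocks): the developer hands the server a permuted weight matrix, the data owner permutes inputs correspondingly, and the computed output is unchanged.

```python
import numpy as np
rng = np.random.default_rng(0)

d = 8
W = rng.normal(size=(d, d))           # model weights (stand-in values)
x = rng.normal(size=d)                # user input (stand-in values)
P = np.eye(d)[rng.permutation(d)]     # secret permutation matrix

W_srv = W @ P                         # what the server stores
x_srv = P.T @ x                       # what the user submits
# (W P)(P^T x) = W x, so inference is exact while W and x stay permuted.
assert np.allclose(W_srv @ x_srv, W @ x)
print("permuted inference matches plain inference")
```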
Perceived Information Revisited II: Information-Theoretical Analysis of Deep-Learning Based Side-Channel Attacks
Previous studies on deep-learning-based side-channel attacks (DL-SCAs) have shown that traditional performance evaluation metrics commonly used in DL, like accuracy and F1 score, are not effective in evaluating DL-SCA performance. Therefore, some previous studies have proposed new alternative metrics for evaluating the performance of DL-SCAs. Notably, perceived information (PI) and effective perceived information (EPI) are major metrics based on information theory. While it has been experimentally confirmed that these metrics can give the attack success rate (SR) for DL-SCAs, their theoretical validity remains unclear.
In this paper, we propose a new theoretically valid performance evaluation metric called latent perceived information (LPI), which serves as an alternative to the existing metrics. LPI is defined as the mutual information between the output of the feature extractor of a neural network (NN) model and the intermediate value, representing the potential attack performance of the trained model. First, we prove that LPI provides an upper bound on the SR of a DL-SCA by modeling and formulating DL-SCA as a communication channel. Additionally, we clarify the conditions under which PI and EPI theoretically provide an upper bound on the SR from the perspective of LPI. For practical computation of LPI, we present two methods. One utilizes the Kraskov (KSG) estimator, a common mutual information estimator, and the other is based on logistic regression. While the KSG estimator is computationally intensive, it yields accurate LPI values. In contrast, logistic regression is faster but provides only a lower bound for LPI. Through experimental attacks on AES software and hardware implementations with masking countermeasures, we demonstrate that the LPI values estimated by these two methods are in close agreement, indicating the reliability and soundness of our proposed estimation techniques. Furthermore, we show that, by using logistic regression as a classifier, we can significantly improve the attack performance of the trained model when the difference between the SR upper bound given by the LPI and its actual SR is large. This indicates that LPI captures the potential for performance improvement in the trained model. Therefore, our study contributes to optimizing the distinguisher for attack performance using the trained model.
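The logistic-regression route rests on a standard inequality: the cross-entropy of any classifier upper-bounds $H(Z \mid X)$, so $H(Z)$ minus that cross-entropy lower-bounds $I(X; Z)$. Here is a self-contained sketch on synthetic "leakage" data (sklearn-based; all data and parameters are invented for illustration and are unrelated to the paper's experiments).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
n, classes = 20000, 4
Z = rng.integers(0, classes, n)                 # intermediate value (uniform)
X = Z[:, None] + rng.normal(0, 1.5, (n, 2))     # noisy "leakage" features

clf = LogisticRegression(max_iter=1000).fit(X[: n // 2], Z[: n // 2])
probs = clf.predict_proba(X[n // 2:])
ce_bits = log_loss(Z[n // 2:], probs) / np.log(2)   # log_loss is in nats
mi_lower = np.log2(classes) - ce_bits               # I(X;Z) >= H(Z) - CE
print(f"lower bound on I(X; Z): {mi_lower:.3f} bits")
```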
Composing Timed Cryptographic Protocols: Foundations and Applications
Time-lock puzzles are unique cryptographic primitives that use computational complexity to keep information secret for some period of time, after which security expires. Unfortunately, twenty-five years after their introduction, current analysis techniques of time-lock primitives provide no sound mechanism to build multi-party cryptographic protocols which use expiring security as a building block. As pointed out recently in the peer-reviewed literature, current attempts at this problem lack either composability, a fully consistent analysis, or functionality.
This paper presents a new complexity-theoretic framework and new structural theorems to analyze timed primitives with full generality and in composition (which is the central modular protocol design tool). The framework includes a model of security based on fine-grained circuit complexity which we call ``residual complexity,'' which accounts for possible leakage on timed primitives as they expire. Our definitions for multi-party computation protocols generalize the literature standards by accounting for fine-grained polynomial circuit depth to model computational hardness which expires in feasible time. Our composition theorems, in turn, incur degradation of (fine-grained) security as items are composed. In our framework, simulators are given a polynomial ``budget'' for computational depth, and in composition these polynomials interact.
Finally, we demonstrate via a prototypical auction application how to apply our framework and theorems. For the first time, we show that it is possible to prove -- in a way that is fully consistent, with falsifiable assumptions -- properties of multi-party applications based on leaky, temporarily secure components.
This work therefore significantly extends provable cryptography down from the self-contained world of arbitrary-polynomial security to the realm of small time domains which often appear in practice, where security of components expires while the larger system remains secure.
The Sting Framework: Proving the Existence of Superclass Adversaries
We introduce superclass accountability, a new notion of accountability for security protocols. Classical notions of accountability typically aim to identify specific adversarial players whose violation of adversarial assumptions has caused a security failure. Superclass accountability describes a different goal: to prove the existence of adversaries capable of violating security assumptions.
We develop a protocol design approach for realizing superclass accountability called the sting framework (SF). Unlike classical accountability, SF can be used for a broad range of applications without making protocol modifications and even when security failures aren’t attributable to particular players.
SF generates proofs of existence for superclass adversaries that are publicly verifiable, making SF a promising springboard for reporting by whistleblowers, high-trust bug-bounty programs, and so forth.
We describe how to use SF to prove the existence of adversaries capable of breaching the confidentiality of practical applications that include Tor, block-building infrastructure in web3, ad auctions, and private contact discovery---as well as the integrity of fair-transaction-ordering systems. We report on two end-to-end SF systems we have constructed---for Tor and block-building---and on experiments with those systems.
Testing Robustness of Homomorphically Encrypted Split Model LLMs
Large language models (LLMs) have recently transformed many industries, enhancing content generation, customer service agents, data analysis and even software generation. These applications are often hosted on remote servers to protect the neural-network model IP; however, this raises concerns about the privacy of input queries. Fully Homomorphic Encryption (FHE), an encryption technique that allows for computations on private data, has been proposed as a solution to this challenge. Nevertheless, due to the increased size of LLMs and the computational overheads of FHE, today's practical FHE LLMs are implemented using a split model approach. Here, a user sends their FHE-encrypted data to the server to run an encrypted attention head layer; then the server returns the result of the layer for the user to run the rest of the model locally. By employing this method, the server maintains part of their model IP, and the user still gets to perform private LLM inference. In this work, we evaluate the neural-network model IP protections of single-layer split-model LLMs, and demonstrate a novel attack vector that makes it easy for a user to extract the neural network model IP from the server, bypassing the claimed protections for encrypted computation. In our analysis, we demonstrate the feasibility of this attack, and discuss potential mitigations.
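The general flavor of such extraction is simple: if the "protected" server layer can be queried on arbitrary inputs, its parameters fall out of a handful of chosen queries. The sketch below is our illustration with an affine stand-in for the server layer, not the paper's exact procedure (an attention head is not affine, but its constituent projections are, and the same chosen-query principle applies).

```python
import numpy as np
rng = np.random.default_rng(7)

d = 16
W_secret, b_secret = rng.normal(size=(d, d)), rng.normal(size=d)
server_layer = lambda x: W_secret @ x + b_secret   # black-box query oracle

b_rec = server_layer(np.zeros(d))                  # probe with the zero vector
# Probe with standard basis vectors: column i of W is server_layer(e_i) - b.
W_rec = np.stack([server_layer(e) - b_rec for e in np.eye(d)], axis=1)
assert np.allclose(W_rec, W_secret) and np.allclose(b_rec, b_secret)
print("recovered the layer with", d + 1, "queries")
```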
Provable Security Analysis of Butterfly Key Mechanism Protocol in IEEE 1609.2.1 Standard
The paper provides the first provable security analysis of the Butterfly Key Mechanism (BKM) protocol from IEEE 1609.2.1 standard. The BKM protocol specifies a novel approach for efficiently requesting multiple certificates for use in vehicle-to-everything (V2X) communication. We define the main security goals of BKM, such as vehicle privacy and communication authenticity. We prove that the BKM protocol, with small modifications, meets those security goals. We also propose a way to significantly improve the protocol's efficiency without sacrificing security.
Proteus: A Fully Homomorphic Authenticated Transciphering Protocol
Fully Homomorphic Encryption (FHE) is a powerful technology that allows a cloud server to perform computations directly on ciphertexts. To overcome the overhead of sending and storing large FHE ciphertexts, the concept of FHE transciphering was introduced, allowing symmetric key encrypted ciphertexts to be transformed into FHE ciphertexts by deploying symmetric key decryption homomorphically. However, existing FHE transciphering schemes remain unauthenticated and malleable, allowing attackers to manipulate data and remain undetected. This work introduces Proteus, a new methodology for authenticated transciphering, which enables oblivious access control, preventing users from downloading unauthenticated or malicious data. Our protocol implementation adopts ASCON, NIST's new standard for lightweight cryptography, to enable homomorphic hashing and authenticated transciphering. Our ASCON transcipher is paired with the TFHE encryption scheme, which is well suited to perform encrypted rotation and bitwise operations. We evaluate our approach with a variety of real-life privacy-preserving applications, including URL phishing detection, private content moderation of hate speech, and biometric authentication.
TokenWeaver: Privacy Preserving and Post-Compromise Secure Attestation
Modern attestation based on Trusted Execution Environments (TEEs) can significantly reduce the risk of secret compromise, allowing users to securely perform sensitive computations such as running cryptographic protocols for authentication across security critical services. However, this has also made TEEs a high-value attack target, driving an arms race between novel compromise attacks and continuous TEE updates.
Ideally, we want to achieve Post-Compromise Security (PCS): even after a TEE compromise, we can update it back into a secure state. However, at the same time, we would like to guarantee the privacy of users, in particular preventing providers (such as Intel, Google, or Samsung) or services from tracking users across services. This requires unlinkability, which seems incompatible with standard PCS healing mechanisms.
In this work, we develop TokenWeaver, the first privacy-preserving post-compromise secure attestation method with automated formal proofs for its core properties. We base our construction on weaving together two types of token chains, one of which is linkable and the other is unlinkable. We provide the full formal models based on the Tamarin and DeepSec provers, including protocol, security properties, and proofs for reproducibility, as well as a proof-of-concept implementation in Python that shows the simplicity and applicability of our solution.
Quasilinear Masking to Protect ML-KEM Against Both SCA and FIA
Uncategorized
Uncategorized
The recent technological advances in Post-Quantum Cryptography (PQC) raise the question of how to robustly implement new asymmetric cryptographic primitives in today's technology. This is the case for the lattice-based Module Lattice-Key Encapsulation Mechanism (ML-KEM) algorithm, which is proposed by the National Institute of Standards and Technology (NIST) as the first standard for Key Encapsulation Mechanisms (KEMs), taking inspiration from CRYSTALS-Kyber. We must ensure that the ML-KEM implementation is resilient against physical attacks like Side-Channel Analysis (SCA) and Fault Injection Attacks (FIA). To reach this goal, we propose to adapt a masking countermeasure, more precisely the generic Direct Sum Masking method (DSM). We extend previous results from a paper using Reed-Solomon codes on AES for Code-Based Masking (CBM). This work presents a complete masked implementation of ML-KEM with both SCA and FIA resilience thanks to the error-correcting capabilities of Code-Based Masking. Due to the structure of this masking, we propose new generic solutions to address the non-linear parts of ML-KEM, with algorithmic optimizations. To do so, we develop a new conversion methodology between boolean and arithmetic Code-Based Maskings in the specific case of ML-KEM. Performance on a laptop as well as on a SAM4S microcontroller is detailed. Security is experimentally verified by performing a Test Vector Leakage Assessment (TVLA) on a SAM4S target using a ChipWhisperer Husky. We also provide formal proofs of security in the SNI model.
Lattice-Based Vulnerabilities in Lee Metric Post-Quantum Cryptosystems
Post-quantum cryptography has gained attention due to the need for secure cryptographic systems in the face of quantum computing. Code-based and lattice-based cryptography are two prominent approaches, both heavily studied within the NIST standardization project. Code-based cryptography—most prominently exemplified by the McEliece cryptosystem—is based on the hardness of decoding random linear error-correcting codes. Despite the McEliece cryptosystem having been unbroken for several decades, it suffers from large key sizes, which has led to exploring variants using metrics other than the Hamming metric, such as the Lee metric. This alternative metric may allow for smaller key sizes, but requires further analysis for potential vulnerabilities to lattice-based attack techniques. In this paper, we consider a generic Lee-metric-based McEliece-type cryptosystem and evaluate its security against lattice-based attacks.
Attribute-Based Signatures for Circuits with Optimal Parameter Size from Standard Assumptions
Attribute-based signatures (ABS) allow users to simultaneously sign messages and prove their possession of some attributes while hiding the attributes and revealing only the fact that they satisfy a public policy. In this paper, we propose a generic construction of ABS for circuits of unbounded depth and size, with optimal parameter size—meaning the lengths of public parameters, keys, and signatures are all constant. Our construction can be instantiated from various standard assumptions, including LWE and DLIN. This substantially improves the state-of-the-art ABS scheme by Boyle, Goldwasser, and Ivan (PKC 2014), which, while achieving optimal parameter size, relies on succinct non-interactive arguments of knowledge that can only be constructed from non-standard assumptions. Our generic construction is based on RAM delegations. At a high level, we leverage the fact that the circuit associated with the signature can be made public and compress it using the power of RAM delegation. This allows us to achieve an overall optimal parameter size while simultaneously hiding the user’s policy.
Dora: A Simple Approach to Zero-Knowledge for RAM Programs
Existing protocols for proving the correct execution of a RAM program in zero-knowledge are plagued by a processor expressiveness tradeoff: supporting fewer instructions results in smaller processor circuits (which improves performance), but may result in more program execution steps because unsupported instructions must be emulated over multiple processor steps (diminishing performance).
We present Dora, a very simple and concretely efficient zero-knowledge protocol for RAM programs that sidesteps this tension by making it (nearly) free to add additional instructions to the processor. The computational and communication complexity of proving each step of a computation in Dora is constant in the number of supported instructions. Dora's approach is unified by an intuitive abstraction we call a ZKBag, a cryptographic primitive constructed from linearly homomorphic commitments that captures the properties of a physical bag. We implement Dora and demonstrate that on commodity hardware it can prove the correct execution of a processor with thousands of instructions, each of which has thousands of gates, in just a few milliseconds per step. Our evaluation shows that Dora has notably better end-to-end performance than concurrent work targeting the same problem.
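The "physical bag" intuition can be gestured at with a toy built from Pedersen-style commitments: items may be inserted and later removed in any order, and an outside party can check that the bag ends up empty without learning that order. This is only our illustration of the homomorphic-bag idea (tiny parameters p = 107, subgroup order 53, made-up generators); the real ZKBag additionally proves consistency in zero knowledge.

```python
import secrets

p, q = 107, 53            # p = 2q + 1; commitments live in the order-q subgroup
g, h = 4, 9               # two subgroup generators (toy values)

def commit(m, r):
    """Pedersen-style commitment g^m * h^r mod p."""
    return pow(g, m % q, p) * pow(h, r % q, p) % p

bag = 1                                       # the empty bag
items = [(i, secrets.randbelow(q)) for i in range(4)]
for m, r in items:
    bag = bag * commit(m, r) % p              # insert: multiply in the commitment
for m, r in reversed(items):                  # remove in a different order
    bag = bag * pow(commit(m, r), -1, p) % p
assert bag == 1                               # bag is empty again
print("bag emptied; insert/remove order never mattered")
```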
SIMD-style Sorting of Integer Sequence in RLWE Ciphertext
This article discusses fully homomorphic encryption and homomorphic sorting. Homomorphic encryption is a special encryption technique that allows all kinds of operations to be performed on ciphertext, and the result is still decryptable, such that when decrypted, the result is the same as that obtained by performing the same operation on the plaintext. Homomorphic sorting is an important problem in homomorphic encryption. Currently, there has been a volume of work on homomorphic sorting. In these works, each integer in a sequence is encrypted in a separate ciphertext; there is a lack of research on sorting sequences of integers encrypted in a single ciphertext. This paper addresses the sorting problem by utilizing Single Instruction Multiple Data (SIMD) technology to provide new algorithms that improve computational efficiency. The content includes the following aspects.
For plaintexts encrypted word-wise, this paper studies sorting an integer sequence stored in one or multiple ciphertexts, and proposes a new SIMD-style homomorphic sorting algorithm. In terms of theoretical complexity, the new algorithm achieves an asymptotic speedup over each of three existing sorting algorithms, namely, homomorphic sorting by polynomial computation over a finite field, by TFHE bootstrapping, and by Liu-Wang parallel bootstrapping. In experiments, the new algorithm is 1.7-9.2 times faster than the three sorting algorithms.
The third situation involves sorting multiple shorter sequences simultaneously, all of which can be stored in a single ciphertext. For this situation, this paper proposes a method for calculating the ord function, and uses this method to provide a new sorting algorithm. In terms of theoretical complexity, expressed in the total number of values to be sorted and the length of each sequence, the new algorithm is asymptotically faster than the three existing sorting algorithms. In experiments, the new algorithm is 2.1-6.4 times faster than existing sorting algorithms.
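To see what "SIMD-style" sorting means mechanically, here is a cleartext simulation of an odd-even transposition sort built solely from whole-vector rotations and slot-wise min/max/select, the operation set available on packed RLWE slots. This is a toy sketch of the computation shape, not the paper's algorithm (which also performs the comparisons homomorphically).

```python
import numpy as np

def simd_sort(v):
    """Odd-even transposition sort using only whole-vector (SIMD-style) ops."""
    n = len(v)
    for step in range(n):
        start = step % 2
        mask = np.zeros(n, dtype=bool)
        mask[start:n - 1:2] = True             # left slot of each compared pair
        left, right = v, np.roll(v, -1)        # rotate to align neighbors
        lo, hi = np.minimum(left, right), np.maximum(left, right)
        v = np.where(mask, lo, v)              # compare-exchange, all pairs at once
        v = np.where(np.roll(mask, 1), np.roll(hi, 1), v)
    return v

v = np.array([29, 3, 17, 8, 23, 1, 12, 5])
print(simd_sort(v))    # [ 1  3  5  8 12 17 23 29]
```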
New Strategies for Bootstrapping Large-Error Ciphertext in Large-Precision FHEW/TFHE Cryptosystem
Bootstrapping is the core task in fully homomorphic encryption. It is designed to self-clean encrypted data to support an unlimited level of homomorphic computing. The FHEW/TFHE cryptosystem provides the fastest bootstrapping machinery, in addition to its unique homomorphic evaluation functionality. In 2021, the problem of large-precision bootstrapping was investigated in the literature, with fast algorithms proposed and implemented. A strategy common to all the algorithms is to decompose the plaintext homomorphically into blocks from the tail up, while bootstrapping the blocks sequentially.
This paper proposes two new strategies to improve the efficiency of large-precision plaintext bootstrapping. Both strategies are based on a new design of continuous nega-cyclic function with varying resolution, for making accurate computation with blockwise approximate computing. To minimize the approximation error in each block, optimizations are proposed based on rigorous error estimation, and are illustrated by error bounds in power-of-two binomial representation.
The first strategy is to make a homomorphic approximate decomposition of the input plaintext from the head on. Compared with the tail-up approach, the head-on approach reduces the number of blocks by up to half asymptotically, while at the same time reducing the final refreshed error.
The second strategy extends the head-on approach from large-precision plaintext bootstrapping to large-error reduction. It can be applied directly to the input ciphertext for the purpose of plaintext bootstrapping; it can also be applied after plaintext bootstrapping to further reduce the refreshed error.
Two algorithms based on the above two strategies are proposed, together with some variants combining the tail-up approach. The tail-up approach is completely re-developed for optimal blocksize control based on careful error analysis, and a corresponding algorithm is proposed. All the algorithms are implemented on PALISADE, and experiments on real data show that with the new strategies, the speed of large-precision plaintext bootstrapping can be improved by a factor of up to 7.
Share Shrink: (In-)Feasibility of MPC from one Broadcast-then-Asynchrony, and Delegated Computation
We consider protocols for secure multi-party computation (MPC) under honest majority, i.e., for $n = 2t+1$ players of which $t$ are corrupt, that achieve guaranteed output delivery (GOD), and operate in a single initial round of broadcast (BC), followed by steps of asynchronous peer-to-peer (P2P) messages. The power of closely related ``hybrid networks'' was studied in [Fitzi-Nielsen, Disc'09], [BHN, Podc'10] and [Patra-Ravi, IEEE Tr. Inf. Theory'18]. The interest of such protocols is that they go at the actual speed of the network, and security is preserved under arbitrary network conditions (past the initial BC).
We first complete the picture of this model with an impossibility result showing that some setup is required to achieve honest-majority MPC with GOD. We then consider a bare bulletin-board PKI setup, and leverage recent advances in multi-key homomorphic encryption [BJMS, Asiacrypt'20] to establish the feasibility of MPC in a tight single BC followed by a single step of asynchronous P2P messages.
We then consider efficiency. The only protocols adaptable to such a network model and setup are [BJMS, Asiacrypt'20], which does not scale well for many players, and [GLS, Crypto'15], which does not support input delegation from external resource-constrained owners (such as IoT devices or smartphones), limiting its practical use.
Our main contribution is a generic design that enables MPC in 1-BC-then-asynchronous P2P. It operates over ciphertexts encrypted under a (threshold) single-key encryption scheme, resulting in the smallest sizes one can expect and efficient evaluation. It can be implemented from any homomorphic encryption scheme built from linear maps (e.g., GSW, CL, ...). Our main building block is the squishing of the verifiable input sharing (``Share''), in parallel with the distributed key generation (DKG) in the single BC, followed by threshold encryption (``Shrink'') in one asynchronous step.
Interestingly, it can be compiled as the first constant-round YOSO protocol in 1 BC.
Statistical Layered MPC
The seminal work of Rabin and Ben-Or (STOC'89) showed that the problem of secure $n$-party computation can be solved for $t < n/2$ corruptions with guaranteed output delivery and statistical security. This holds in the traditional static model where the set of parties is fixed throughout the entire protocol execution.
The need to better capture the dynamics of large scale and long-lived computations, where compromised parties may recover and the set of parties can change over time, has sparked renewed interest in the proactive security model by Ostrovsky and Yung (PODC'91). This abstraction, where the adversary may periodically uncorrupt and corrupt a new set of parties, is taken even a step further in the more recent YOSO and Fluid MPC models (CRYPTO'21), which allow, in addition, disjoint sets of parties participating in each round. Previous solutions with guaranteed output delivery and statistical security only tolerate $t < n/3$ corruptions, or assume a random corruption pattern plus non-standard communication models.
Matching the Rabin and Ben-Or bound in these settings remains an open problem.
In this work, we settle this question considering the unifying Layered MPC abstraction recently introduced by David et al. (CRYPTO'23). In this model, the interaction pattern is defined by a layered acyclic graph, where each party sends secret messages and broadcast messages only to parties in the very next layer. We complete the feasibility landscape of layered MPC by extending the Rabin and Ben-Or result to this setting. Our results imply maximally-proactive MPC with statistical security in the honest-majority setting.
The Role of Message-Bound Signatures for the Beyond UnForgeability Features and Weak Keys
In the present work, we establish a new relationship among the Beyond UnForgeability Features (BUFF) introduced by Cremers et al. (SP'21). There, the BUFF notions have been shown to be independent of one another. On the other hand, the analysis by Aulbach et al. (PQCrypto'24) reveals that one of the BUFF notions—message-bound signatures (MBS)—is achieved by most schemes. The generic BUFF transform achieves all of the beyond unforgeability features: it works by signing a hash of the public key and the message (rather than just the message), and appending this hash value to the signature. The need for appending the hash comes from the intuitive notion of weak keys that verify all message-signature pairs. We explain that MBS security effectively rules out the possibility of weak keys. This opens the possibility of a more efficient transform to achieve BUFF. We show that this transform, first introduced by Pornin and Stern (ACNS'05), indeed suffices to achieve BUFF security, provided the original signature scheme satisfies MBS. Only in the malicious setting of exclusive ownership do we present an attack, on UOV, even after applying the PS-3 transform.
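The lighter transform is simple to state: bind the public key by signing H(pk || m) rather than m, without appending the hash to the signature; by the result above this suffices for BUFF when the underlying scheme already has message-bound signatures. The sketch below illustrates the wrapper with Ed25519 as a stand-in scheme (our choice for the demo, not the paper's).

```python
import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.hazmat.primitives import serialization

sk = Ed25519PrivateKey.generate()
pk_bytes = sk.public_key().public_bytes(
    serialization.Encoding.Raw, serialization.PublicFormat.Raw)

def bound_sign(sk, pk_bytes, msg):
    """Pornin-Stern style: sign the hash of the public key and the message."""
    return sk.sign(hashlib.sha256(pk_bytes + msg).digest())

def bound_verify(pk, pk_bytes, msg, sig):
    pk.verify(sig, hashlib.sha256(pk_bytes + msg).digest())  # raises on failure

msg = b"key-bound message"
sig = bound_sign(sk, pk_bytes, msg)
bound_verify(sk.public_key(), pk_bytes, msg, sig)
print("signature over H(pk || m) verified; no hash appended to the signature")
```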
Modelings for generic PoK and Applications: Shorter SD and PKP based Signatures
The Multi-Party Computation in the Head (MPCitH) paradigm has proven to be a versatile tool to design proofs of knowledge (PoK) based on a variety of computationally hard problems. For instance, many post-quantum signatures have been designed from MPC-based proofs combined with the Fiat-Shamir transformation. Over the years, MPCitH has evolved significantly with developments based on techniques such as threshold computing and other optimizations. Recently, Vector Oblivious Linear Evaluation (VOLE) and the VOLE-in-the-Head framework have spurred further advances. In this work, we introduce three VOLE-friendly modelings for generic and communication-efficient PoK to prove knowledge of a secret witness in the form of an elementary vector, a vector of bounded Hamming weight, or a permutation matrix. Remarkably, these modelings scale logarithmically with respect to the original witness sizes, which leads to small proofs in practice. To achieve this, we consider modelings with higher multiplicative depth. Even if this increases the proof size, we show that our approach compares favorably with existing proofs. Such design choices were mostly overlooked in previous comparable works, maybe because, prior to the VOLEitH framework, multiplications were often emulated with Beaver's triples, causing the proof size to scale poorly with the depth. We also provide several applications for our modelings, namely i) post-quantum signature schemes based on the Syndrome Decoding (SD) problem and the Permuted Kernel Problem (PKP), ii) PoK of secret keys for code-based key encapsulation mechanisms (KEMs), and iii) ring signatures from SD and PKP. Our SD- and PKP-based signatures feature sizes of a few kilobytes at the NIST-I security level, shorter than state-of-the-art alternatives. Our PoK of the secret keys of BIKE and HQC are twice as short as similar PoK for Kyber. In addition, we obtain the smallest ring signature based on SD and the first ring signature based on PKP.
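One standard way such logarithmic scalings arise (our illustration, not necessarily the paper's exact modeling) is that an elementary vector of length n = 2^k is fully determined by the k bits of its index and is recovered as a tensor product of k length-2 vectors, so proving knowledge of the k bits replaces proving knowledge of the vector itself.

```python
import numpy as np

def elementary_from_bits(bits):
    """Rebuild e_i from the binary expansion of i (most significant bit first)."""
    e = np.array([1])
    for b in bits:
        e = np.kron(e, np.array([1 - b, b]))  # tensor in one index bit at a time
    return e

bits = [1, 0, 1]                # index 5, so n = 8 and the witness is 3 bits
e = elementary_from_bits(bits)
assert e.tolist() == [0, 0, 0, 0, 0, 1, 0, 0]
print("e_5 of length 8 from 3 bits:", e)
```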
DMM: Distributed Matrix Mechanism for Differentially-Private Federated Learning using Packed Secret Sharing
Federated Learning (FL) has gained significant traction recently, both in industry and academia. In FL, a machine learning model is trained using data from various end-users arranged in committees across several rounds. Since such data can often be sensitive, a primary challenge in FL is providing privacy while still retaining utility of the model. Differential Privacy (DP) has become the main measure of privacy in the FL setting. DP comes in two flavors: central and local. In the former, a centralized server is trusted to receive the users' raw gradients from a training step, and then perturb their aggregation with some noise before releasing the next version of the model. In the latter (more private) setting, noise is applied on users' local devices, and only the aggregation of users' noisy gradients is revealed even to the server. Great strides have been made in increasing the privacy-utility trade-off in the central DP setting, by utilizing the so-called \emph{matrix mechanism}. However, progress has been mostly stalled in the local DP setting. In this work, we introduce the \emph{distributed} matrix mechanism to achieve the best of both worlds: local DP and the better privacy-utility trade-off of the matrix mechanism. We accomplish this by proposing a cryptographic protocol that securely transfers sensitive values across rounds, which makes use of \emph{packed secret sharing}. This protocol accommodates the dynamic participation of users per training round required by FL, including those that may drop out from the computation. We provide experiments which show that our mechanism indeed significantly improves the privacy-utility trade-off of FL models compared to previous local DP mechanisms, with little added overhead.
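For readers unfamiliar with packed secret sharing, the idea is that a single degree-(t + k - 1) polynomial hides k secrets at once, so a committee can hold and aggregate k values for roughly the cost of one Shamir sharing. A minimal sketch over a toy prime field follows (the field, embedding points, and parameters are our illustrative choices, not the paper's).

```python
import secrets

P = 2_147_483_647  # Mersenne prime field (hypothetical choice)

def interp_eval(points, x):
    """Lagrange-evaluate the unique polynomial through `points` at x (mod P)."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num = den = 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (x - xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total

def packed_share(secrets_vec, t, n):
    """Embed k secrets at x = -1..-k, blind with t random points, share at x = 1..n."""
    k = len(secrets_vec)
    base = [(-(i + 1) % P, s) for i, s in enumerate(secrets_vec)]
    base += [(P - k - 1 - i, secrets.randbelow(P)) for i in range(t)]
    return [(x, interp_eval(base, x)) for x in range(1, n + 1)]

def packed_reconstruct(shares, k):
    return [interp_eval(shares, -(i + 1) % P) for i in range(k)]

shares = packed_share([11, 22, 33], t=2, n=7)   # degree 4: any 5 shares suffice
assert packed_reconstruct(shares[:5], 3) == [11, 22, 33]
print("recovered packed secrets:", packed_reconstruct(shares[:5], 3))
```

Because the sharing is linear, adding shares slot-wise aggregates k users' values at once, which is the kind of amortization a committee-based protocol can exploit.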
Fully-Succinct Multi-Key Homomorphic Signatures from Standard Assumptions
Multi-Key Homomorphic Signatures (MKHS) allow one to evaluate a function on data signed by distinct users while producing a succinct and publicly-verifiable certificate of the correctness of the result. All the constructions of MKHS in the state of the art achieve a weak level of succinctness where signatures are succinct in the total number of inputs but grow linearly with the number of users involved in the computation. The only exception is a SNARK-based construction which relies on a strong notion of knowledge soundness in the presence of signing oracles that not only requires non-falsifiable assumptions but also encounters some impossibility results.
In this work, we present the first construction of MKHS that are fully succinct (also with respect to the number of users) while achieving adaptive security under standard falsifiable assumptions. Our result is achieved through a novel combination of batch arguments for NP (BARGs) and functional commitments (FCs), and yields diverse MKHS instantiations for circuits of unbounded depth based on either pairing or lattice assumptions. Additionally, our schemes support efficient verification with pre-processing, and they can easily be extended to achieve multi-hop evaluation and context-hiding.
MuxProofs: Succinct Arguments for Machine Computation from Vector Lookups
Proofs for machine computation prove the correct execution of arbitrary programs that operate over fixed instruction sets (e.g., RISC-V, EVM, Wasm). A standard approach for proving machine computation is to prove a universal set of constraints that encode the full instruction set at each step of the program execution. This approach incurs a proving cost per execution step on the order of the total sum of instruction constraints for all of the instructions in the set, despite each step of the program only executing a single instruction. Existing proving approaches that avoid this universal cost per step (and incur only the cost of a single instruction's constraints per step) either fail to provide zero-knowledge or rely on recursive proof composition for which security relies on the heuristic instantiation of the random oracle.
We present new protocols for proving machine execution that resolve these limitations, enabling prover efficiency on the order of only the executed instructions while achieving zero-knowledge and avoiding recursive proofs. Our core technical contribution is a new primitive that we call a succinct vector lookup argument which enables a prover to build up a machine execution ``on-the-fly''. We propose succinct vector lookups for both univariate polynomial and multivariate polynomial commitments in which vectors are encoded on cosets of a multiplicative subgroup and on subcubes of the boolean hypercube, respectively. We instantiate our proofs for machine computation by integrating our vector lookups with existing efficient, succinct non-interactive proof systems for NP.
Consensus on SNARK pre-processed circuit polynomials
This paper addresses verifiable consensus of pre-processed circuit polynomials for succinct non-interactive arguments of knowledge (SNARKs). More specifically, we focus on parts of circuits, referred to as wire maps, which may change based on program inputs or statements being argued. Preparing commitments to wire maps in advance is essential for certain SNARK protocols to maintain their succinctness, but it can be costly. SNARK verifiers can alternatively consider receiving wire maps from untrusted parties.
We propose a consensus protocol that reaches consensus on wire maps using a majority rule. The protocol can operate on a distributed, irreversible, and transparent server, such as a blockchain. Our analysis shows that while the protocol requires over 50\% honest participants to remain robust against collusive attacks, it enables consensus on wire maps with a low and fixed verification complexity per communication, even in adversarial settings. The protocol guarantees consensus completion within a time frame ranging from a few hours to several days, depending on the wire map degree and the honest participant proportion.
Technically, our protocol leverages a directed acyclic graph (DAG) structure to represent conflicting wire maps among the untrusted deliverers. Wire maps are decomposed into low-degree polynomials, forming vertices and edges of this DAG. The consensus participants, or deliverers, collaboratively manage this DAG by submitting edges to branches they support. The protocol then returns a commitment to the wire map that is written in the first fully grown branch. The protocol's computational efficiency is derived from an interactive commit-prove-verify scheme that enables efficient validation of submitted edges.
Our analysis implies that the protocol provides a practical solution for achieving secure consensus on SNARK wire maps in environments with a dynamic proportion of honest participants. Additionally, we introduce a tunable parameter that allows the protocol to minimize cost and time to consensus while maintaining a desired level of security.
Blind-Folded: Simple Power Analysis Attacks using Data with a Single Trace and no Training
Side-Channel Attacks target the recovery of key material in cryptographic implementations by measuring physical quantities such as power consumption during the execution of a program. Simple Power Attacks consist in deducing secret information from a trace using a single or a few samples, as opposed to differential attacks which require many traces. Software cryptographic implementations now all contain a data-independent execution path, but often do not consider variations in power consumption associated with data. In this work, we show that a technique commonly used to select a value from different possible values in a control-independent way leads to significant power differences depending on the value selected. This difference is actually so important that a single sample can be considered for attacking one condition, and no training on other traces is required. We exploit this finding to propose the first single-trace attack without any knowledge gained on previous executions, using trace folding. We target the two modular exponentiation implementations in Libgcrypt, getting respectively 100% and 99.98% of correct bits on average over 30 executions using 2,048-bit exponents. We also use this technique to attack the scalar multiplication in ECDSA, successfully recovering all secret nonces on 1,000 executions. Finally, the insights we gained from this work allow us to show that a countermeasure proposed in the literature for performing the safe loading of precomputed operands in the context of windowed implementations can be attacked as well.
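Why a "branch-free" select can still leak is easy to demonstrate with a toy Hamming-weight power model (our simplified illustration of the effect, not the paper's exact target code): with a mask that is all-zeros or all-ones, the intermediates of r = (x & m) | (y & ~m) have drastically different Hamming weights depending on which operand is selected.

```python
hw = lambda v: bin(v).count("1")   # Hamming weight, a common power model

def select_trace(bit, x, y):
    """Constant-time select r = (x & m) | (y & ~m), plus the HW 'trace' it leaks."""
    m = -bit & 0xFFFFFFFF                  # 0x00000000 or 0xFFFFFFFF
    a, b = x & m, y & (~m & 0xFFFFFFFF)    # intermediates a power probe sees
    return (a | b, [hw(m), hw(a), hw(b)])

x, y = 0xDEADBEEF, 0x00000001
r0, trace0 = select_trace(0, x, y)
r1, trace1 = select_trace(1, x, y)
print("bit=0 ->", hex(r0), "HW trace", trace0)   # mask HW 0, a = 0
print("bit=1 ->", hex(r1), "HW trace", trace1)   # mask HW 32, a = x
# The two traces differ so strongly that one sample can decide the bit.
```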
Multi-Hop Multi-Key Homomorphic Signatures with Context Hiding from Standard Assumptions
Fully homomorphic signatures are a significant strengthening of digital signatures, enabling computations on \emph{secretly} signed data. Today, we have multiple approaches to designing fully homomorphic signatures, such as from lattices, succinct functional commitments, indistinguishability obfuscation, or mutable batch arguments. Unfortunately, all existing constructions for homomorphic signatures suffer from one or more limitations. We do not have homomorphic signatures with features such as multi-hop evaluation, context hiding, and fast amortized verification, while relying on standard falsifiable assumptions.
In this work, we design homomorphic signatures satisfying all above properties. We construct homomorphic signatures for polynomial-sized circuits from a variety of standard assumptions such as sub-exponential DDH, standard pairing-based assumptions, or learning with errors. We also discuss how our constructions can be easily extended to the multi-key setting.
Delegatable Anonymous Credentials From Mercurial Signatures With Stronger Privacy
Delegatable anonymous credentials (DACs) enable a root issuer to delegate credential-issuing power, allowing a delegatee to take a delegator role. To preserve privacy, credential recipients and verifiers should not learn anything about intermediate issuers in the delegation chain. One particularly efficient approach to constructing DACs is due to Crites and Lysyanskaya (CT-RSA '19). In contrast to previous approaches, it is based on mercurial signatures (a type of equivalence-class signature), offering a conceptually simple design that does not require extensive use of zero-knowledge proofs. Unfortunately, current constructions of ``CL-type'' DACs only offer a weak form of privacy-preserving delegation: if an adversarial issuer (even an honest-but-curious one) is part of a user's delegation chain, they can detect when the user shows its credential. This is because the underlying mercurial signature scheme allows a signer to identify their public key in a delegation chain.
We propose CL-type DACs that overcome the above limitation, based on a new mercurial signature scheme that provides adversarial public key class hiding, which ensures that adversarial signers who participate in a user's delegation chain cannot exploit that fact to trace users. We achieve this by introducing structured public parameters for each delegation level. Since the related setup produces critical trapdoors, we discuss techniques from updatable structured reference strings in zero-knowledge proof systems (Groth et al. CRYPTO '18) to guarantee the required privacy. In addition, we propose a simple way to realize revocation for CL-type DACs via the concept of revocation tokens. While we showcase this approach to revocation using our DAC scheme, it is generic and can be applied to any CL-type DAC system. Revocation is a vital feature that is largely unexplored and notoriously hard to achieve for DACs, thus providing revocation can help to make DAC schemes more attractive in practical applications.
A Note on the Hint in the Dilithium Digital Signature Scheme
In the Dilithium digital signature scheme, there is an inherent tradeoff between the length of the public key and the length of the signature. The coefficients of the main part of the public key, the vector $t$, are compressed (in a lossy manner), or "quantized", during the key-generation procedure, in order to save on the public-key size. That is, the coefficients are divided by some fixed denominator, and only the quotients are published. However, this results in some "skew" during the verification process, and to fix this, a special signature-dependent "hint" is computed during the signing process. Roughly speaking, stronger compression of $t$ results in the hint carrying more information, consequently increasing the signature length. Prior to the hint computation, a test is performed to check whether a proper hint can indeed be composed to fix this skew, and if the test fails, the signing process is rerun with a different seed for the (pseudo-)randomness. However, in this short report we observe that this test is not performed optimally: the test calculates a sufficient condition for the hint to work, but not a necessary one. We suggest a new refined test that results in a lower probability for the signing iteration to fail. The new test exhibits some improvement (in terms of expected running time) in certain configurations that are characterized by a shorter public-key length at the expense of a slightly longer signature length. It is noted that the change does not imply any change in the security of the algorithm.
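A toy model conveys the hint mechanism (and the sufficient-versus-necessary distinction): the verifier only sees a skewed value v - delta, yet needs the "high bits" of v; since |delta| is small, those high bits differ from the skewed ones by at most one bucket, so a single hint bit repairs them. Real Dilithium works modulo q with signed remainders; the plain-integer version below is only our illustration, and some skews never flip the high part, which is why a test that is merely sufficient triggers more reruns than necessary.

```python
ALPHA = 16                           # size of one "high bits" bucket (toy value)
high = lambda v: v // ALPHA

def make_hint(v, delta):
    """Signer side: does dropping delta change the high part?"""
    return high(v) != high(v - delta)

def use_hint(v_hat, hint, delta_positive):
    """Verifier side: correct HighBits(v_hat) by the hinted carry."""
    return high(v_hat) + (hint if delta_positive else -hint)

for v, delta in [(100, 5), (97, 5), (100, -3), (95, -3)]:
    h = make_hint(v, delta)
    assert use_hint(v - delta, h, delta > 0) == high(v)
print("hints repair the quantization skew in all cases")
```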
Instance Compression, Revisited
Collision-resistant hashing (CRH) is a cornerstone of cryptographic protocols. However, despite decades of research, no construction of a CRH based solely on one-way functions has been found. Moreover, there are black-box limitations that separate these two primitives.
Harnik and Naor [HN10] overcame this black-box barrier by introducing the notion of instance compression. Instance compression reduces large NP instances to a size that depends on their witness size while preserving the "correctness" of the instance relative to the language. Shortly thereafter, Fortnow and Santhanam showed that efficient instance compression algorithms are unlikely to exist (as the polynomial hierarchy would collapse). Bronfman and Rothblum defined a computational analog of instance compression, which they called computational instance compression (CIC), and gave a construction of CIC under standard assumptions. Unfortunately, this notion is not strong enough to replace instance compression in Harnik and Naor's CRH construction.
In this work, we revisit the notion of computational instance compression and ask what the "correct" notion for CIC is, in the sense that it is sufficiently strong to achieve useful cryptographic primitives while remaining consistent with common assumptions. First, we give a natural strengthening of the CIC definition that serves as a direct substitute for the instance compression scheme in the Harnik--Naor construction. However, we show that even this notion is unlikely to exist.
We then identify a notion of CIC that gives new hope for constructing CRH from one-way functions via instance compression. We observe that this notion is achievable under standard assumptions and, by revisiting the Harnik--Naor proof, demonstrate that it is sufficiently strong to achieve CRH. In fact, we show that our CIC notion is existentially equivalent to CRH.
Beyond Minicrypt, Harnik and Naor showed that a strengthening of instance compression can be used to construct OT and public-key encryption. We rule out the computational analog of this stronger notion by showing that it contradicts the existence of incompressible public-key encryption, which was recently constructed under standard assumptions.
High-Throughput Three-Party DPFs with Applications to ORAM and Digital Currencies
Distributed point functions (DPF) are increasingly becoming a foundational tool with applications to both application-specific and general secure computation.
While two-party DPF constructions are readily available for those applications with satisfactory performance, three-party constructions lag behind in both security and efficiency.
In this paper we close this gap and propose the first three-party DPF construction that matches the state-of-the-art two-party DPF on all metrics.
Namely, it is secure against a malicious adversary corrupting both the dealer and one out of the three evaluators, its function's shares are of the same size and evaluation takes the same time as in the best two-party DPF.
Compared to the state-of-the-art three-party DPF, our construction enjoys a smaller function share size and shorter evaluation time across the function domain sizes considered.
Apart from DPFs as a stand-alone tool, our construction finds immediate applications to private information retrieval (PIR), writing (PIW) and oblivious RAM (ORAM).
To further showcase its applicability, we design and implement an ORAM with access policy, an extension to ORAMs where a policy is checked before accessing the underlying database.
The policy we plug in is one suitable for account-based digital currencies, and in particular central bank digital currencies (CBDCs).
Our protocol offers the first design and implementation of a large-scale privacy-preserving account-based digital currency. While previous works supported anonymity sets of 64-256 clients and fewer than 10 transactions per second (tps), our protocol supports anonymity sets in the millions while sustaining practical transaction rates.
Toward that application, we introduce a new primitive called updatable DPF, which enables a direct computation of a dot product between a DPF and a vector; we believe that updatable DPF and the new dot-product protocol will find interest in other applications.
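A naive stand-in conveys what a DPF dot product buys: share the truth table of the point function f_{alpha,beta} additively between two evaluators (a real DPF makes these shares succinct, which is the whole point), and a dot product with a public vector opens to beta * v[alpha]. That single opened value is the kind of private lookup an account-based currency needs; the code below is our illustration, not the paper's construction.

```python
import secrets

MOD = 1 << 32

def naive_dpf_gen(n, alpha, beta):
    """Additive shares of the length-n table that is beta at alpha, 0 elsewhere."""
    k0 = [secrets.randbelow(MOD) for _ in range(n)]
    k1 = [(-x) % MOD for x in k0]
    k1[alpha] = (k1[alpha] + beta) % MOD
    return k0, k1

def dot(share, v):
    return sum(s * x for s, x in zip(share, v)) % MOD

n, alpha, beta = 8, 3, 7
v = list(range(10, 10 + n))              # public vector, e.g. account balances
k0, k1 = naive_dpf_gen(n, alpha, beta)
assert (dot(k0, v) + dot(k1, v)) % MOD == (beta * v[alpha]) % MOD
print("dot product opens to beta * v[alpha] =", beta * v[alpha])
```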
Securely Computing One-Sided Matching Markets
Top trading cycles (TTC) is a famous algorithm for trading indivisible goods between a set of agents such that all agents are as happy as possible about the outcome. In this paper, we present a protocol for executing TTC in a privacy preserving way. To the best of our knowledge, it is the first of its kind. As a technical contribution of independent interest, we suggest a new algorithm for determining all nodes in a functional graph that are on a cycle. The algorithm is particularly well suited for secure implementation in that it requires no branching and no random memory access. Finally, we report on a prototype implementation of the protocol based on somewhat homomorphic encryption.
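The cycle-finding subproblem has a neat branch-free solution in the spirit of (though not identical to) the algorithm the paper describes: composing f with itself about log2(n) times by "pointer doubling" maps every node onto a cycle, and the cyclic nodes are exactly the resulting image. Each step below is a fixed sequence of whole-array gathers with no data-dependent branching; the paper's algorithm additionally avoids random memory access, which this numpy sketch does not.

```python
import numpy as np

def cyclic_nodes(f):
    """f is an array with edge i -> f[i]; returns a boolean mask of cycle nodes."""
    n = len(f)
    g = f.copy()
    for _ in range(max(1, int(np.ceil(np.log2(n))))):
        g = g[g]                   # after k doublings, g represents f^(2^k)
    on_cycle = np.zeros(n, dtype=bool)
    on_cycle[g] = True             # the image of f^(>= n) is the set of cyclic nodes
    return on_cycle

f = np.array([1, 2, 0, 2, 5, 6, 4, 4])   # cycles 0->1->2->0 and 4->5->6->4
print(np.nonzero(cyclic_nodes(f))[0])    # [0 1 2 4 5 6]
```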
Optimal Early Termination for Dishonest Majority Broadcast
Deterministic broadcast protocols among parties tolerating corruptions require rounds, where is the actual number of corruptions in an execution of the protocol. We provide the first protocol which is optimally resilient, adaptively secure, and asymptotically matches this lower bound for any . By contrast, the best known algorithm in this regime by Loss and Nielsen (EUROCRYPT'24) always requires rounds. Our main technical tool is a generalization of the notion of polarizer introduced by Loss and Nielsen, which allows parties to obtain transferable cryptographic evidence of missing messages with fewer rounds of interaction.
Secure Stateful Aggregation: A Practical Protocol with Applications in Differentially-Private Federated Learning
Recent advances in differentially private federated learning (DPFL) algorithms have found that using correlated noise across the rounds of federated learning (DP-FTRL) yields provably and empirically better accuracy than using independent noise (DP-SGD). While DP-SGD is well-suited to federated learning with a single untrusted central server using lightweight secure aggregation protocols, secure aggregation is not conducive to implementing modern DP-FTRL techniques without assuming a trusted central server. DP-FTRL based approaches have already seen widespread deployment in industry, albeit with a trusted central curator who provides and applies the correlated noise.
To realize a fully private, single untrusted server DP-FTRL federated learning protocol, we introduce secure stateful aggregation: a simple append-only data structure that allows for the private storage of aggregate values and reading linear functions of the aggregates. Assuming Ring Learning with Errors, we provide a lightweight and scalable realization of this protocol for high-dimensional data in a new security/resource model, Federated MPC: where a powerful persistent server interacts with weak, ephemeral clients. We observe that secure stateful aggregation suffices for realizing DP-FTRL-based private federated learning: improving DPFL utility guarantees over the state of the art while maintaining privacy with an untrusted central party. Our approach has minimal overhead relative to existing techniques which do not yield comparable utility. The secure stateful aggregation primitive and the federated MPC paradigm may be of interest for other practical applications.
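As a point of reference, the following non-secure Python sketch pins down the stateful aggregation functionality described above: an append-only log of per-round aggregates from which only chosen linear functions are ever released. Class and method names are hypothetical; the paper realizes this interface cryptographically under Ring LWE, with no single party ever holding the raw aggregates.

    # Functionality sketch of secure stateful aggregation (NOT the RLWE
    # protocol): append-only aggregate storage plus linear reads, which
    # is the interface DP-FTRL-style correlated noise needs.
    class StatefulAggregator:
        def __init__(self, dim):
            self.dim = dim
            self.state = []          # append-only list of aggregate vectors

        def aggregate_round(self, client_updates):
            """Store the sum of this round's client updates (held
            privately by the protocol; never revealed on its own)."""
            agg = [sum(col) for col in zip(*client_updates)]
            self.state.append(agg)

        def read_linear(self, coeffs):
            """Reveal only sum_i coeffs[i] * state[i], e.g. a prefix sum
            with correlated-noise coefficients."""
            out = [0.0] * self.dim
            for c, vec in zip(coeffs, self.state):
                for j in range(self.dim):
                    out[j] += c * vec[j]
            return out

    s = StatefulAggregator(dim=2)
    s.aggregate_round([[1.0, 2.0], [3.0, 4.0]])   # round 0: sum = [4, 6]
    s.aggregate_round([[0.5, 0.5]])               # round 1: sum = [0.5, 0.5]
    print(s.read_linear([1.0, 1.0]))              # [4.5, 6.5]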
MetaDORAM: Info-Theoretic Distributed ORAM with Less Communication
A Distributed Oblivious RAM is a multi-party protocol that securely implements a RAM functionality on secret-shared inputs and outputs. This paper presents two DORAMs in the semi-honest honest-majority 3-party setting which are information-theoretically secure and whose communication costs are asymptotic improvements over previous work. Let be the number of memory locations and let be the bit-length of each location.
The first, MetaDORAM1, is \emph{statistically} secure, with leakage. It has bits of communication per memory access. Here, is a free parameter and is any super-constant function (in ). The best prior work was that of Abraham et al. (PKC 2017), which has cost . MetaDORAM1 is an asymptotic improvement over the work of Abraham et al. whenever .
The second protocol, MetaDORAM2, achieves \emph{perfect} security, albeit at the cost of a computationally expensive setup phase. It has communication cost . The best prior work, that of Chan et al. (ASIACRYPT 2018), has communication cost . When , the communication cost of our protocol is , that is, a factor improvement over that of Chan et al. Our work is the first perfectly secure DORAM with sub-logarithmic communication overhead. This comes at the cost of a once-off (for any given ) setup phase which requires exponential (in ) computation.
These can be trivially transformed, respectively, into statistically and perfectly secure active multi-server ORAM protocols with the same communication costs. The resulting multi-server ORAM protocols are likewise asymptotic improvements over the state of the art.
Fiat-Shamir Goes Rational
This paper investigates the open problem of how to construct non-interactive rational proofs. Rational proofs, introduced by Azar and Micali (STOC 2012), are a model of interactive proofs where a computationally powerful server can be rewarded by a weaker client for running an expensive computation . The honest strategy is enforced by design when the server is rational: any adversary claiming a false output will lose money on expectation.
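To see why lying loses money in expectation, recall that rational-proof rewards are built from strictly proper scoring rules (Azar and Micali use Brier's scoring rule). The numeric check below illustrates only this incentive mechanism, assuming a Brier-style reward; it is not any specific protocol's reward function.

    # With a strictly proper scoring rule, expected reward is maximized
    # only by reporting the true distribution -- the incentive behind
    # rational proofs. Illustration only, not a protocol.
    def brier_reward(report, outcome):
        # reward = 2*q(outcome) - sum_i q(i)^2
        return 2 * report[outcome] - sum(q * q for q in report)

    def expected_reward(truth, report):
        return sum(p * brier_reward(report, i) for i, p in enumerate(truth))

    truth = [0.7, 0.3]
    print(expected_reward(truth, [0.7, 0.3]))  # honest report: 0.58
    print(expected_reward(truth, [1.0, 0.0]))  # false claim:   0.40, strictly less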
Rational proof constructions have appealing properties: they are simple, feature an extremely efficient verifier—reading only a sublinear number of bits of the input —and do not require any collateral from the prover. Currently, all non-trivial constructions of rational proofs are interactive. Developing non-interactive rational protocols would be a game-changer, making them practical for use in smart contracts, one of their most natural applications.
Our investigation revolves around the Fiat-Shamir transform, a common approach to compiling interactive proofs into their non-interactive counterparts. We are the first to tackle the question: "Can Fiat-Shamir be successfully applied to rational protocols?"
We find negative evidence by showing that, after applying Fiat-Shamir in the random oracle model to two representative protocols in the literature (AM13 and CG15), these protocols lose their security guarantees. Our findings point to more general impossibility theorems, which we leave as future work.
To achieve our results we first need to address a fundamental technical challenge: the standard Fiat-Shamir transform does not apply to protocols where the verifier has only oracle access to its input (a core feature of the rational setting). We propose two versions of Fiat-Shamir for this setting, a "vanilla" variant and a "stronger" variant (where the verifier has access to an honestly computed digest of its input). We show that neither variant is sufficient to ensure that AM13 or CG15 are secure in the non-interactive setting.
Finally, as an additional contribution, we provide a novel, and arguably simpler, definition for the soundness property of rational proofs (interactive or non-interactive), which is of independent interest.
Prover - Toward More Efficient Formal Verification of Masking in Probing Model
In recent years, formal verification has emerged as a crucial method for assessing the security of masked implementations against side-channel attacks, owing to its remarkable versatility and high degree of automation. However, formal verification still faces technical bottlenecks in balancing accuracy and efficiency, thereby limiting its scalability. Earlier tools like maskVerif and CocoAlma are very efficient, but they face accuracy issues when verifying schemes that utilize properties of Boolean functions. Later, SILVER addressed the accuracy issue, albeit at the cost of significantly reduced speed and scalability compared to maskVerif. Consequently, there is a pressing need to develop formal verification tools that are both efficient and accurate for designing secure schemes and evaluating implementations.
This paper’s primary contribution lies in proposing several approaches to develop a more efficient and scalable formal verification tool called Prover, which is built upon SILVER. Firstly, inspired by the auxiliary data structures proposed by Eldib et al. and the optimistic sampling rule of maskVerif, we introduce two reduction rules aimed at diminishing the size of observable sets and secret sets in statistical independence checks. These rules substantially decrease, or even eliminate, the need for repeated computation of probability distributions using Reduced Ordered Binary Decision Diagrams (ROBDDs), a time-intensive procedure in verification. Subsequently, we integrate one of these reduction rules into the uniformity check to mitigate its complexity. Secondly, we identify that variable ordering significantly impacts efficiency and optimize it for constructing ROBDDs, resulting in much smaller representations of the investigated functions. Lastly, we present the algorithm of Prover, which efficiently verifies the security and uniformity of masked implementations in the probing model, with or without the presence of glitches.
Experimental results demonstrate that our proposed tool Prover offers a better balance between efficiency and accuracy compared to other state-of-the-art tools (IronMask, CocoAlma, maskVerif, and SILVER). In our experiments, we also found an S-box that can only be verified by Prover, as IronMask cannot verify S-boxes, and both CocoAlma and maskVerif suffer from false positive issues. Additionally, SILVER runs out of time during verification.
AD-MPC: Fully Asynchronous Dynamic MPC with Guaranteed Output Delivery
Traditional secure multiparty computation (MPC) protocols presuppose a fixed set of participants throughout the computational process. To address this limitation, Fluid MPC [CRYPTO 2021] presents a dynamic MPC model that allows parties to dynamically join or exit during circuit evaluation. However, existing dynamic MPC protocols can guarantee safety but not liveness within asynchronous networks. This paper introduces ΠAD-MPC, a fully asynchronous dynamic MPC protocol. ΠAD-MPC ensures both safety and liveness with optimal resilience, tolerating t corrupted participants (with n = 3t + 1). To achieve this, we develop a novel asynchronous transfer protocol ΠTrans and a preprocessing protocol ΠAprep specifically tailored for dynamic environments. In contrast to most dynamic MPC protocols that achieve security with abort in synchronous networks, ΠAD-MPC guarantees output delivery in asynchronous networks with optimal resilience, thus enhancing robustness. We provide a formal security proof of ΠAD-MPC under the Universal Composability (UC) framework. Furthermore, an extensive evaluation involving up to 20 geographically distributed nodes demonstrates the protocol’s practical performance and its ability to reliably deliver outputs in asynchronous dynamic settings. Compared to the state-of-the-art Fluid MPC, ΠAD-MPC achieves comparable performance while offering significantly enhanced security guarantees.
How to Construct Random Unitaries
The existence of pseudorandom unitaries (PRUs)---efficient quantum circuits that are computationally indistinguishable from Haar-random unitaries---has been a central open question, with significant implications for cryptography, complexity theory, and fundamental physics. In this work, we close this question by proving that PRUs exist, assuming that any quantum-secure one-way function exists. We establish this result for both (1) the standard notion of PRUs, which are secure against any efficient adversary that makes queries to the unitary , and (2) a stronger notion of PRUs, which are secure even against adversaries that can query both the unitary and its inverse . In the process, we prove that any algorithm that makes queries to a Haar-random unitary can be efficiently simulated on a quantum computer, up to inverse-exponential trace distance.
Flamingo: Multi-Round Single-Server Secure Aggregation with Applications to Private Federated Learning
This paper introduces Flamingo, a system for secure aggregation of data across a large set of clients. In secure aggregation, a server sums up the private inputs of clients and obtains the result without learning anything about the individual inputs beyond what is implied by the final sum. Flamingo focuses on the multi-round setting found in federated learning in which many consecutive summations (averages) of model weights are performed to derive a good model. Previous protocols, such as Bell et al. (CCS ’20), have been designed for a single round and are adapted to the federated learning setting by repeating the protocol multiple times. Flamingo eliminates the need for the per-round setup of previous protocols, and has a new lightweight dropout resilience protocol to ensure that if clients leave in the middle of a sum the server can still obtain a meaningful result. Furthermore, Flamingo introduces a new way to locally choose the so-called client neighborhood introduced by Bell et al. These techniques help Flamingo reduce the number of interactions between clients and the server, resulting in a significant reduction in the end-to-end runtime for a full training session over prior work.
We implement and evaluate Flamingo and show that it can securely train a neural network on the (Extended) MNIST and CIFAR-100 datasets, and the model converges without a loss in accuracy, compared to a non-private federated learning system.
Approximate PSI with Near-Linear Communication
Private Set Intersection (PSI) is a protocol where two parties with individually held confidential sets want to jointly learn (or secret-share) the intersection of these two sets and reveal nothing else to each other. In this paper, we introduce a natural extension of this notion to approximate matching. Specifically, given a distance metric between elements, an approximate PSI (Approx-PSI) allows running PSI where ``close'' elements match. Assuming that elements are either ``close'' or sufficiently ``far apart'', we present an Approx-PSI protocol for Hamming distance that improves the overall efficiency compared to all existing approximate-PSI solutions. In particular, we achieve a near-linear communication complexity, an improvement over the previously best-known . We also show Approx-PSI protocols for Edit distance (also known as Levenshtein distance), Euclidean distance and angular distance by deploying results on low-distortion embeddings into Hamming distance. The latter two results further imply secure Approx-PSI for other metrics such as the cosine similarity metric. Our Approx-PSI for Hamming distance is up to 20x faster and communicates 30% less than the best known protocols when (1) matching small binary vectors; (2) matching with large thresholds; or (3) matching large input sets. We also apply our technique to analyze private approximate membership computation, which can be viewed as an asymmetric case of approximate PSI, and obtain a protocol with sublinear communication.
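In plaintext, the functionality being computed for Hamming distance is simply thresholded matching, as in the reference sketch below; the protocol's contribution is computing this set of close pairs without revealing anything else.

    # Plaintext reference of what Approx-PSI computes for Hamming distance:
    # reveal exactly the pairs at distance <= delta. This sketch has no
    # privacy at all; it only pins down the target functionality.
    def hamming(a, b):
        return bin(a ^ b).count("1")

    def approx_psi_reference(X, Y, delta):
        return {(x, y) for x in X for y in Y if hamming(x, y) <= delta}

    X = {0b101100, 0b000111}
    Y = {0b101110, 0b111000}
    print(approx_psi_reference(X, Y, delta=1))  # {(44, 46)}: the only close pair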
SQIPrime: A dimension 2 variant of SQISignHD with non-smooth challenge isogenies
We introduce SQIPrime, a post-quantum digital signature scheme based on the Deuring correspondence and Kani's Lemma.
Compared to its predecessors SQISign and especially SQISignHD, SQIPrime further expands the use of high-dimensional isogenies, already used for verification in SQISignHD, to all of its subroutines.
In doing so, it no longer relies on smooth-degree isogenies (of dimension 1). Intriguingly, this includes the challenge isogeny, which is also of non-smooth degree but has an accessible kernel. The fact that the isogenies do not have rational kernels allows fitting more rational power-of-2 torsion points, which are necessary when computing and representing the response isogeny.
SQIPrime operates with prime numbers of the form .
We describe two variants of SQIPrime. The first, SQIPrime4D, incorporates the novelties described above and uses dimension-4 isogenies to represent the response isogeny. The runtime of higher-dimensional isogeny computation is exponential in the dimension, hence the smaller the dimension, the better for efficiency. The second variant, SQIPrime2D, solely uses dimension-2 isogenies. This is achieved by setting the degree of the secret isogeny equal to that of the challenge isogeny and further exploiting Kani's Lemma. SQIPrime2D is more efficient than SQIPrime4D and SQISignHD, at the cost of being comparatively less compact, though still very compact compared to non-isogeny-based post-quantum signatures.
Optimized One-Dimensional SQIsign Verification on Intel and Cortex-M4
SQIsign is a well-known post-quantum signature scheme due to its small combined signature and public-key size. However, SQIsign suffers from notably long signing times, and verification times are not short either. To improve this, recent research has explored both one-dimensional and two-dimensional variants of SQIsign, each with distinct characteristics. In particular, SQIsign2D's efficient signing and verification times have made it a focal point of recent research. However, the absence of an optimized one-dimensional verification implementation hampers a thorough comparison between these different variants.
This work bridges this gap in the literature: we provide a state-of-the-art implementation of one-dimensional SQIsign verification, including novel optimizations. We report a record-breaking one-dimensional SQIsign verification time of 8.55 Mcycles on a Raptor Lake Intel processor, closely matching SQIsign2D on the same processor. For uncompressed signatures, the signature size doubles and we verify in only 5.6 Mcycles. Taking advantage of the inherent parallelism available in isogeny computations, we present 5-core variants that can go as low as 1.3 Mcycles. Furthermore, we present the first implementation that supports both 32-bit and 64-bit processors. It includes optimized assembly code for the Cortex-M4 and has been integrated with the pqm4 project. Our results motivate further research into one-dimensional SQIsign, as it boasts unique features among isogeny-based schemes.
Multiplying Polynomials without Powerful Multiplication Instructions (Long Paper)
We improve the performance of the lattice-based cryptosystem Dilithium on Cortex-M3, a platform with expensive multiplications. Our contribution is two-fold: (i) We generalize Barrett multiplication and show that the resulting shape-independent modular multiplication performs comparably to long multiplication on some platforms without special hardware when precomputation is free. We call a modular multiplication “shape-independent” if its correctness and efficiency depend only on the magnitude of the moduli and not on their shapes. This was unknown in the literature even though modular multiplication has been studied for more than 40 years. In the literature, shape-independent modular multiplications often perform several times slower than long multiplications even if we ignore the cost of the precomputation. (ii) We show that polynomial multiplications based on the Nussbaumer fast Fourier transform and Toom–Cook over perform best when modular multiplications are expensive and is not very close to the arithmetic precision.
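For readers unfamiliar with the starting point, below is a Python sketch of textbook Barrett modular multiplication, the classical form that the paper generalizes: a single precomputed approximation of 1/q replaces runtime division. Parameters are illustrative only (Dilithium's modulus is used as the example).

    # Classical Barrett modular multiplication: reduce a*b mod q using a
    # precomputed approximation of 1/q, with only shifts and multiplies.
    def barrett_setup(q, k):
        # Precompute m = floor(2^(2k) / q); requires q < 2^k.
        assert q < (1 << k)
        return (1 << (2 * k)) // q

    def barrett_mulmod(a, b, q, k, m):
        # Assumes a, b already reduced (a, b < 2^k), so a*b < 2^(2k).
        t = a * b
        approx = (t * m) >> (2 * k)     # estimate of floor(t / q), off by <= 1
        r = t - approx * q              # hence r lies in [0, 2q)
        return r - q if r >= q else r   # one conditional correction

    q, k = 8380417, 23                  # Dilithium's modulus as an example
    m = barrett_setup(q, k)
    assert barrett_mulmod(123456, 654321, q, k, m) == (123456 * 654321) % q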
For practical evaluation, we implement assembly programs for the polynomial arithmetic used in the digital signature Dilithium on Cortex-M3. For the modular multiplications in Dilithium, our generalized Barrett multiplications are 1.92 times faster than the state-of-the-art assembly-optimized Montgomery multiplications, leading to 1.38−1.51 times faster Dilithium NTT/iNTT. Along with the improvement in accumulating products, the core polynomial arithmetic matrix-vector multiplications are 1.71−1.77 times faster. We further apply the FFT-based polynomial multiplications over to the challenge polynomial multiplication , leading to 1.31 times faster computation for .
We additionally apply these ideas to Saber on Cortex-M3 and demonstrate their improvement to Dilithium and Saber in our 8-bit AVR environment. For Saber on Cortex-M3, we show that matrix-vector multiplications with FFT-based polynomial multiplications over are 1.42−1.46 times faster than the ones with NTT-based polynomial multiplications over NTT-friendly coefficient rings. When moving to a platform with smaller arithmetic precision, such as 8-bit AVR, we improve the matrix-vector multiplication of Dilithium with our Barrett-based NTT/iNTT by a factor of 1.87−1.89. As for Saber in our 8-bit AVR environment, we show that matrix-vector multiplications with NTT-based polynomial multiplications over NTT-friendly coefficient rings are faster than polynomial multiplications over due to the large in Saber.
MPC-in-the-Head Framework without Repetition and its Applications to the Lattice-based Cryptography
The MPC-in-the-Head framework has been proposed as a solution for Non-Interactive Zero-Knowledge Arguments of Knowledge (NIZKAoK) due to its efficient proof generation. However, most existing NIZKAoK constructions using this approach require multiple MPC evaluations to achieve negligible soundness error, resulting in proof size and time that are asymptotically at least λ times the size of the circuit of the NP relation. In this paper, we propose a novel method to eliminate the need for repeated MPC evaluations, resulting in a NIZKAoK protocol for any NP relation that we call Diet. The proof size and time of Diet are asymptotically only polylogarithmic with respect to the size of the circuit C of the NP relation and are independent of the security parameter λ. Hence, both the proof size and time can be significantly reduced.
Moreover, Diet offers promising concrete efficiency for proving the Learning With Errors (LWE) problem and its variants. Our solution provides significant advantages over other schemes in terms of proof size and proof time when both factors are considered together. Specifically, Diet is a promising method for proving knowledge of secret keys for lattice-based key encapsulation mechanisms (KEMs) such as Frodo and Kyber, offering a practical solution to future post-quantum certificate management. For Kyber 512, our implementation achieves an online proof size of 83.65 kilobytes (KB) with a preprocessing overhead of 152.02 KB. The implementation is highly efficient, with an online proof time of only 0.68 seconds and a preprocessing time of 0.81 seconds. Notably, our approach provides the first reported implementation of proving knowledge of secret keys for Kyber 512 using zero-knowledge proofs built from post-quantum primitives.
Verifiable Secret Sharing Simplified
Verifiable Secret Sharing (VSS) is a fundamental building block in cryptography. Despite its importance and extensive studies, existing VSS protocols are often complex and inefficient. Many of them do not support dual thresholds, are not publicly verifiable, or do not properly terminate in asynchronous networks. This paper presents a new and simple approach for designing VSS protocols in synchronous and asynchronous networks. Our VSS protocols are optimally fault-tolerant, i.e., they tolerate a 1/2 and a 1/3 fraction of malicious nodes in synchronous and asynchronous networks, respectively. They only require a public key infrastructure and the hardness of discrete logarithms. Our protocols support dual thresholds, and their transcripts are publicly verifiable. We implement our VSS protocols and evaluate them in a geo-distributed setting with up to 256 nodes. The evaluation demonstrates that our protocols offer asynchronous termination and public verifiability with performance that is comparable to that of existing schemes that lack these features. Compared to the existing schemes with similar guarantees, our approach lowers the bandwidth usage and latency by up to 90%.
Algorithm xxx: Evaluating a Boolean Polynomial on All Possible Inputs
Evaluating a Boolean polynomial on all possible inputs (i.e. building the truth table of the corresponding Boolean function) is a simple computational problem that sometimes appears inside broader applications, for instance in cryptanalysis or in the implementation of more sophisticated algorithms to solve Boolean polynomial systems.
Two techniques share the crown to perform this task: the “Fast Exhaustive Search” (FES) algorithm from 2010 (which is based on Gray Codes) and the space-efficient Moebius transform from 2021 (which is reminiscent of the FFT). Both require operations for a degree- Boolean polynomial on variables and operate mostly in-place, but have other slightly different characteristics. They both provide an efficient iterator over the full truth table.
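For concreteness, here is the classic in-place Moebius butterfly in Python, mapping the ANF coefficient vector of a Boolean polynomial to its full truth table with one XOR pass per variable. This is a plain illustration of the transform itself, not the space-efficient 2021 variant or the Gray-code-based FES algorithm.

    # In-place Moebius transform: turn the ANF coefficient vector of a
    # Boolean polynomial into its full truth table.
    def moebius_transform(anf):
        """anf[m] = coefficient (0/1) of the monomial prod_{i in m} x_i.
        After the transform, anf[x] = f(x) for every input x."""
        n = len(anf).bit_length() - 1
        for i in range(n):
            step = 1 << i
            for m in range(len(anf)):
                if m & step:
                    anf[m] ^= anf[m ^ step]  # XOR butterfly
        return anf

    # f(x0, x1) = 1 + x0 + x0*x1, coefficients indexed by monomial mask.
    tt = moebius_transform([1, 1, 0, 1])  # masks: {}, {x0}, {x1}, {x0,x1}
    assert tt == [1, 0, 1, 1]             # f(0,0), f(1,0), f(0,1), f(1,1)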
This article describes BeanPolE (BoolEAN POLynomial Evaluation), a concise and flexible C library that implements both algorithms, as well as many other functions to deal with Boolean multivariate polynomials in dense representation.
Optimizing Liveness for Blockchain-Based Sealed-Bid Auctions in Rational Settings
Blockchain-based auction markets offer stronger fairness and transparency compared to their centralized counterparts. Deposits and sealed-bid formats are usually applied to enhance security and privacy. However, to the best of our knowledge, a formal treatment of deposit-enabled sealed-bid auctions remains lacking in the cryptographic literature. To address this gap, we first propose a decentralized anonymous deposited-bidding (DADB) scheme, providing formal syntax and security definitions. Unlike existing approaches that rely on smart contracts, our construction utilizes a mainchain-sidechain structure that is also compatible with the extended UTXO model. This design further allows us to develop a consensus mechanism on the sidechain dedicated to securely recording bids for allocation. Specifically, we build atop an Algorand-style protocol and integrate a novel block qualification mechanism into the block selection. Consequently, we prove, from a game-theoretic perspective, that our design optimizes liveness latency for rational users who want to join the auction, even without explicit incentives (e.g., fees) for including bids. Finally, our implementation results demonstrate the potential performance degradation without the block qualification mechanism.
Fuzzy PSI via Oblivious Protocol Routing
In private set intersection (PSI), two parties who each hold sets of items can learn their intersection without revealing anything about their other items. Fuzzy PSI corresponds to a relaxed variant that reveals pairs of items which are ``close enough,'' with respect to some distance metric. In this paper we propose a new protocol framework for fuzzy PSI, compatible with arbitrary distance metrics. We then show how to efficiently instantiate our framework for , , and metrics, in a way that uses exclusively cheap symmetric-key operations. One notable feature of our protocol is that it has only logarithmic dependency on the distance threshold, whereas most other protocols have linear (or higher) dependency. For many reasonable combinations of parameters, our protocol has the lowest communication cost of existing fuzzy PSI protocols.
Breaking, Repairing and Enhancing XCBv2 into the Tweakable Enciphering Mode GEM
Tweakable enciphering modes (TEMs) provide security in a variety of storage and space-critical applications like disk and file-based encryption, and packet-based communication protocols, among others. XCB-AES (known as XCBv2) is specified in the IEEE 1619.2 standard for encryption of sector-oriented storage media and it comes with a proof of security for block-aligned input messages.
In this work, we demonstrate the first and most efficient plaintext recovery attack on XCBv2. We show that XCBv2 is insecure also for full-block messages by recovering the plaintext (all except the final block) using a minimal number of queries, namely two. We demonstrate that our attack further applies to the HCI and MXCB TEMs, which follow a similar design approach to XCBv2.
Following the responsible disclosure process, we communicated the attack details to IEEE and the authors of XCB-AES. The authors have confirmed the validity of our attack on 02/09/2024.
Our next contribution is to strengthen the provable security of XCB-AES (claimed bits in queried blocks). We propose a new modular TEM called GEM, which can be seen as a generalization of the Hash-CTR-Hash approach as used in XCB-style and HCTR-style TEMs. We are able to prove that GEM achieves full -bit security using -bit PRP/PRF.
We also give two concrete GEM instantiations: and , both of which are based on AES-128 and GHASH-256, and which internally use variants of the CTR-based weak pseudorandom functions GCTR-3 and SoCTR, respectively. SoCTR uses AES-128, and GCTR-3 is based on -256. Our security proofs show that both and provide full -bit security. From an applications perspective, addresses the need for reusing classical components, while enhances performance by leveraging a more modern primitive based on the AES/Deoxys round function.
Our implementations demonstrate competitive performance: for the typical 4KiB sector size, 's performance is on par with AES -CTET+, yet achieves higher standard security guarantees. is on par with AES-CTET+ performance-wise while also maintaining higher security with standard components. Our GEM instances triple the security margin of XCB-AES and double that of HCTR2, at the cost of a performance loss of only ( ) and ( ) for 4KiB messages.
Simplification Issues of An Authentication and Key Agreement Scheme for Smart Grid
Key agreement and public key encryption are two elementary cryptographic primitives suited to different scenarios, but their differences are still not familiar to some researchers. In this note, we show that Safkhani et al.'s key agreement scheme [Peer-to-Peer Netw. Appl. 15(3), 1595-1616, 2022] is a public key encryption in disguise. We stress that the ultimate use of key agreement is to establish a shared key for some symmetric-key encryption. We also present a simplification of the scheme by removing some repetitive computations. To the best of our knowledge, this is the first work to clarify the fundamental differences between the two primitives. The techniques developed in this note will be helpful for future work on designing such schemes.
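The note's distinction can be made concrete in a few lines: key agreement ends with a shared key, and encryption then happens under a symmetric scheme keyed by it. The Python sketch below uses deliberately unsafe demo parameters and a one-time XOR pad, purely to illustrate this division of labor.

    # Key agreement (toy Diffie-Hellman) followed by symmetric encryption.
    import hashlib, secrets

    p, g = 2**127 - 1, 3                 # demo group only -- NOT secure parameters

    a = secrets.randbelow(p - 2) + 1     # Alice's ephemeral secret
    b = secrets.randbelow(p - 2) + 1     # Bob's ephemeral secret
    A, B = pow(g, a, p), pow(g, b, p)    # exchanged in the clear

    k_alice = pow(B, a, p)
    k_bob = pow(A, b, p)
    assert k_alice == k_bob              # the agreed key

    # Only now does encryption happen, under a *symmetric* scheme keyed by
    # a KDF of the shared secret (a one-time SHA-256-derived XOR pad here).
    key = hashlib.sha256(str(k_alice).encode()).digest()
    msg = b"meter reading: 42 kWh"
    ct = bytes(m ^ k for m, k in zip(msg, key))
    pt = bytes(c ^ k for c, k in zip(ct, key))
    assert pt == msg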
Efficient Quantum Pseudorandomness from Hamiltonian Phase States
Quantum pseudorandomness has found applications in many areas of quantum information, ranging from entanglement theory, to models of scrambling phenomena in chaotic quantum systems, and, more recently, in the foundations of quantum cryptography. Kretschmer (TQC '21) showed that both pseudorandom states and pseudorandom unitaries exist even in a world without classical one-way functions. To this day, however, all known constructions require classical cryptographic building blocks which are themselves synonymous with the existence of one-way functions, and which are also challenging to realize on realistic quantum hardware.
In this work, we seek to make progress on both of these fronts simultaneously---by decoupling quantum pseudorandomness from classical cryptography altogether.
We introduce a quantum hardness assumption called the \emph{Hamiltonian Phase State} (HPS) problem, which is the task of decoding output states of a random instantaneous quantum polynomial-time (IQP) circuit. Hamiltonian phase states can be generated very efficiently using only Hadamard gates, single-qubit rotations and CNOT circuits. We show that the hardness of our problem reduces to a worst-case version of the HPS problem, and we provide evidence that our assumption is plausibly fully quantum; meaning, it cannot be used to construct one-way functions.
We also show information-theoretic hardness when only a few copies of the output states are available, by proving an approximate -design property of our ensemble.
Finally, we show that our assumption and its variants allow us to efficiently construct many pseudorandom quantum primitives, ranging from pseudorandom states, to quantum pseudoentanglement, to pseudorandom unitaries, and even primitives such as public-key encryption with quantum keys. Along the way, we analyze a natural iterative construction of pseudorandom unitaries which resembles a candidate of Ji, Liu, and Song (CRYPTO'18).
Bootstrapping Small Integers With CKKS
The native plaintexts of the Cheon-Kim-Kim-Song (CKKS) fully homomorphic encryption scheme are vectors of approximations to complex numbers. Drucker et al. [J. Cryptol.'24] have shown how to use CKKS to efficiently perform computations on bits and small bit-length integers, by relying on their canonical embeddings into the complex plane. For small bit-length integers, Chung et al. [IACR eprint'24] recently suggested to rather rely on an embedding into complex roots of unity, to gain numerical stability and efficiency. Both works use CKKS in a black-box manner.
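The two embeddings being contrasted are easy to state concretely: the canonical embedding treats a small integer k as the complex number k itself, while the roots-of-unity embedding maps k to exp(2*pi*i*k/t), keeping every encoding on the unit circle. Below is a hedged Python sketch of the latter and its noise-tolerant decoding, using plain complex arithmetic with no encryption involved.

    # Embedding small integers as complex roots of unity, and decoding
    # by rounding the phase -- the numerical idea behind the approach.
    import cmath

    def encode_root_of_unity(k, t):
        return cmath.exp(2j * cmath.pi * k / t)

    def decode_root_of_unity(z, t):
        # Round the phase back to the nearest t-th root of unity.
        return round(cmath.phase(z) * t / (2 * cmath.pi)) % t

    t = 16                                  # 4-bit integers
    z = encode_root_of_unity(11, t)
    z_noisy = z * (1 + 1e-3) + 1e-3j        # small approximation noise, as in CKKS
    assert decode_root_of_unity(z_noisy, t) == 11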
Inspired by the design by Bae et al. [Eurocrypt'24] of a dedicated bootstrapping algorithm for ciphertexts encoding bits, we propose a CKKS bootstrapping algorithm, (small-integer bootstrapping), for ciphertexts encoding small bit-length integers.
For this purpose, we build upon the DM/CGGI-to-CKKS conversion algorithm from Boura et al. [J. Math. Cryptol.'20], to bootstrap canonically embedded integers to integers embedded as roots of unity. The algorithm allows functional bootstrapping: it can evaluate an arbitrary function of its input while bootstrapping. It may also be used to batch-(functional-)bootstrap multiple DM/CGGI ciphertexts. For example, its amortized cost for evaluating an 8-bit look-up table on DM/CGGI ciphertexts is 3.75ms (single-thread CPU, 128-bit security).
We adapt it to simultaneously bootstrap multiple CKKS ciphertexts for bits. The resulting algorithm (batch-bits bootstrapping) decreases the amortized cost of a binary gate evaluation. Compared to Bae et al., it gives a 2.4x speed-up.
Quantum State Group Actions
Cryptographic group actions are a leading contender for post-quantum cryptography, and have also been used in the development of quantum cryptographic protocols. In this work, we explore quantum group actions, which consist of a group acting on a set of quantum states. We show the following results:
1. In certain settings, statistical (even query-bounded) security is impossible, analogously to post-quantum classical group actions.
2. We construct quantum state group actions and prove that many computational problems that have been proposed by cryptographers hold for them. Depending on the construction, our proofs are either unconditional, rely on LWE, or rely on the quantum random oracle model. While our analysis does not directly apply to classical group actions, we argue it gives at least a sanity check that there are no obvious flaws in the post-quantum assumptions made by cryptographers.
3. Our quantum state group action allows for unifying two existing quantum money schemes: those based on group actions, and those based on non-collapsing hashes. We also explain how they can unify classical and quantum key distribution.
RPO-M31 and XHash-M31: Efficient Hash Functions for Circle STARKs
We present two new arithmetization-oriented hash functions based on RPO [Ashur, Kindi, Meier, Szepieniec, Threadbare; ePrint 2022/1577] and XHash-12 [Ashur, Bhati, Kindi, Mahzoun, Perrin; ePrint 2023/1045], adapted for and ready to use in Circle STARKs [Habock, Levit, Papini; ePrint 2024/278].
On Constructing Pseudorandom Involutions: Feistel variants using a single round function
An involution is a permutation that is its own inverse. Involutions have attracted plenty of attention in the cryptographic community due to their advantages regarding hardware implementations. In this paper, we reconsider constructing {\it pseudorandom involutions}. We demonstrate two constructions.
First, the 4-round Feistel network {\it using the same random function in every round} (Feistel-SF) is a pseudorandom involution. This shows that the Feistel-SF construction still provides non-trivial cryptographic strength. To complement this, we also show the insecurity of the 3-round Feistel-SF by exhibiting an attack.
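The involution property itself (though of course not pseudorandomness, which is the paper's contribution) is easy to check by direct computation, under the usual convention that the final half-swap of the Feistel network is undone. A small Python sanity check, with an arbitrary hash-based round function standing in for the random function:

    # Check that the 4-round Feistel network with a single round function
    # is an involution (final half-swap omitted, as usual).
    import hashlib

    def F(x):  # one fixed random-looking round function on 32-bit halves
        return int.from_bytes(
            hashlib.sha256(x.to_bytes(4, "big")).digest()[:4], "big")

    def feistel_sf(L, R, rounds=4):
        for _ in range(rounds):
            L, R = R, L ^ F(R)
        return R, L            # undo the last swap

    x = (0xDEADBEEF, 0x12345678)
    assert feistel_sf(*feistel_sf(*x)) == x   # applying it twice is the identity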
Second, a ``mirrored'' variant of the Naor-Reingold construction with component reuse yields a pseudorandom involution.
Robust Multiparty Computation from Threshold Encryption Based on RLWE
We consider robust protocols for secure multi-party computation (MPC) built from FHE under an honest majority, i.e., for players of which are corrupt. Surprisingly, there exists no robust threshold FHE scheme based on BFV with which to design such MPC protocols. Precisely, all existing methods for generating a common relinearization key can abort as soon as one player deviates. We address this issue with a new relinearization key (adapted from [CDKS19, CCS'19]), which we show how to securely generate in parallel with the threshold encryption key, in the same broadcast. We thus obtain the first robust threshold BFV scheme, moreover using only one broadcast for the generation of keys instead of two previously.
Of independent interest, as an optional alternative, we propose the first threshold FHE decryption enabling simultaneously:
(i) robustness over asynchronous channels with honest majority; (ii) tolerating a power-of-small-prime ciphertext modulus, e.g., ; and (iii) secret shares of sizes quasi-independent of .
Efficient Boolean-to-Arithmetic Mask Conversion in Hardware
Masking schemes are key in thwarting side-channel attacks due to their robust theoretical foundation. Transitioning from Boolean to arithmetic (B2A) masking is a necessary step in various cryptographic schemes, including hash functions, ARX-based ciphers, and lattice-based cryptography. While there exists a significant body of research focusing on B2A software implementations, studies pertaining to hardware implementations are quite limited, with the majority dedicated solely to creating efficient Boolean masked adders. In this paper, we present first- and second-order secure hardware implementations that perform B2A mask conversion efficiently without using masked adder structures. We first introduce a first-order secure low-latency gadget that executes a B2A2k conversion in a single cycle. Furthermore, we propose a second-order secure B2A2k gadget that has a latency of only 4 clock cycles. Both gadgets are independent of the input word size k. We then show how these new primitives lead to improved B2Aq hardware implementations that perform a B2A mask conversion of integers modulo an arbitrary number. Our results show that our new gadgets outperform comparable solutions by more than an order of magnitude in terms of resource requirements and are at least 3 times faster in terms of latency and throughput. All gadgets have been formally verified and proven secure in the glitch-robust PINI security model. We additionally confirm the security of our gadgets on an FPGA platform using practical TVLA tests.
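As background, the contract a B2A conversion must fulfill is purely algebraic: Boolean shares with x1 XOR x2 = x must be turned into arithmetic shares with A1 + A2 = x mod 2^k. The Python sketch below states this contract as an intentionally insecure functional reference; the point of a hardware gadget such as the paper's is to realize the same map without ever recombining x.

    # Functional contract of a B2A conversion (insecure reference only).
    import secrets

    k = 16
    MASK = (1 << k) - 1

    def b2a_reference(x1, x2):
        x = x1 ^ x2                      # INSECURE: unmasks x; reference only
        A1 = secrets.randbelow(1 << k)   # fresh random arithmetic share
        A2 = (x - A1) & MASK
        return A1, A2

    x = 0xBEEF
    x1 = secrets.randbelow(1 << k)
    x2 = x ^ x1                          # Boolean masking of x
    A1, A2 = b2a_reference(x1, x2)
    assert (A1 + A2) & MASK == x         # arithmetic masking of the same x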
Masked Vector Sampling for HQC
Anticipating the advent of large quantum computers, NIST started a worldwide competition in 2016 aiming to define the next cryptographic standards. HQC is one of these post-quantum schemes still in contention, with three others already standardized. In 2022, Guo et al. introduced a timing attack that exploited an inconsistency in HQC's rejection sampling function to recover its secret key in 866,000 calls to an oracle. The authors of HQC updated its specification by applying an algorithm to sample vectors in constant time. A masked implementation of this function was then proposed for BIKE, but it is not directly applicable to HQC. In this paper we propose a masked, specification-compliant version of HQC's vector sampling function which relies, to our knowledge, on the first masked implementation of the Barrett reduction.
Consensus in the Presence of Overlapping Faults and Total Omission
Understanding the fault tolerance of Byzantine Agreement protocols is an important question in distributed computing. While the setting of Byzantine faults has been thoroughly explored in the literature, the (arguably more realistic) omission fault setting is far less studied. In this paper, we revisit the recent work of Loss and Stern, who gave the first protocol in the mixed fault model tolerating Byzantine faults, send faults, and receive faults, when and omission faults do not overlap. We observe that their protocol makes no guarantees when omission faults can overlap, i.e., when parties can simultaneously have send and receive faults. We give the first protocol that overcomes this limitation and tolerates the same number of potentially overlapping faults. We then study, for the first time, the total omission setting where all parties can become omission faulty. This setting is motivated by real-world scenarios where every party may experience connectivity issues from time to time, yet agreement should still hold for the parties who manage to output values. We show the first agreement protocol in this setting with parameters and . On the other hand, we prove that there is no consensus protocol for the total omission setting which tolerates even a single overlapping omission fault, i.e., where and , or a broadcast protocol for and even without overlapping faults.
Sparrow: Space-Efficient zkSNARK for Data-Parallel Circuits and Applications to Zero-Knowledge Decision Trees
Space-efficient SNARKs aim to reduce the prover's space overhead, which is one of the main obstacles for deploying SNARKs in practice, as it can be prohibitively large (e.g., orders of magnitude larger than natively performing the computation). In this work, we propose Sparrow, a novel space-efficient zero-knowledge SNARK for data-parallel arithmetic circuits with two attractive features: (i) it is the first space-efficient scheme where, for a given field, the prover overhead increases with a multiplicative sublogarithmic factor as the circuit size increases, and (ii) compared to prior space-efficient SNARKs that work for arbitrary arithmetic circuits, it achieves prover space asymptotically smaller than the circuit size itself. Our key building block is a novel space-efficient sumcheck argument with improved prover time which may be of independent interest. Our experimental results for three use cases (arbitrary data-parallel circuits, multiplication trees, batch SHA256 hashing) indicate Sparrow outperforms the prior state-of-the-art space-efficient SNARK for arithmetic circuits, Gemini (Bootle et al., EUROCRYPT'22), by - in total prover space and - in prover time. We then use Sparrow to build zero-knowledge proofs of decision tree training and prediction, relying on its space efficiency to scale to large datasets and forests of multiple trees. Compared to a (non-space-efficient) optimal-time SNARK based on the GKR protocol, we observe a prover space reduction of - for tree training while maintaining essentially the same prover and verifier times and proof size. Even more interestingly, our prover requires comparable space to natively performing the underlying computation. E.g., for a MB dataset, our prover only needs more space than the native computation.
End-to-End Encrypted Cloud Storage in the Wild: A Broken Ecosystem
End-to-end encrypted cloud storage offers a way for individuals and organisations to delegate their storage needs to a third-party, while keeping control of their data using cryptographic techniques.
We conduct a cryptographic analysis of various products in the ecosystem, showing that many providers fail to provide an adequate level of security. In particular, we provide an in-depth analysis of five end-to-end encrypted cloud storage systems, namely Sync, pCloud, Icedrive, Seafile, and Tresorit, in the setting of a malicious server. These companies cumulatively have over 22 million users and are major providers in the field. We unveil severe cryptographic vulnerabilities in four of them. Our attacks invalidate the marketing claims made by the providers of these systems, showing that a malicious server can, in some cases, inject files in the encrypted storage of users, tamper with file data, and even gain direct access to the content of the files. Many of our attacks affect multiple providers in the same way, revealing common failure patterns in independent cryptographic designs.
We conclude by discussing the significance of these patterns beyond the security of the specific providers.
Improved Meet-LWE Attack via Ternary Trees
The Learning with Errors (LWE) problem and its variants over structured lattices have been widely exploited in efficient post-quantum cryptosystems. Recently, May suggested the Meet-LWE attack, which constitutes a significant advancement in the line of work on the Meet-in-the-Middle approach to analyzing LWE with ternary secrets.
In this work, we generalize and extend the idea of Meet-LWE by introducing ternary trees, which result in diverse representations of the secrets. More precisely, we split the secrets into three pieces of the same dimension and expand them into a ternary tree to leverage the increased number of representations to improve the overall attack complexity. We also suggest matching criteria for the approximate matching of three lists via a locality-sensitive hash function. We carefully analyze and optimize the time and memory costs of our attack algorithm exploiting ternary trees, and compare them to those of the Meet-LWE attack. With asymptotic and non-asymptotic comparisons, we observe that our attack provides improved estimations for all parameter settings, including those of practical post-quantum schemes, compared to the Meet-LWE attack. We also evaluate the security of the Round 2 candidates of the KpqC competition, which aims to standardize post-quantum public key cryptosystems in the Republic of Korea, and report that the estimated complexities for our attack applied to SMAUG-T are lower than claimed for some of the recommended parameters.
Traitor Tracing without Trusted Authority from Registered Functional Encryption
Traitor-tracing systems allow identifying the users who contributed to building a rogue decoder in a broadcast environment. In a traditional traitor-tracing system, a key authority is responsible for generating the global public parameters and issuing secret keys to users. All security is lost if the \emph{key authority itself} is corrupt. This raises the question: Can we construct a traitor-tracing scheme, without a trusted authority?
In this work, we propose a new model for traitor-tracing systems where, instead of having a key authority, users generate and register their own public keys. The public parameters are computed by aggregating all user public keys. Crucially, the aggregation process is \emph{public}, thus eliminating the need for any trusted authority. We present two new traitor-tracing systems in this model based on bilinear pairings. Our first scheme is proven adaptively secure in the generic group model. This scheme features a transparent setup, ciphertexts consisting of group elements, and a public tracing algorithm. Our second scheme supports a bounded collusion of traitors and is proven selectively secure in the standard model. Our main technical ingredients are new registered functional encryption (RFE) schemes for quadratic and linear functions which, prior to this work, were known only from indistinguishability obfuscation.
To substantiate the practicality of our approach, we evaluate the performance of a proof-of-concept implementation. For a group of users, encryption and decryption take roughly 50ms and 4ms, respectively, whereas a ciphertext is of size 6.7KB.
Efficient Key-Switching for Word-Type FHE and GPU Acceleration
Speed efficiency, memory optimization, and quantum resistance are essential for safeguarding the performance and security of cloud computing environments. Fully Homomorphic Encryption (FHE) addresses this need by enabling computations on encrypted data without requiring decryption, thereby maintaining data privacy. Additionally, lattice-based FHE is quantum secure, providing defense against potential quantum computer attacks. However, the performance of current FHE schemes remains unsatisfactory, largely because of the length of the operands and the computational expense associated with several resource-intensive operations. Among these operations, key-switching is one of the most demanding processes because it involves complex arithmetic operations necessary to conduct computations in a larger cyclotomic ring.
In this research, we introduce a novel algorithm that achieves linear complexity in the Number Theoretic Transform (NTT) for key-switching. This algorithm offers efficiency comparable to the state-of-the-art while being significantly simpler and consuming less GPU memory. Notably, it reduces space consumption by up to 95%, making it highly friendly for GPU memory. By optimizing GPU performance, our implementation achieves up to a 2.0x speedup compared to both the baseline approach and the current state-of-the-art methods. This algorithm effectively balances simplicity and performance, thereby enhancing cryptographic computations on modern hardware platforms and paving the way to more practical and efficient FHE implementations in cloud computing environments.
Glacius: Threshold Schnorr Signatures from DDH with Full Adaptive Security
Threshold signatures are one of the most important cryptographic primitives in distributed systems. The threshold Schnorr signature scheme, an efficient and pairing-free scheme, is a popular choice and is included in NIST's standards and recent call for threshold cryptography. Despite its importance, most threshold Schnorr signature schemes assume a static adversary in their security proof. A recent scheme proposed by Katsumata et al. (Crypto 2024) addresses this issue. However, it requires linear-sized signing keys and lacks the identifiable abort property, which makes it vulnerable to denial-of-service attacks. Other schemes with adaptive security either have reduced corruption thresholds or rely on non-standard assumptions such as the algebraic group model (AGM) or hardness of the algebraic one-more discrete logarithm (AOMDL) problem.
In this work, we present Glacius, the first threshold Schnorr signature scheme that overcomes all these issues. Glacius is adaptively secure based on the hardness of decisional Diffie-Hellman (DDH) in the random oracle model (ROM), and it supports a full corruption threshold , where is the total number of signers and is the signing threshold. Additionally, Glacius provides constant-sized signing keys and identifiable abort, meaning signers can detect misbehavior. We also give a formal game-based definition of identifiable abort, addressing certain subtle issues present in existing definitions, which may be of independent interest.
On the Tight Security of the Double Ratchet
The Signal Protocol is a two-party secure messaging protocol used in applications such as Signal, WhatsApp, Google Messages and Facebook Messenger, and is used by billions daily. It consists of two core components, one of which is the Double Ratchet protocol, which has been the subject of a line of work aiming to understand and formalise exactly what security it provides. Existing models capture strong guarantees, including resilience to state exposure in both forward security (protecting past secrets) and post-compromise security (restoring security), adaptive state corruptions, message injections and out-of-order message delivery. Due to this complexity, prior work has failed to provide security guarantees that do not degrade in the number of interactions, even in the single-session setting.
Given the ubiquity of the Double Ratchet in practice, we explore tight security bounds for the Double Ratchet in the multi-session setting. To this end, we revisit the modelling of Alwen, Coretti and Dodis (EUROCRYPT 2019), who decompose the protocol into modular, abstract components, notably continuous key agreement (CKA) and forward-secure AEAD (FS-AEAD). To enable a tight security proof, we propose a CKA security model that provides one-way security under key checking attacks. We show that the multi-session security of the Double Ratchet can be tightly reduced to the multi-session security of CKA and FS-AEAD, capturing the same strong security guarantees as Alwen et al.
Our result improves upon the bounds of Alwen et al. in the random oracle model. Even so, we are unable to provide a completely tight proof for the Double Ratchet based on standard Diffie-Hellman assumptions, and we conjecture it is not possible. We thus go a step further and analyse CKA based on key encapsulation mechanisms (KEMs). In contrast to previous works, our new analysis allows for tight constructions based on the DDH and post-quantum assumptions.
The Impact of Reversibility on Parallel Pebbling
The (parallel) classical black pebbling game is a helpful abstraction which allows us to analyze the resources (time, space, space-time, cumulative space) necessary to evaluate a function with a static data-dependency graph on a (parallel) computer. In particular, the parallel black pebbling game has been used as a tool to quantify the (in)security of Data-Independent Memory-Hard Functions (iMHFs). However, the classical black pebbling game is not suitable to analyze the cost of quantum preimage attack. Thus, Blocki et al. (TCC 2022) introduced the parallel reversible pebbling game as a tool to analyze resource requirements for a quantum computer. While there is an extensive line of work analyzing pebbling complexity in the (parallel) black pebbling game, comparatively little is known about the parallel reversible pebbling game. Our first result is a lower bound of on the reversible cumulative pebbling cost for a line graph on nodes. This yields a separation between classical and reversible pebbling costs demonstrating that the reversibility constraint can increase cumulative pebbling costs (and space-time costs) by a multiplicative factor of --- the classical pebbling cost (space-time or cumulative) for a line graph is just . On the positive side, we prove that any classical parallel pebbling can be transformed into a reversible pebbling strategy whilst increasing space-time (resp. cumulative memory) costs by a multiplicative factor of at most (resp. ). We also analyze the impact of the reversibility constraint on the cumulative pebbling cost of depth-robust and depth-reducible DAGs exploiting reversibility to improve constant factors in a prior lower bound of Alwen et al. (EUROCRYPT 2017). For depth-reducible DAGs we show that the state-of-the-art recursive pebbling techniques of Alwen et al. (EUROCRYPT 2017) can be converted into a recursive reversible pebbling attack without any asymptotic increases in pebbling costs. Finally, we extend a result of Blocki et al. (ITCS 2020) to show that it is Unique Games hard to approximate the reversible cumulative pebbling cost of a DAG to within any constant factor.
General Functional Bootstrapping using CKKS
The Ducas-Micciancio (DM/FHEW) and Chillotti-Gama-Georgieva-Izabachène (CGGI/TFHE) cryptosystems provide a general privacy-preserving computation capability. These fully homomorphic encryption (FHE) cryptosystems can evaluate an arbitrary function expressed as a general look-up table (LUT) via the method of functional bootstrapping (also known as programmable bootstrapping). The main limitation of DM/CGGI functional bootstrapping is its efficiency, because this procedure has to bootstrap every encrypted number separately. A different bootstrapping approach, based on the Cheon-Kim-Kim-Song (CKKS) FHE scheme, can achieve much smaller amortized time due to its ability to bootstrap many thousands of numbers at once. However, CKKS does not currently provide a functional bootstrapping capability that can evaluate a general LUT. An open research question is whether such a capability can be efficiently constructed. We give a positive answer to this question by proposing and implementing a general functional bootstrapping method based on CKKS-style bootstrapping. We devise a theoretical toolkit for evaluating an arbitrary function using the theory of trigonometric Hermite interpolations, which provides control over noise reduction during functional bootstrapping. Our experimental results for 8-bit LUT evaluation show that the proposed method achieves an amortized time of 0.75 milliseconds, which is three orders of magnitude faster than the DM/CGGI approach and 6.6x faster than a (more restrictive) amortized functional bootstrapping method based on the Brakerski/Fan-Vercauteren (BFV) FHE scheme.
A New Approach Towards Encrypted Data Sharing and Computation: Enhancing Efficiency Beyond MPC and Multi-Key FHE
In this paper, we introduce a novel approach to Multi-Key Fully Homomorphic Encryption (MK-FHE) that enhances both efficiency and security beyond the capabilities of traditional MK-FHE and Multi-Party Computation (MPC) systems. Our method generates a unified key structure, enabling constant ciphertext size and constant execution time for encrypted computations, regardless of the number of participants involved. This approach addresses critical limitations such as ciphertext size expansion, noise accumulation, and the complexity of relinearization, which typically hinder scalability in multi-user environments. We also propose a new decryption method that simplifies decryption to a single information exchange, in contrast to traditional multi-key FHE systems that require information to be passed between all parties sequentially.
Additionally, our approach significantly enhances the scalability of MK-FHE systems, allowing seamless integration of additional participants without introducing performance overhead. Through theoretical analysis and practical implementation, we demonstrate the superiority of our approach in large-scale, collaborative encrypted computation scenarios, paving the way for more robust and efficient secure data processing frameworks. Furthermore, unlike threshold-based FHE schemes, the proposed system doesn't require a centralised trusted third party to split and distribute the individual secret keys; instead, each participant independently generates their own secret key, ensuring both security and decentralization.
Founding Quantum Cryptography on Quantum Advantage, or, Towards Cryptography from #P-Hardness
Recent oracle separations [Kretschmer, TQC'21; Kretschmer et al., STOC'23] have raised the tantalizing possibility of building quantum cryptography from sources of hardness that persist even if the polynomial hierarchy collapses. We realize this possibility by building quantum bit commitments and secure computation from unrelativized, well-studied mathematical problems that are conjectured to be hard for #P, such as approximating the permanents of complex Gaussian matrices or approximating the output probabilities of random quantum circuits. Indeed, we show that as long as any one of the conjectures underlying sampling-based quantum advantage (e.g., BosonSampling, Random Circuit Sampling, IQP, etc.) is true, quantum cryptography can be based on the extremely mild assumption that P^{#P} is not contained in (io)BQP/qpoly.
Our techniques uncover strong connections between the hardness of approximating the probabilities of outcomes of quantum processes, the existence of "one-way" state synthesis problems, and the existence of useful cryptographic primitives such as one-way puzzles and quantum bit commitments. Specifically, we prove that the following hardness assumptions are equivalent under reductions.
1. The hardness of approximating the probabilities of outcomes of certain efficiently sampleable distributions. That is, there exist quantumly efficiently sampleable distributions for which it is hard to approximate the probability assigned to a randomly chosen string in the support of the distribution (up to inverse-polynomial multiplicative error).
2. The existence of one-way puzzles, where a quantum sampler outputs a pair of classical strings -- a puzzle and its key -- and where the hardness lies in finding the key corresponding to a random puzzle. These are known to imply quantum bit commitments [Khurana and Tomer, STOC'24].
3. The existence of state puzzles, or one-way state synthesis, where it is hard to synthesize a secret quantum state given a public classical identifier. These capture the hardness of search problems with quantum inputs (secrets) and classical outputs (challenges).
These are the first constructions of quantum cryptographic primitives (one-way puzzles, quantum bit commitments, state puzzles) from concrete, well-founded mathematical assumptions that do not imply the existence of classical cryptography.
Along the way, we also show that distributions that admit efficient quantum samplers but cannot be pseudo-deterministically efficiently sampled imply quantum commitments.
PAKE Combiners and Efficient Post-Quantum Instantiations
Much work has been done recently on developing password-authenticated key exchange (PAKE) mechanisms with post-quantum security. However, modern guidance recommends the use of hybrid schemes—schemes which rely on the combined hardness of a post-quantum assumption, e.g., Learning with Errors (LWE), and a more traditional assumption, e.g., decisional Diffie-Hellman. To date, there is no known hybrid PAKE construction, let alone a general method for achieving one.
In this paper, we present two efficient PAKE combiners—algorithms that take two PAKEs satisfying mild assumptions, and output a third PAKE with combined security properties—and prove these combiners secure in the Universal Composability (UC) model. Our sequential combiner, instantiated with efficient existing PAKEs such as CPace (built on Diffie-Hellman-type assumptions) and CHIC[ML-KEM] (built on the Module LWE assumption), yields the first known hybrid PAKE.
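While the paper's combiners come with UC proofs and careful session bookkeeping, the basic shape of a sequential combiner can be sketched as follows; run_pake_a and run_pake_b are hypothetical stand-ins for the two component PAKEs (e.g., CPace and CHIC[ML-KEM]), each mapping a password to a session key:

    import hashlib

    def H(*parts: bytes) -> bytes:
        h = hashlib.sha256()
        for p in parts:
            h.update(len(p).to_bytes(4, "big"))  # length-prefix against ambiguity
            h.update(p)
        return h.digest()

    def sequential_combiner(run_pake_a, run_pake_b, password: bytes) -> bytes:
        k1 = run_pake_a(password)                        # first PAKE on the password
        k2 = run_pake_b(H(b"derived-pw", password, k1))  # second PAKE bound to k1
        return H(b"combined", k1, k2)                    # final session key

The intuition is that the final key remains pseudorandom as long as at least one component PAKE is secure; the actual UC security argument is considerably more delicate.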
Benchmarking Attacks on Learning with Errors
Lattice cryptography schemes based on the learning with errors (LWE) hardness assumption have been standardized by NIST for use as post-quantum cryptosystems, and by HomomorphicEncryption.org for encrypted compute on sensitive data. Thus, understanding their concrete security is critical. Most work on LWE security focuses on theoretical estimates of attack performance, which is important but may overlook attack nuances arising in real-world implementations. The sole existing concrete benchmarking effort, the Darmstadt Lattice Challenge, does not include benchmarks relevant to the standardized LWE parameter choices - such as small secret and small error distributions, and the Ring-LWE (RLWE) and Module-LWE (MLWE) variants. To improve our understanding of concrete LWE security, we provide the first benchmarks for LWE secret recovery on standardized parameters, for small and low-weight (sparse) secrets. We evaluate four LWE attacks in these settings to serve as a baseline: the Search-LWE attacks uSVP, SALSA, and Cool & Cruel, and the Decision-LWE attack Dual Hybrid Meet-in-the-Middle (MitM). We extend the SALSA and Cool & Cruel attacks in significant ways, and implement and scale up MitM attacks for the first time. For example, with SALSA and Cool & Cruel we recover low-Hamming-weight binomial secrets for Kyber parameters within hours, and we find that MitM can solve Decision-LWE instances for moderate Hamming weights in under an hour for Kyber parameters, while uSVP attacks do not recover any secrets even after much longer running times. We also compare concrete performance against theoretical estimates. Finally, we open-source our code to enable future research.
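For concreteness, here is a toy LWE instance generator with a sparse ternary secret and centered-binomial error, in the spirit of the benchmarked settings (the dimensions and modulus are illustrative, not the exact benchmark parameters):

    import numpy as np

    rng = np.random.default_rng(0)
    n, m, q, h = 256, 512, 3329, 20      # illustrative sizes; h = secret weight

    s = np.zeros(n, dtype=np.int64)      # sparse ternary secret of weight h
    idx = rng.choice(n, size=h, replace=False)
    s[idx] = rng.choice([-1, 1], size=h)

    A = rng.integers(0, q, size=(m, n))  # uniform public matrix
    e = rng.binomial(4, 0.5, size=m) - 2 # centered binomial error, eta = 2
    b = (A @ s + e) % q                  # attack goal: recover s from (A, b)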
Discrete Logarithm Factory
The Number Field Sieve (NFS) and its variants are the best algorithms for solving the discrete logarithm problem in finite fields (except in the weak small-characteristic case). The Factory variant accelerates the computation when several prime fields are targeted. This article adapts the Factory variant to non-prime finite fields of medium and large characteristic. A precomputation, depending solely on an approximate finite field size and an extension degree, makes it possible to efficiently compute discrete logarithms in a constant proportion of the finite fields of the given approximate size and extension degree. We combine this idea with two other variants of NFS, namely the tower and special variants; this combination improves the asymptotic complexity. We also observe that combining our approach with the MNFS variant would be an unnecessary complication, as all the potential gain of MNFS is subsumed by our Factory variant anyway. Furthermore, we demonstrate how Chebotarev's density theorem allows us to compute the density of finite fields that can be handled with a given precomputation. Finally, we provide experimental data in order to assess the practical reach of our approach.
Reframing and Extending the Random Probing Expandability to Make Probing-Secure Compilers Tolerate a Constant Noise
In the context of circuits leaking their internal state due to hardware side-channels, the p-random probing model gives the adversary the value of each wire independently with probability p. In this model, for a fixed p, it is possible to reach any target security level epsilon by 'expanding' a stateless circuit via iterated compilation, at the cost of a circuit size polynomial in log(1/epsilon).
An artifact of the existing proofs of expansion is that worst-case security is assumed for the input circuit, which means that a pre-compiled input circuit loses all the security guarantees of the first compilation. We reframe the expansion and prove it as a security reduction from the compiled circuit to the original one. Additionally, we extend it to support a broader range of encodings, as well as arbitrary probabilistic gates with an arbitrary number of inputs and outputs.
This allows us to prove the following statement: given a stateless circuit over a finite field and a t-probing secure compiler for some integer t, we can produce a circuit that is secure against any adversary who sees all wires with a constant level of SD-noise, at the cost of an additional multiplicative size factor.
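A toy numeric model of why iterated compilation amplifies security in this regime (the constants below are hypothetical placeholders, not values from the paper): if one expansion level maps the leakage-failure probability eps to roughly C * eps**t while multiplying the circuit size by a constant, then a few levels suffice once the initial p is below the fixed point.

    def iterate_expansion(p, levels, C=10.0, t=3, gate_blowup=25):
        # eps: failure probability after the current compilation level
        # size: multiplicative circuit-size overhead accumulated so far
        eps, size = p, 1
        for _ in range(levels):
            eps = min(1.0, C * eps ** t)   # hypothetical per-level amplification
            size *= gate_blowup
        return eps, size

    # eps shrinks doubly exponentially in the number of levels,
    # while the size grows only by a fixed factor per level.
    print(iterate_expansion(2 ** -10, levels=4))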
Shaking up authenticated encryption
Authenticated encryption (AE) is a cryptographic mechanism that allows communicating parties to protect the confidentiality and integrity of messages exchanged over a public channel, provided they share a secret key. In this work, we present new AE schemes leveraging the SHA-3 standard functions SHAKE128 and SHAKE256, offering 128 and 256 bits of security strength, respectively, as well as their “Turbo” counterparts. They support session-based communication, where a ciphertext authenticates the sequence of messages since the start of the session. The chaining in the session allows decryption in segments, avoiding the need to buffer the entire deciphered cryptogram between decryption and validation. Moreover, thanks to the collision resistance of (Turbo)SHAKE, they provide so-called CMT-4 committing security, meaning strong guarantees that a ciphertext uniquely binds to the key, plaintext and associated data. The AE schemes we propose have a unique combination of advantages: 1) their security is based on the security claim of SHAKE, which has received a large amount of public scrutiny; 2) they make use of the standard KECCAK-p permutation, which not only receives more and more dedicated hardware support but also allows competitive software-only implementations thanks to the TurboSHAKE instances; and 3) they do not suffer from a 64-bit birthday bound like most AES-based schemes.
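hashlib exposes SHAKE as a one-shot XOF, which is enough to sketch the session-chaining idea (this illustrates chained segment authentication only, not the duplex-based schemes proposed in the paper):

    import hashlib

    def shake(data: bytes, n: int) -> bytes:
        return hashlib.shake_256(data).digest(n)

    def seal(key: bytes, transcript: bytes, msg: bytes):
        # Keystream and tag both depend on the whole session transcript so far,
        # so the tag of segment i authenticates segments 1..i.
        ks = shake(b"enc" + key + transcript, len(msg))
        ct = bytes(a ^ b for a, b in zip(msg, ks))
        transcript += ct
        tag = shake(b"mac" + key + transcript, 16)
        return ct, tag, transcript

    _, t1, tr = seal(b"k" * 32, b"", b"first segment")
    ct2, t2, _ = seal(b"k" * 32, tr, b"second segment")  # t2 binds both segments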
On Knowledge-Soundness of Plonk in ROM from Falsifiable Assumptions
Lipmaa, Parisella, and Siim [Eurocrypt 2024] proved the extractability of the KZG polynomial commitment scheme under the falsifiable assumption ARSDH. They also showed that variants of real-world zk-SNARKs like Plonk can be made knowledge-sound in the random oracle model (ROM) under the ARSDH assumption. However, their approach did not consider various batching optimizations, resulting in their variant of Plonk having a significantly longer argument. Our contributions are: (1) We prove that several batch-opening protocols for KZG, used in modern zk-SNARKs, have computational special-soundness under the ARSDH assumption. (2) We prove that interactive Plonk has computational special-soundness under the ARSDH assumption and a new falsifiable assumption SplitRSDH. We also prove that two minor modifications of interactive Plonk have computational special-soundness under only ARSDH and a simpler variant of SplitRSDH. We define a new type-safe oracle framework of AGMOS (the AGM with oblivious sampling) and prove that SplitRSDH is secure in it. The Fiat-Shamir transform can be applied to obtain non-interactive versions, which are secure in the ROM under the same assumptions.
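For context, the kind of batching optimization at stake can be illustrated with the textbook same-point KZG batch opening (notation [a]_1 = aG_1, [a]_2 = aG_2; this standard batching is given for illustration and is not necessarily identical to the exact protocols analyzed in the paper). For commitments C_1, ..., C_m with claimed values v_i = f_i(z) and a single aggregated witness W, the verifier samples a random rho and checks

    e\Big(\sum_{i=1}^{m}\rho^{\,i-1}\big(C_i-[v_i]_1\big),\;[1]_2\Big)
      \;=\; e\big(W,\;[\tau]_2-[z]_2\big),

where W is a commitment to \sum_i \rho^{\,i-1}(f_i(X)-v_i)/(X-z) and \tau is the trapdoor embedded in the SRS. It is for checks of this kind that the paper establishes computational special-soundness under ARSDH.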
Algebraic Equipage for Learning with Errors in Cyclic Division Algebras
In Noncommutative Ring Learning With Errors From Cyclic Algebras, a variant of Learning with Errors from cyclic division algebras, dubbed 'Cyclic LWE', was developed, and security reductions similar to those known for the ring and module case were given, as well as a Regev-style encryption scheme. In this work, we make a number of improvements to that work: namely, we describe methods to increase the number of cryptographically useful division algebras, demonstrate the hardness of CLWE from ideal lattices obtained from non-maximal orders, and study Learning with Rounding in cyclic division algebras.
FINALLY: A Multi-Key FHE Scheme Based on NTRU and LWE
Multi-key fully homomorphic encryption (MKFHE), a generalization of fully homomorphic encryption (FHE), enables computation over encrypted data under multiple keys. The first MKFHE schemes were based on the NTRU primitive; however, these early NTRU-based FHE schemes were found to be insecure due to the problem of over-stretched parameters. Recently, in the case of standard (non-multi-key) FHE, a secure NTRU-based scheme, called FINAL, has been found. In this work we extend FINAL to an MKFHE scheme, which allows us to benefit from some of the performance advantages provided by NTRU-based primitives. Our scheme thus provides competitive performance against current state-of-the-art multi-key TFHE, in particular reducing the computational complexity from quadratic to linear in the number of keys.
Threshold Computation in the Head: Improved Framework for Post-Quantum Signatures and Zero-Knowledge Arguments
The MPC-in-the-Head (MPCitH) paradigm is instrumental in building zero-knowledge proof systems and post-quantum signatures using techniques from secure multi-party computation. In this work, we extend and improve the recently proposed framework of MPC-in-the-Head based on threshold secret sharing, here called Threshold Computation in the Head (TCitH). We first address some limitations of this framework, namely its overhead in communication cost, its constraint on the number of parties, and its degradation of soundness. Our tweaks make the framework applicable to previous MPCitH schemes (in particular, post-quantum signature candidates recently submitted to NIST), for which we obtain up to 50% timing improvements without degrading the signature size. We then extend the TCitH framework to support quadratic (or higher-degree) MPC round functions as well as packed secret sharing. We show the benefits of our extended framework for several applications. First, we provide post-quantum zero-knowledge arguments for arithmetic circuits that improve the state of the art in the "small to medium size" regime. Then we apply our extended framework to derive improved variants of the MPCitH candidates submitted to NIST; for most of them, we save between 5% and 37% of the signature size. We further propose a generic way to build efficient post-quantum ring signatures from any one-way function. When applying our TCitH framework to this design with concrete one-way functions, the obtained scheme outperforms all previous proposals in the state of the art. For instance, our scheme instantiated with MQ achieves sizes below 6 KB and timings around 10 ms for a ring of 4000 users. Finally, we provide exact arguments for lattice problems. Our scheme is competitive with state-of-the-art zero-knowledge lattice techniques and achieves proofs around 15 KB for LWE instances with security similar to Kyber512. We conclude by exhibiting strong connections between the TCitH framework and other proof systems (namely VOLE-in-the-Head and Ligero), thus unifying different MPCitH-like proof systems under the same umbrella.
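For readers unfamiliar with the threshold-sharing substrate of TCitH, here is a minimal Shamir t-out-of-n sharing over a prime field (the field, parameters and API are illustrative choices, not taken from the paper):

    from random import randrange

    P = 2 ** 61 - 1  # a Mersenne prime, used here as an illustrative field

    def share(secret, t, n):
        # Random degree-(t-1) polynomial with constant term = secret;
        # share i is its evaluation at x = i.
        coeffs = [secret] + [randrange(P) for _ in range(t - 1)]
        return [(x, sum(c * pow(x, j, P) for j, c in enumerate(coeffs)) % P)
                for x in range(1, n + 1)]

    def reconstruct(points):
        # Lagrange interpolation at x = 0 from any t shares.
        total = 0
        for i, (xi, yi) in enumerate(points):
            num = den = 1
            for j, (xj, _) in enumerate(points):
                if i != j:
                    num = num * (-xj) % P
                    den = den * (xi - xj) % P
            total = (total + yi * num * pow(den, P - 2, P)) % P
        return total

    shares = share(123456789, t=3, n=5)
    assert reconstruct(shares[:3]) == 123456789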
LeOPaRd: Towards Practical Post-Quantum Oblivious PRFs via Interactive Lattice Problems
In this work, we introduce a more efficient post-quantum oblivious PRF (OPRF) design, called LeOPaRd. Our proposal is round-optimal and supports verifiability and partial obliviousness, all of which are important for practical applications. The main technical novelty of our work is a new method for computing samples of MLWE (module learning with errors) in a two-party setting. To do this, we introduce a new family of interactive lattice problems, called interactive MLWE and rounding with re-use (iMLWER-RU). We rigorously study the hardness of iMLWER-RU and reduce it (under some natural idealized setting) to a more standard MLWE-like problem where the adversary is additionally given access to a randomized MLWE PRF oracle. We believe iMLWER-RU can be of independent interest for other interactive protocols.
LeOPaRd exploits this new iMLWER-RU assumption to realize a lattice-based OPRF design without relying on heavy machinery, such as noise flooding and fully homomorphic encryption, used in earlier works. LeOPaRd can achieve around 136 KB of total communication, compared to 300+ KB in earlier works. We also identify gaps in some existing constructions and models, and propose appropriate fixes.
An analysis of the Crossbred Algorithm for the MQ Problem
The Crossbred algorithm is currently the state-of-the-art method for solving overdetermined multivariate polynomial systems over F_2. Since its publication in 2015, several record-breaking implementations have been proposed and demonstrate the power of this hybrid approach. Despite these practical results, the complexity of this algorithm and the choice of optimal parameters for it remain difficult open questions. In this paper, we prove a bivariate generating series describing the potentially admissible parameters of the Crossbred algorithm.
Related-Key Cryptanalysis of FUTURE
At Africacrypt 2022, Gupta et al. introduced FUTURE, a 64-bit lightweight MDS-matrix-based SPN-like block cipher designed to encrypt data in a single clock cycle with minimal implementation cost, particularly when unrolled. While various attack models were discussed, the security of the cipher in the related-key setting was not addressed. In this work, we bridge this gap by conducting a security analysis of the cipher under related-key attacks using MILP (Mixed Integer Linear Programming)-based techniques. Our model enables a related-key distinguishing attack on 8 rounds of FUTURE with practical data and time complexity and negligible memory. Additionally, we present a 10-round boomerang distinguisher of high probability, leading to a related-key distinguishing attack that likewise has practical complexity and negligible memory. This result demonstrates a full break of the cipher's 64-bit security in the related-key setting.
Efficient Maliciously Secure Oblivious Exponentiations
Oblivious Pseudorandom Functions (OPRFs) allow a client to evaluate a pseudorandom function (PRF) on her secret input based on a key that is held by a server. In the process, the client only learns the PRF output but not the key, while the server learns neither the input nor the output of the client. The arguably most popular OPRF is due to Naor, Pinkas and Reingold (NPR, Eurocrypt 2009). It is based on an oblivious exponentiation performed by the server, with passive security under the Decisional Diffie-Hellman assumption. In this work, we strengthen the security guarantees of the NPR OPRF by protecting it against active attacks by the server. We have implemented our solution and report on its performance.
Our main result is a new batch OPRF protocol that is secure against maliciously corrupted servers but essentially as efficient as the semi-honest solution: the multiplicative computation (and communication) overhead vanishes as the batch size increases. The obvious solution using zero-knowledge proofs would have at best a constant-factor overhead, which can be too expensive for certain deployments.
Our protocol relies on a novel version of the DDH problem, which we call the Oblivious Exponentiation Problem (OEP), and we give evidence for its hardness in the Generic Group Model. We also present a variant of our maliciously secure protocol that does not rely on the OEP but nevertheless has only a small overhead over the known semi-honest protocol. Moreover, we show that our techniques can also be used to efficiently protect threshold blind BLS signing and threshold ElGamal decryption against malicious attackers.
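The semi-honest core that the paper hardens is a blinded exponentiation; a toy version in a small subgroup of Z_p^* (real deployments use elliptic-curve groups, and the hash-to-group below is purely illustrative):

    import hashlib, secrets

    p, q, g = 23, 11, 4              # g generates the order-q subgroup of Z_p^*

    def hash_to_group(x: bytes) -> int:
        # Toy hash-to-group; real schemes need a proper construction.
        return pow(g, int.from_bytes(hashlib.sha256(x).digest(), "big") % q, p)

    k = secrets.randbelow(q - 1) + 1  # server's OPRF key

    def client_blind(x: bytes):
        r = secrets.randbelow(q - 1) + 1
        return pow(hash_to_group(x), r, p), r

    def server_eval(a: int) -> int:
        return pow(a, k, p)           # the oblivious exponentiation

    def client_unblind(b: int, r: int) -> int:
        return pow(b, pow(r, -1, q), p)  # r^{-1} mod q works: subgroup order q

    a, r = client_blind(b"input")
    assert client_unblind(server_eval(a), r) == pow(hash_to_group(b"input"), k, p)

A malicious server can deviate from server_eval, e.g. by using different keys across a batch; preventing exactly this, without resorting to zero-knowledge proofs, is what the OEP-based protocol achieves.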
Circuit Bootstrapping: Faster and Smaller
We present a novel circuit bootstrapping algorithm that outperforms the state-of-the-art TFHE method with a 9.9× speedup and a 15.6× key size reduction. These improvements can be attributed to two technical contributions. Firstly, we redesigned the circuit bootstrapping workflow to operate exclusively on the ring ciphertext type, which eliminates the need for conversions between LWE and RLWE ciphertexts. Secondly, we improve the LMKC+ blind rotation algorithm by reducing the number of automorphisms, and then propose the first automorphism-type multi-value functional bootstrapping. These automorphism-based techniques lead to further key size optimization and are of independent interest beyond circuit bootstrapping. Based on our new circuit bootstrapping, we can evaluate AES-128 in 26.2 s (single thread), achieving a 10.3× speedup compared with the state-of-the-art TFHE-based approach.
FHEW-like Leveled Homomorphic Evaluation: Refined Workflow and Polished Building Blocks
In FHEW-like cryptosystems, the leveled homomorphic evaluation (LHE) mode performs bootstrapping after circuit evaluation rather than after each gate.
The core procedure, which is also the performance bottleneck, is known as circuit bootstrapping (CBS).
This paper revisits the LHE mode by refining the workflow and proposing polished building blocks:
1. Algorithmic Enhancements
- We introduce an NTT-based CBS algorithm, patched from WWL+ [Eurocrypt24], achieving up to a 2.9× efficiency improvement.
- We present an FFT-based CBS that is 3.5× faster than the most efficient prior FFT-CBS implementation, with a key size reduction of 37.5%.
2. Refined Leveled Homomorphic Evaluation and Applications
- We propose a refined evaluation algorithm that is 2.4× faster than the traditional CBS algorithm, enabling flexible leveled evaluation.
- By applying these improvements to the evaluation of general look-up tables (LUTs), we can evaluate a 16-to-8-bit LUT in 311 ms, faster than the trivial FHE-mode implementation.
- When applied to AES transciphering, our enhancements yield a 2.9× improvement in efficiency and a 31.6% reduction in key size compared to the state of the art, Thunderbird [TCHES24].
3. High-Precision LHE
- To handle multi-bit inputs, we propose a high-precision LHE framework. Compared to WoP-PBS [JoC23], the compute efficiency (resp. key size) is improved by factors of 9.7 to 16.2 (resp. 3.4 to 4.4), depending on the parameters.
On Wagner's k-Tree Algorithm Over Integers
The k-Tree algorithm [Wagner 02] is a non-trivial algorithm for the average-case k-SUM problem that has found widespread use in cryptanalysis. Its input consists of k lists, each containing integers from a range of size 2^n. Wagner's original heuristic analysis suggested that this algorithm succeeds with constant probability if the lists have size roughly 2^(n/(1 + log k)), and that in this case it runs in time about k · 2^(n/(1 + log k)). Subsequent rigorous analysis of the algorithm [Lyubashevsky 05, Shallue 08, Joux-Kippen-Loss 24] has shown that it succeeds with high probability if the input list sizes are significantly larger than this.
We present a broader rigorous analysis of the k-Tree algorithm, showing upper and lower bounds on its success probability and complexity for any size of the input lists. Our results confirm Wagner's heuristic conclusions, and also give meaningful bounds for a wide range of list sizes not covered by existing analyses. We present analytical bounds that are asymptotically tight, as well as an efficient algorithm that computes (provably correct) bounds for a wide range of concrete parameter settings. We also carry out the same analysis for the k-Tree algorithm in the modular setting. Finally, we present an experimental evaluation of the tightness of our results.
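A compact implementation of the 4-list case modulo 2^n (a common modular variant, written from the standard description of Wagner's algorithm rather than from this paper) shows the two-level merge structure:

    import random
    from collections import defaultdict

    def wagner_4sum(n_bits=24, seed=1):
        # Find x1+x2+x3+x4 = 0 (mod 2^n) with ell = n/3 and lists of size
        # 2^ell, the parameters suggested by Wagner's heuristic analysis.
        rng = random.Random(seed)
        N, ell = 1 << n_bits, n_bits // 3
        lists = [[rng.randrange(N) for _ in range(1 << ell)] for _ in range(4)]

        def merge(L1, L2):
            # Keep pairs whose sum vanishes on the low ell bits.
            by_low = defaultdict(list)
            for y in L2:
                by_low[y % (1 << ell)].append(y)
            return [((x + y) % N, (x, y))
                    for x in L1 for y in by_low.get((-x) % (1 << ell), ())]

        L12, L34 = merge(lists[0], lists[1]), merge(lists[2], lists[3])
        partner = {s: w for s, w in L34}
        for s, (x1, x2) in L12:
            if (-s) % N in partner:
                x3, x4 = partner[(-s) % N]
                return x1, x2, x3, x4       # (x1+x2+x3+x4) % N == 0
        return None                          # fails with constant probability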
Reckle Trees: Updatable Merkle Batch Proofs with Applications
We propose Reckle trees, a new vector commitment based on succinct RECursive arguments and MerKLE trees. Reckle trees' distinguishing feature is their support for succinct batch proofs that are updatable, enabling new applications in the blockchain setting where a proof needs to be computed and efficiently maintained over a moving stream of blocks. Our technical approach is based on embedding the computation of the batch hash inside the recursive Merkle verification via a hash-based accumulator called canonical hashing. Due to this embedding, our batch proofs can be updated in logarithmic time, whenever a Merkle leaf (belonging to the batch or not) changes, by maintaining a data structure that stores previously-computed recursive proofs. Assuming enough parallelism, our batch proofs are also computable in parallel time independent of the size of the batch. As a natural extension of Reckle trees, we also introduce Reckle+ trees. Reckle+ trees provide updatable and succinct proofs for certain types of Map/Reduce computations. In this setting, a prover can commit to a memory M and produce a succinct proof for a Map/Reduce computation over a subset of M. The proof can be efficiently updated whenever M or the subset changes.
We present and experimentally evaluate two applications of Reckle+ trees: dynamic digest translation and updatable BLS aggregation. In dynamic digest translation we maintain a proof of equivalence between Merkle digests computed with different hash functions, e.g., one with the SNARK-friendly Poseidon and the other with the SNARK-unfriendly Keccak. In updatable BLS aggregation we maintain a proof for the correct aggregation of an aggregate BLS key derived from a subset of a Merkle-committed set of individual BLS keys. Our evaluation using Plonky2 shows that Reckle trees and Reckle+ trees have a small memory footprint, significantly outperform previous approaches in terms of update and verification times, enable applications that were previously infeasible due to the huge costs involved (Reckle trees are up to 200 times faster), and have aggregation performance similar to previous implementations of batch proofs.
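The 'updatable' part at the Merkle layer is classical: storing all internal nodes lets a leaf change touch only its root path. A minimal sketch follows (the recursive-SNARK layer of Reckle trees, which makes the batch proofs themselves updatable, is not reproduced here):

    import hashlib

    def H(b: bytes) -> bytes:
        return hashlib.sha256(b).digest()

    class MerkleTree:
        # Heap-layout Merkle tree over 2^k leaves; update recomputes only the
        # log(n) nodes on the leaf-to-root path.
        def __init__(self, leaves):
            n = len(leaves)
            assert n and n & (n - 1) == 0, "power-of-two leaves for simplicity"
            self.n = n
            self.nodes = [b""] * (2 * n)
            for i, leaf in enumerate(leaves):
                self.nodes[n + i] = H(b"leaf:" + leaf)
            for i in range(n - 1, 0, -1):
                self.nodes[i] = H(self.nodes[2 * i] + self.nodes[2 * i + 1])

        def root(self) -> bytes:
            return self.nodes[1]

        def update(self, index: int, leaf: bytes):
            i = self.n + index
            self.nodes[i] = H(b"leaf:" + leaf)
            while i > 1:
                i //= 2
                self.nodes[i] = H(self.nodes[2 * i] + self.nodes[2 * i + 1])

    t = MerkleTree([b"a", b"b", b"c", b"d"])
    r0 = t.root()
    t.update(2, b"C")            # recomputes only 2 internal nodes
    assert t.root() != r0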
Optimal Traitor Tracing from Pairings
We use pairings over elliptic curves to give a collusion-resistant traitor tracing scheme where the sizes of public keys, secret keys, and ciphertexts are independent of the number of users. Prior constructions from pairings had parameters whose size grows with the number of users. An additional consequence of our techniques is a general result showing that attribute-based encryption for circuits generically implies optimal traitor tracing.
Secret Sharing with Snitching
We address the problem of detecting and punishing shareholder collusion in secret-sharing schemes. We do so in the recently proposed cryptographic model called individual cryptography (Dziembowski, Faust, and Lizurej, Crypto 2023), which assumes that there exist tasks that can be efficiently computed by a single machine, while distributing this computation across multiple (mutually distrustful) devices is infeasible.
Within this model, we introduce a novel primitive called secret sharing with snitching (SSS), in which each attempt to illegally reconstruct the shared secret S results in a proof that can be used to prove such misbehavior (and, e.g., financially penalize the cheater on a blockchain). This holds in a very strong sense, even if the shareholders attempt not to reconstruct the entire secret S but only to learn some partial information about it. Our notion also captures attacks performed using multiparty computation protocols (MPCs), i.e., those where the malicious shareholders use MPCs to compute partial information on S. The main idea of SSS is that any illegal reconstruction can be proven and punished, which suffices to discourage illegal secret reconstruction; hence, our SSS scheme effectively prevents shareholders' collusion. We provide a basic definition of threshold (t-out-of-n) SSS. We then show how to construct it for t = n, and later we use this construction to build an SSS scheme for an arbitrary t.
In order to prove the security of our construction, we introduce a generalization of the random oracle model (Bellare, Rogaway, CCS 1993), which allows modelling hash evaluations made inside MPC.
Blaze: Fast SNARKs from Interleaved RAA Codes
In this work we construct a new and highly efficient multilinear polynomial commitment scheme (MLPCS) over binary fields, which we call \emph{Blaze}. Polynomial commitment schemes allow a server to commit to a large polynomial and later decommit to its evaluations. Such schemes have emerged as a key component in recent efficient SNARK constructions.
Blaze has an extremely efficient prover, both asymptotically and concretely. The commitment is dominated by field additions (i.e., XORs) and one Merkle tree computation. The evaluation proof generation is dominated by additions and multiplications over the field, and the verifier runs in polylogarithmic time. Concretely, for sufficiently large message sizes, the prover is faster than that of all prior schemes except for Brakedown (Golovnev et al., Crypto 2023), while offering significantly smaller proofs than the latter.
The scheme is obtained by combining two ingredients:
1. Building on the code-switching technique (Ron-Zewi and Rothblum, JACM 2024), we show how to compose any error-correcting code together with an interactive oracle proof of proximity (IOPP) underlying existing MLPCS constructions, into a new MLPCS. The new MLPCS inherits its proving time from the code's encoding time, and its verification complexity from the underlying MLPCS. The composition is distinctive in that it is done purely on the information-theoretic side.
2. We apply the above methodology using an extremely efficient error-correcting code known as the Repeat-Accumulate-Accumulate (RAA) code. We give new asymptotic and concrete bounds, which demonstrate that (for sufficiently large message sizes) this code has a better encoding time vs. distance tradeoff than previous linear-time encodable codes that were considered in the literature.
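RAA encoding itself is strikingly simple; here is a sketch from the standard description of RAA codes (the permutations would be fixed as part of the code specification; deriving them pseudorandomly from a seed is an illustrative choice):

    import random

    def raa_encode(msg_bits, rate_inv=3, seed=0):
        # Repeat -> permute -> accumulate -> permute -> accumulate.
        # Everything is an XOR or a data movement, hence linear-time encoding.
        rng = random.Random(seed)

        def permute(v):
            idx = list(range(len(v)))
            rng.shuffle(idx)
            return [v[i] for i in idx]

        def accumulate(v):              # running XOR (prefix sums over F_2)
            out, acc = [], 0
            for b in v:
                acc ^= b
                out.append(acc)
            return out

        repeated = [b for b in msg_bits for _ in range(rate_inv)]
        return accumulate(permute(accumulate(permute(repeated))))

    codeword = raa_encode([1, 0, 1, 1], rate_inv=3)  # 12 bits for 4 message bits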
Mild Asymmetric Message Franking: Illegal-Messages-Only and Retrospective Content Moderation
Many messaging platforms have integrated end-to-end (E2E) encryption into their services. This widespread adoption of E2E encryption has triggered a technical tension between user privacy and illegal-content moderation. Existing solutions either support only one of unframeability and deniability, are prone to abuse (the moderator can perform content moderation for all messages, whether illegal or not), or lack mechanisms for retrospective content moderation.
To address the above issues, we introduce a new primitive called \emph{mild asymmetric message franking} (MAMF) to establish illegal-messages-only and retrospective content moderation for messaging systems, supporting unframeability and deniability simultaneously. We provide a framework to construct MAMF, leveraging two new building blocks, which might be of independent interest.
A reduction from Hawk to the principal ideal problem in a quaternion algebra
In this article we present a non-uniform reduction from rank-2 module-LIP over Complex Multiplication fields to a variant of the Principal Ideal Problem in a fitting quaternion algebra. This reduction is classical, deterministic, and runs in polynomial time in the size of the inputs. The quaternion algebra in which we need to solve the variant of the Principal Ideal Problem depends on the parameters of the module-LIP problem, but not on the particular problem instance. Our reduction requires the knowledge of some special elements of this quaternion algebra, which is why it is non-uniform.
In some particular cases, these elements can be computed in polynomial time, making the reduction uniform. This is the case for the Hawk signature scheme: we show that breaking Hawk is no harder than solving a variant of the principal ideal problem in a fixed quaternion algebra (and this reduction is uniform).
How to Recover the Full Plaintext of XCB
XCB, a tweakable enciphering mode, is part of IEEE Std 1619.2 for shared storage media. We show that no version of XCB is secure, via three plaintext recovery attacks. A key observation is that XCB behaves like an LRW1-type tweakable block cipher for single-block messages, which lacks CCA security. The first attack targets one-block XCB, using three queries to recover the plaintext. The second one requires four queries to recover all but one block of the plaintext. The last one requires seven queries to recover the full plaintext. The first attack applies to any scheme that follows the XCB structure, whereas the latter two work on all versions of XCB, exploiting the separable property of the underlying universal hash function. We also discuss the impact of these vulnerabilities on XCB-based applications, such as disk encryption, nonce-based encryption, deterministic authenticated encryption and robust authenticated encryption, highlighting the risks due to XCB's failure to achieve STPRP security. To address these flaws, we propose the XCB* structure, an improved version of XCB that adds only two XOR operations. We prove that XCB* is STPRP-secure when instantiated with AXU hash functions, SPRPs, and a secure IV-based stream cipher.
Incompressible Functional Encryption
Incompressible encryption (Dziembowski, Crypto '06; Guan, Wichs, Zhandry, Eurocrypt '22) protects against attackers that learn the entire decryption key but cannot store the full ciphertext: the attacker must try to compress the ciphertext to within a pre-specified memory bound S before receiving the secret key.
In this work, we generalize the notion of incompressibility to functional encryption. In incompressible functional encryption, the adversary can corrupt non-distinguishing keys at any point, but receives the distinguishing keys only after compressing the ciphertext to within S bits. An important efficiency measure for incompressible encryption is the ciphertext rate (i.e., the ratio of message length to ciphertext length). We give many new results for incompressible functional encryption for circuits, from the minimal assumption of (non-incompressible) functional encryption, with
1. ciphertext rate 1/2 and short secret keys, and
2. ciphertext rate 1 and large secret keys.
Along the way, we also give a new incompressible attribute-based encryption scheme for circuits from standard assumptions, with ciphertext rate 1 and short secret keys. Our results achieve optimal efficiency: incompressible attribute-based/functional encryption with ciphertext rate 1 as well as short secret keys faces strong barriers to provable security from standard assumptions. Moreover, our assumptions are minimal, as incompressible attribute-based/functional encryption is strictly stronger than its non-incompressible counterpart.
Fully Parallel, One-Cycle Random Shuffling for Efficient Countermeasure against Side Channel Attack and its Complexity Verification
Hiding countermeasures are the most widely utilized techniques for thwarting side-channel attacks, and their significance has been further emphasized with the advent of Post-Quantum Cryptography (PQC) algorithms, owing to their extensive use of vector operations. Commonly, the Fisher-Yates algorithm is adopted in hiding countermeasures with permuted operation for its security and efficiency in implementation, yet the inherently sequential nature of the algorithm imposes limitations on hardware acceleration. In this work, we propose a novel method named Addition Round Rotation (ARR), which introduces a time-area trade-off with block-based permutation. Our findings indicate that this approach can achieve a permutation complexity level commensurate with or exceeding that of Fisher-Yates in a single clock cycle, while maintaining substantial resistance against second-order analysis. To substantiate the security of our proposed method, we introduce a new validation technique, Identity Verification, which allows theoretical validation of the proposed algorithm's security and is consistent with the experimental results. Finally, we present an actual hardware design and provide implementation results on an Application-Specific Integrated Circuit (ASIC). The measured performance demonstrates that our proposal is practically applicable.
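For reference, the sequential baseline being replaced; each iteration of Fisher-Yates depends on the previous swap, which is exactly what blocks single-cycle hardware parallelism:

    import secrets

    def fisher_yates(n):
        # Uniformly random permutation of range(n); inherently sequential,
        # since the swap at index i reads the array state left by index i+1.
        a = list(range(n))
        for i in range(n - 1, 0, -1):
            j = secrets.randbelow(i + 1)   # unbiased index in [0, i]
            a[i], a[j] = a[j], a[i]
        return a

    print(fisher_yates(8))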