Papers updated in last 7 days (61 results)
Another Lattice Attack Against an RSA-like Cryptosystem
Let $N=pq$ be the product of two balanced prime numbers $p$ and $q$. In 2015, Roman'kov introduced an interesting RSA-like cryptosystem that, unlike the classical RSA key equation $ed - k (p-1)(q-1) = 1$, uses the key equation $ed - k r = 1$, where $r$ is a large prime divisor of $p-1$. In this paper, we study whether small private key attacks based on lattices can be applied to Roman'kov's cryptosystem. More precisely, we argue that such attacks do not appear to be applicable to this scheme without substantial adaptations.
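To make the key equation concrete, here is a toy Python sketch (our illustration, with deliberately tiny, insecure parameters) that generates a miniature instance: a prime $r$ dividing $p-1$, a small private exponent $d$, and $e = d^{-1} \bmod r$, so that $ed - kr = 1$ holds for some integer $k$.

```python
# Toy instance of the key equation e*d - k*r = 1 with r | p-1 (illustrative
# sizes only; a real instantiation uses balanced primes of cryptographic size).

def is_prime(n: int) -> bool:
    if n < 2:
        return False
    return all(n % f for f in range(2, int(n ** 0.5) + 1))

r = 1009                                 # prime; will divide p - 1
p = next(r * t + 1 for t in range(2, 10 ** 4) if is_prime(r * t + 1))
q = 997                                  # second (toy) balanced prime
N = p * q

d = 7                                    # deliberately small private exponent
e = pow(d, -1, r)                        # e*d = 1 (mod r)
k = (e * d - 1) // r                     # the key equation holds by construction

assert (p - 1) % r == 0 and e * d - k * r == 1
print(f"N = {N}, r = {r}, e = {e}, d = {d}, k = {k}")
```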
Revisiting Leakage-Resilient MACs and Succinctly-Committing AEAD: More Applications of Pseudo-Random Injections
Pseudo-Random Injections (PRIs) have been used in several applications in symmetric-key cryptography, such as in the idealization of Authenticated Encryption with Associated Data (AEAD) schemes, building robust AEAD, and, recently, in converting a committing AEAD scheme into a succinctly committing AEAD scheme. In Crypto 2024, Bellare and Hoang showed that if an AEAD scheme is already committing, it can be transformed into a succinctly committing scheme by encrypting part of the plaintext using a PRI. In this paper, we revisit the applications of PRIs in building Message Authentication Codes (MACs) and AEAD schemes. First, we look at some of the properties and definitions of PRIs, such as collision resistance and unforgeability when used as a MAC with small plaintext space, under different leakage models. Next, we show how they can be combined with collision-resistant hash functions to build a MAC for long plaintexts, offering flexible security depending on how the PRI and equality check are implemented. If both the PRI and equality check are leak-free, the MAC provides almost optimal security, but the security only degrades a little if the equality check is only leakage-resilient (rather than leak-free). If the equality check has unbounded leakage, the security drops to a baseline security, rather than being completely insecure. Next, we show how to use PRIs to build, from scratch, a succinctly committing online AEAD scheme dubbed scoAE that achieves succinct CMT-4 security, privacy, and Ciphertext Integrity with Misuse and Leakage (CIML) security. Last but not least, we show how to build a succinct nonce Misuse-Resistant (MRAE) AEAD scheme, dubbed scMRAE. The construction combines the SIV paradigm with PRI-based encryption (e.g., the Encode-then-Encipher (EtE) framework).
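As a rough sketch of the hash-then-PRI composition for long messages (our illustration, not the paper's construction), one can hash the message down to one block and apply a keyed injection to the digest. Below, a block cipher stands in for the PRI (a pseudo-random permutation is the zero-expansion special case of an injection), and `hmac.compare_digest` plays the role of the leakage-resilient equality check; the sketch uses the third-party `cryptography` package.

```python
# Hash-then-PRI MAC sketch: compress with a collision-resistant hash, then
# apply a keyed injection. AES on one block is our PRI stand-in; truncating
# the digest to 16 bytes is a toy simplification that weakens collision
# resistance and is not part of the paper's construction.
import hashlib
import hmac
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def mac(key: bytes, msg: bytes) -> bytes:
    digest = hashlib.sha256(msg).digest()[:16]       # hash long input to one block
    enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    return enc.update(digest) + enc.finalize()       # PRI stand-in on the digest

def verify(key: bytes, msg: bytes, tag: bytes) -> bool:
    # Equality check; compare_digest mimics a leakage-resilient comparison.
    return hmac.compare_digest(mac(key, msg), tag)

key = os.urandom(32)
tag = mac(key, b"a long plaintext" * 100)
assert verify(key, b"a long plaintext" * 100, tag)
```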
Light Clients for Lazy Blockchains
Lazy blockchains decouple consensus from transaction verification and execution to increase throughput. Although they can contain invalid transactions (e.g., double spends) as a result, these can easily be filtered out by full nodes that check if there have been previous conflicting transactions. However, creating light (SPV) clients that do not see the whole transaction history becomes a challenge: a record of a transaction on the chain does not necessarily entail transaction confirmation. In this paper, we devise a protocol that enables the creation of efficient light clients for lazy blockchains. The number of interaction rounds and the communication complexity of our protocol are logarithmic in the blockchain execution time. Our construction is based on a bisection game that traverses the Merkle tree containing the ledger of all transactions, valid or invalid. We prove that our proof system is succinct, complete, and sound, and empirically demonstrate the feasibility of our scheme.
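The object at the heart of such a protocol is a Merkle tree over the full ledger, against which a client checks logarithmic-size proofs. The following minimal sketch (our illustration; the paper's bisection and dispute logic is not reproduced) builds such a tree and verifies an inclusion proof:

```python
# Minimal Merkle tree with logarithmic-size inclusion proofs, the basic
# object a light client queries; odd levels duplicate their last node.
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def build_levels(leaves):
    levels = [[h(x) for x in leaves]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:
            cur = cur + [cur[-1]]
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels

def prove(levels, idx):
    proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        proof.append((level[idx ^ 1], idx % 2))   # (sibling hash, am-I-right-child)
        idx //= 2
    return proof

def verify(root, leaf, proof):
    node = h(leaf)
    for sib, is_right in proof:
        node = h(sib + node) if is_right else h(node + sib)
    return node == root

txs = [f"tx-{i}".encode() for i in range(6)]      # the full ledger, valid or not
levels = build_levels(txs)
assert verify(levels[-1][0], txs[3], prove(levels, 3))
```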
Discrete Gaussians Modulo Sub-Lattices: New Leftover Hash Lemmas for Discrete Gaussians
The Leftover Hash Lemma (LHL) is a powerful tool for extracting randomness from an entropic distribution, with numerous applications in cryptography. LHLs for discrete Gaussians have been explored in both integer settings by Gentry et al. (GPV, STOC'08) and algebraic ring settings by Lyubashevsky et al. (LPR, Eurocrypt'13). However, the existing LHLs for discrete Gaussians have two main limitations: they require the Gaussian parameter to be larger than certain smoothing parameters, and they cannot handle cases where fixed and arbitrary information is leaked.
In this work, we present new LHLs for discrete Gaussians in both integer and ring settings. Our results show that the Gaussian parameter can be improved by a factor of $\omega(\sqrt{\log\lambda})$ and $O(\sqrt{N})$ compared to the regularity lemmas of GPV and LPR, respectively, under similar parameter choices such as the dimension and ring. Furthermore, our new LHLs can be applied to leaked discrete Gaussians, and the result can be used to establish the asymptotic hardness of the extended MLWE assumptions, addressing an open question in recent works by Lyubashevsky et al. (LNP, Crypto'22). Our central techniques involve new fine-grained analyses of the min-entropy of discrete Gaussians modulo sublattices and should be of independent interest.
Finding Practical Parameters for Isogeny-based Cryptography
Isogeny-based schemes often come with special requirements on the field of definition of the involved elliptic curves. For instance, the efficiency of SQIsign, a promising candidate in the NIST signature standardisation process, requires a large power of two and a large smooth integer $T$ to divide $p^2-1$ for its prime parameter $p$.
We present two new methods that combine previous techniques for finding suitable primes: sieve-and-boost and XGCD-and-boost. We use these methods to find primes for the NIST submission of SQIsign. Furthermore, we show that our methods are flexible and can be adapted to find suitable parameters for other isogeny-based schemes such as AprèsSQI or POKE. For all three schemes, the parameters we present offer the best performance among all parameters proposed in the literature.
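As a toy illustration of the requirement (not the sieve-and-boost or XGCD-and-boost methods themselves), the brute-force search below looks for primes $p$ such that a chosen power of two divides $p^2-1$ and the remaining cofactor is smooth; the bounds are miniature stand-ins for the cryptographic-size targets:

```python
# Brute-force toy search for p with 2^8 | p^2 - 1 and a 50-smooth cofactor,
# mimicking (at miniature scale) the SQIsign-style divisibility requirement.

def is_prime(n: int) -> bool:
    if n < 2:
        return False
    return all(n % f for f in range(2, int(n ** 0.5) + 1))

def smooth_part_removed(n: int, bound: int) -> int:
    # Strip every prime factor <= bound; n is bound-smooth iff the result is 1.
    for f in range(2, bound + 1):
        while n % f == 0:
            n //= f
    return n

TWO_POWER = 2 ** 8          # want 2^8 | p^2 - 1 (real schemes need far more)
T_BOUND = 50                # smoothness bound for the remaining cofactor

for p in range(3, 200000, 2):
    if not is_prime(p):
        continue
    m = p * p - 1
    if m % TWO_POWER == 0 and smooth_part_removed(m // TWO_POWER, T_BOUND) == 1:
        print("candidate p =", p)    # first hit: p = 127 (2^8 * 7 * 9 = 127^2 - 1)
        break
```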
Another Walk for Monchi
Monchi is a new protocol aimed at privacy-preserving biometric identification. It begins with score computation in the encrypted domain, thanks to homomorphic encryption, and ends with comparisons of these scores to a given threshold using function secret sharing. We here study the integration, in that context, of score computation techniques recently introduced by Bassit et al., which eliminate homomorphic multiplications by replacing them with lookup tables. First, we extend this lookup-table biometric recognition solution by adding the use of function secret sharing for the final comparison of scores. Then, we introduce a two-party computation of the scores with lookup tables which fits nicely with the function-secret-sharing score comparison. Our solutions fit well with the flight boarding use case introduced with Monchi.
A Tool for Fast and Secure LWE Parameter Selection: the FHE case
The field of fully homomorphic encryption (FHE) has seen many theoretical and computational advances in recent years, bringing the technology closer to practicality than ever before. For this reason, practitioners in related fields, such as machine learning, are increasingly interested in using FHE to provide privacy to their applications.
Despite this progress, selecting secure and efficient parameters for FHE remains a complex and challenging task due to the intricate interdependencies between parameters. In this work, we address this issue by providing a rigorous theoretical foundation for parameter selection for any LWE-based schemes, with a specific focus on FHE. Our approach starts with an in-depth analysis of lattice attacks on the LWE problem, deriving precise expressions for the most effective ones. Building on this, we introduce closed-form formulas that establish the relationships among the LWE parameters.
In addition, we introduce a numerical method to enable the accurate selection of any configurable parameter to meet a desired security level.
Finally, we use our results to build a practical and efficient tool for researchers and practitioners deploying FHE in real-world applications, ensuring that our approach is both rigorous and accessible.
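To illustrate the numerical selection step, the sketch below bisects one free parameter (the LWE dimension $n$) against a monotone security estimate until a target level is met. The `estimate_security` function is a crude hypothetical placeholder, not the closed-form formulas of the paper; only its monotonicity in $n$ matters for the method:

```python
# Bisection over one configurable LWE parameter against a monotone security
# estimate. The estimator below is an illustrative stand-in only.

def estimate_security(n: int, log_q: int) -> float:
    # Hypothetical placeholder for a closed-form attack-cost formula.
    return 0.4 * n / log_q

def smallest_secure_n(target_bits: float, log_q: int,
                      lo: int = 256, hi: int = 1 << 16) -> int:
    while lo < hi:                       # bisection on a monotone predicate
        mid = (lo + hi) // 2
        if estimate_security(mid, log_q) >= target_bits:
            hi = mid
        else:
            lo = mid + 1
    return lo

print(smallest_secure_n(128.0, log_q=27))   # smallest n meeting the 128-bit target
```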
Shardora: Towards Scaling Blockchain Sharding via Unleashing Parallelism
Sharding emerges as a promising solution to enhance blockchain scalability. However, it faces two critical limitations during shard reconfiguration: (1) the TPS-Degradation issue, arising from ledger synchronization conflicts during transaction processing, and (2) the Zero-TPS issue, caused by disruptions in transaction processing due to key negotiation. To this end, we propose Shardora, a blockchain sharding system that scales the blockchain by unleashing parallelism. In Shardora, we implement two essential mechanisms: (1) a parallelized dual-committee framework with a reputation mechanism to mitigate the TPS-Degradation issue while ensuring system security, and (2) a parallelized key pre-negotiation mechanism with a secret-reuse strategy to avoid the Zero-TPS issue while maintaining a continuously high TPS. We prove that Shardora offers theoretically guaranteed security. We implement a prototype of Shardora and deploy it on Alibaba Cloud. Experimental results demonstrate that Shardora addresses both limitations by significantly reducing the overhead of ledger synchronization and key negotiation, outperforming state-of-the-art sharding schemes by at least 90%. In addition, Shardora shows superior performance in terms of throughput and latency, achieving a peak throughput of 8300 TPS on a single shard with 600 nodes under LAN conditions. The code of Shardora is publicly available on GitHub.
PerfOMR: Oblivious Message Retrieval with Reduced Communication and Computation
Anonymous message delivery, as in privacy-preserving blockchain and private messaging applications, needs to protect recipient metadata: eavesdroppers should not be able to link messages to their recipients. This raises the question: how can untrusted servers assist in delivering the pertinent messages to each recipient, without learning which messages are addressed to whom?
Recent work constructed Oblivious Message Retrieval (OMR) protocols that outsource the message detection and retrieval in a privacy-preserving way, using homomorphic encryption. Their construction exhibits significant costs in computation per message scanned (${\sim}0.1$ second), as well as in the size of the associated messages (${\sim}1$kB overhead) and public keys (${\sim}132$kB).
This work constructs more efficient OMR schemes, by replacing the LWE-based clue encryption of prior works with a Ring-LWE variant, and utilizing the resulting flexibility to improve several components of the scheme. We thus devise, analyze, and benchmark two protocols:
The first protocol focuses on improving the detector runtime, using a new retrieval circuit that can be homomorphically evaluated $13.8$x faster than the prior work.
The second protocol focuses on reducing the communication costs, by designing a different homomorphic decryption circuit that allows the parameter of the Ring-LWE encryption to be set such that the public key size is about $235$x smaller than the prior work, and the message size is roughly $1.6$x smaller. The runtime of this second construction is ${\sim}40.0$ms per message, still more than $2.5$x faster than prior works.
A non-comparison oblivious sort and its application to private k-NN
This paper introduces a novel adaptation of counting sort that enables sorting of encrypted data using Fully Homomorphic Encryption (FHE). Our approach represents the first known sorting algorithm for encrypted data that does not rely on comparisons. The implementation leverages some basic operations on TFHE's Look-Up-Tables (LUT). We have integrated these operations into RevoLUT, a comprehensive open-source library built upon tfhe-rs, which can be of independent interest for oblivious algorithms. We demonstrate the effectiveness of our Blind Counting Sort algorithm by developing a top-$k$ selection algorithm and applying it to privacy-preserving $k$-Nearest Neighbors classification. This proves to be approximately 5x faster than current state-of-the-art methods.
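In plaintext form, the comparison-free skeleton of counting sort looks as follows (our simplification; the encrypted version performs the histogram and replay steps obliviously via blind rotations on TFHE look-up tables, which this sketch does not capture):

```python
# Counting sort touches data only through indexing (histogram increments and
# a replay pass), never through comparisons, which is what makes it amenable
# to evaluation over homomorphic look-up tables.
def counting_sort(values, domain_size):
    hist = [0] * domain_size
    for v in values:                  # histogram: one table increment per input
        hist[v] += 1
    out = []
    for v in range(domain_size):      # replay each value hist[v] times, in order
        out.extend([v] * hist[v])
    return out

print(counting_sort([3, 1, 4, 1, 5, 1], domain_size=8))   # [1, 1, 1, 3, 4, 5]
```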
A notion on S-boxes for a partial resistance to some integral attacks
In two recent papers, we introduced and studied the notion of $k$th-order sum-freedom of a vectorial function $F:\mathbb F_2^n\to \mathbb F_2^m$. This notion generalizes that of almost perfect nonlinearity (which corresponds to $k=2$) and has some relation with the resistance to integral attacks of those block ciphers using $F$ as a substitution box (S-box), by preventing the propagation of the division property of $k$-dimensional affine spaces. In the present paper, we show that this notion, which is rarely satisfied by vectorial functions, can be weakened while retaining the property that the S-boxes do not propagate the division property of $k$-dimensional affine spaces. This leads us to the property that we name $k$th-order $t$-degree-sum-freedom, whose strength decreases when $t$ increases, and which coincides with $k$th-order sum-freedom when $t=1$. The condition for $k$th-order $t$-degree-sum-freedom is that, for every $k$-dimensional affine space $A$, there exists a non-negative integer $j$ of 2-weight at most $t$ such that $\sum_{x\in A}(F(x))^j\neq 0$. We show, for a general $k$th-order $t$-degree-sum-free function $F$, that $t$ can always be taken smaller than or equal to $\min(k,m)$ under some reasonable condition on $F$, and that it is larger than or equal to $\frac k{\deg(F)}$, where $\deg(F)$ is the algebraic degree of $F$. We study examples for $k=2$ (case in which $t=1$ corresponds to APNness) showing that finding $j$ of 2-weight 2 can be challenging, and we begin the study of power functions, and in particular, of the multiplicative inverse function (used as S-box in the AES), for which we extend to $k$th-order $t$-degree-sum-freedom the result that it is $k$th-order sum-free if and only if it is $(n-k)$th-order sum-free. We begin the study of the cases of $k\in \{2,3,n-3,n-2,n-1,n\}$.
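The defining condition is easy to test exhaustively at toy scale. The sketch below (our illustration, with a field modulus chosen by us) computes, for the inverse map over $\mathbb F_{2^4}$ and $k=2$, the smallest $t$ such that every 2-dimensional affine space $A$ admits an exponent $j$ of 2-weight at most $t$ with $\sum_{x\in A}(F(x))^j\neq 0$:

```python
# Exhaustive toy checker for 2nd-order t-degree-sum-freedom of the inverse
# map over GF(2^4), represented with modulus x^4 + x + 1 (our choice).
N = 4
MOD = 0b10011                                  # x^4 + x + 1

def gf_mul(a: int, b: int) -> int:
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & (1 << N):                       # reduce once degree N is reached
            a ^= MOD
        b >>= 1
    return r

def gf_pow(a: int, e: int) -> int:
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

F = [0] + [gf_pow(x, 2 ** N - 2) for x in range(1, 2 ** N)]   # inverse map

def flat_sum(A, j):                            # sum of F(x)^j over the flat A
    s = 0
    for x in A:
        s ^= gf_pow(F[x], j)
    return s

t = 0
for u in range(1, 2 ** N):
    for v in range(u + 1, 2 ** N):             # u, v span a 2-dimensional space
        for a in range(2 ** N):
            A = {a, a ^ u, a ^ v, a ^ u ^ v}
            w = min(bin(j).count("1")
                    for j in range(1, 2 ** N) if flat_sum(A, j))
            t = max(t, w)
print("smallest valid t for k = 2:", t)
```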
Preservation of Speculative Constant-Time by Compilation
Compilers often weaken or even discard software-based countermeasures commonly used to protect programs against side-channel attacks; worse, they may also introduce vulnerabilities that attackers can exploit. The solution to this problem is to develop compilers that preserve such countermeasures. Prior work establishes that (a mildly modified version of) the CompCert and Jasmin formally verified compilers preserve constant-time, an information flow policy that ensures that programs are protected against timing side-channel attacks. However, nothing is known about preservation of speculative constant-time, a strengthening of the constant-time policy that ensures that programs are protected against Spectre-v1 attacks. We first show that preservation of speculative constant-time fails in practice by providing examples of secure programs whose compilation is not speculative constant-time using GCC (at optimization levels -O0 and -O1) and Jasmin. Then, we define a proof-of-concept compiler that distills some of the critical passes of the Jasmin compiler and use the Coq proof assistant to prove that it preserves speculative constant-time. Finally, we patch the Jasmin speculative constant-time type checker and demonstrate that all cryptographic implementations written in Jasmin can be fixed with minimal impact.
Extended Diffie-Hellman Encryption for Secure and Efficient Real-Time Beacon Notifications
Every computing paradigm involving communication requires new security protocols employing cryptography. For example, the Internet gave rise to TLS/SSL, and Mobile Computing gave rise to End to End Encryption protocols. In this paper, we address an emerging IoT paradigm involving beacons attached to things and security protocols associated with this new configuration.
Specifically, we address the ``beacon notification problem,'' a critical IoT task that aims at providing secure and efficient real-time notifications from beacons to their owners. Since the beacon notification problem has not yet been formally defined, we begin by inspecting natural requirements based on the operational setting, and we establish correctness, security, and privacy definitions through the use of cryptographic games.
To resolve this problem, we propose a novel cryptographic tool we call XDHIES, a considerable extension of available Diffie-Hellman encryption schemes. We then show a new notification protocol built upon XDHIES and prove that this cryptographic protocol is secure and private and successfully meets all of the above problem's requirements.
WHIR: Reed–Solomon Proximity Testing with Super-Fast Verification
We introduce WHIR, a new IOP of proximity that offers small query complexity and exceptionally fast verification time. The WHIR verifier typically runs in a few hundred microseconds, whereas other verifiers in the literature require several milliseconds (if not much more). This significantly improves the state of the art in verifier time for hash-based SNARGs (and beyond).
Crucially, WHIR is an IOP of proximity for constrained Reed–Solomon codes, which can express a rich class of queries to multilinear polynomials and to univariate polynomials. In particular, WHIR serves as a direct replacement for protocols like FRI, STIR, BaseFold, and others. Leveraging the rich queries supported by WHIR and a new compiler for multilinear polynomial IOPs, we obtain a highly efficient SNARG for generalized R1CS.
As a comparison point, our techniques also yield state-of-the-art constructions of hash-based (non-interactive) polynomial commitment schemes for both univariate and multivariate polynomials (since sumcheck queries naturally express polynomial evaluations). For example, if we use WHIR to construct a polynomial commitment scheme for degree $2^{22}$, with 100 bits of security, then the time to commit and open is 1.2 seconds, the sender communicates 63 KiB to the receiver, and the opening verification time is 360 microseconds.
High Speed High Assurance implementations of Multivariate Quadratic based Signatures
In this poster, we present a Jasmin implementation of Mayo2, a multivariate quadratic (MQ) based signature scheme. Mayo overcomes the disadvantage of the Unbalanced Oil and Vinegar (UOV) scheme by whipping the UOV map to produce public keys of sizes comparable to ML-DSA. Our Jasmin implementation of Mayo2 takes 930 μs for key generation, 3206 μs for signing, and 480 μs for verification, based on the average of 100,000 runs of the implementation on a 2.25 GHz x86-64 processor with 256 GB RAM. To this end, we have a multivariate quadratic based signature implementation that is amenable to verification of constant-time, correctness, and proof-of-equivalence properties using EasyCrypt. Subsequently, the results of this endeavor can be extended to other MQ-based schemes, including UOV.
A Comprehensive Survey on Hardware-Software co-Protection against Invasive, Non-Invasive and Interactive Security Threats
In the face of escalating security threats in modern computing systems, there is an urgent need for comprehensive defense mechanisms that can effectively mitigate invasive, non-invasive, and interactive security vulnerabilities in the hardware and software domains. Individually, hardware and software weaknesses and probable remedies have been practiced, but protecting a combined system has not yet been discussed in detail. This survey paper provides a comprehensive overview of the emerging field of Hardware-Software co-Protection against Invasive and Non-Invasive Security Threats. We systematically review state-of-the-art research and developments in hardware and software security techniques, focusing on their integration to create synergistic defense mechanisms. The survey covers a wide range of security threats, including physical attacks, side-channel attacks, and malware exploits, and explores the diverse strategies employed to counter them. Our survey meticulously examines the landscape of security vulnerabilities, encompassing both physical and software-based attack vectors, and explores the intricate interplay between hardware and software defenses in mitigating these threats. Furthermore, we discuss the challenges and opportunities associated with Hardware-Software co-Protection and identify future research directions to advance the field. Through this survey, we aim to provide researchers, practitioners, and policymakers with valuable insights into the latest advancements and best practices for defending against complex security threats in modern computing environments.
Sailfish: Towards Improving the Latency of DAG-based BFT
Directed Acyclic Graph (DAG) based BFT protocols balance consensus efforts across different parties and maintain high throughput even when some designated parties fail. However, existing DAG-based BFT protocols exhibit long latency to commit decisions, primarily because they have a \emph{leader} every 2 or more ``rounds''. Recent works, such as Shoal (FC'23) and Mysticeti, have deemed supporting a leader vertex in each round particularly difficult, if not impossible. Consequently, even under honest leaders, these protocols require high latency (or communication complexity) to commit the proposal submitted by the leader (leader vertex) and additional latency to commit other proposals (non-leader vertices).
In this work, we present Sailfish, the first DAG-based BFT that supports a leader vertex in each round. Under honest leaders, Sailfish maintains a commit latency of one reliable broadcast (RBC) round plus $1\delta$ to commit the leader vertex (where $\delta$ is the actual transmission latency of a message) and only an additional RBC round to commit non-leader vertices. We also extend Sailfish to Multi-leader Sailfish, which facilitates multiple leaders within a single round and commits all leader vertices in a round with a latency of one RBC round plus $1\delta$. Our experimental evaluation demonstrates that our protocols introduce significantly lower latency overhead compared to existing DAG-based protocols, with similar throughput.
Inflation-Tracking Proof-of-Work Crypto-Currencies
We show that Bitcoin and other existing egalitarian crypto-currencies are unstable as a store of value, as they fail to track inflation of local currencies closely, and their price dynamics are driven purely by speculation. In the case of Bitcoin, we show that instead of the price being based on the cost of mining Bitcoin, it is the cost of mining that rapidly converges to the current price of Bitcoin. Based on rational expectations equilibrium, we argue that if the coins awarded during mining are increased in proportion to the increase in difficulty of the underlying cryptographic puzzle, then the price of the coin is likely to track inflation of local currencies closely over the medium to long term. However, since Moore's law as well as targeted hardware design can lead to computational cost deflation, we suggest a hyper-geometric tapering, instead of a geometric tapering, of the mining award over time. This also handles bootstrapping interest in the crypto-currency.
Shifting our knowledge of MQ-Sign security
Unbalanced Oil and Vinegar (UOV) is one of the oldest, simplest, and most studied ad-hoc multivariate signature schemes. UOV signature schemes are attractive because they have very small signatures and fast verification. On the downside, they have large public and secret keys. As a result, variations of the traditional UOV scheme are usually developed with the goal to reduce the key sizes. Seven variants of UOV were submitted to the additional call for digital signatures by NIST, prior to which, a variant named MQ-Sign was submitted to the (South) Korean post-quantum cryptography competition (KpqC). MQ-Sign is currently competing in the second round of KpqC with two variants. One of the variants corresponds to the classic description of UOV with certain implementation and parameter choices. In the other variant, called MQ-Sign-LR, a part of the central map is constructed from row shifts of a single matrix. This design makes for smaller secret keys, and in the case where the equivalent keys optimization is used, it also leads to smaller public keys. However, we show in this work that the polynomial systems arising from an algebraic attack have a specific structure that can be exploited. Specifically, we are able to find preimages for $d$-periodic targets under the public map with a probability of $63\%$ for all security levels. The complexity of finding these preimages, as well as the fraction of $d$-periodic targets, increases with $d$ and hence provides a trade-off. We show that for all security levels one can choose $d=\frac{v}{2}$, for $v$ the number of vinegar variables, and reduce the security claim. Our experiments show practical running times for lower $d$ ranging from 0.06 seconds to 32 hours.
Design and Implementation of a Fast, Platform-Adaptive, AIS-20/31 Compliant PLL-Based True Random Number Generator on a Zynq 7020 SoC FPGA
Phase-locked loops (PLLs) integrated within field-programmable gate arrays (FPGAs) or System-on-Chip FPGAs (SoCs) represent a promising approach for generating random numbers. Their widespread deployment, isolated functionality within these devices, and robust entropy, as demonstrated in prior studies, position PLL-based true random number generators (PLL-TRNGs) as highly viable solutions for this purpose. This study explicitly examines PLL-TRNG implementations using the ZC702 Rev1.1 Evaluation Board featuring the Zynq 7020 SoC from Xilinx, utilizing a configuration involving three such boards for experimental validation. Parameters governing the PLL-TRNG are optimized using a backtracking algorithm. Additionally, a novel methodology is proposed to enhance the rate of random data bit generation while preserving entropy characteristics. Performance metrics are rigorously evaluated against the criteria set by the German Federal Office for Information Security (BSI) AIS-20/31 Tests, accompanied by detailed descriptions of the implementation process. Furthermore, the suitability of our PLL-TRNG designs, attributed to their low resource utilization, is demonstrated.
Succinctly Verifiable Computation over Additively-Homomorphically Encrypted Data with Applications to Privacy-Preserving Blueprints
With additively homomorphic encryption (AHE), one can compute, from input ciphertexts $\mathsf{Enc}(x_1),\ldots,\mathsf{Enc}(x_n)$, and additional inputs $y_1,\ldots,y_k$, a ciphertext $c_\textit{f}=\mathsf{Enc}(f(x_1,\ldots,x_n,y_1,\ldots, y_k))$ for any polynomial $f$ in which each monomial has total degree at most $1$ in the $x$-variables (but can be arbitrary in the $y$-variables). For AHE that satisfies a set of natural requirements, we give a non-interactive zero-knowledge proof system (in the random-oracle model) for showing that a ciphertext $c_\textit{f}$ is the result of homomorphically evaluating $f$ on ciphertexts $c_1,\ldots,c_n$ and private inputs $y_1,\ldots,y_k$ that correspond to commitments $C_1,\ldots,C_k$. Our proofs are $\textit{succinct}$, i.e., their size is independent of the number of ciphertexts $n$, and is instead $O(k\log d)$ where $k$ is the number of private inputs, and $d$ is the maximum degree of any variable in $f$.
We give two ways of instantiating this framework: with ElGamal-based encryption (under the DDH assumption) and with a variant of the Camenisch-Shoup cryptosystem (under the DCR assumption). Both yield proof systems where computing and verifying the proof takes a comparable amount of time to homomorphically evaluating $f$.
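For intuition, the additive homomorphism that such instantiations exploit can be seen in a minimal "exponential ElGamal" sketch (toy, insecure parameters of our choosing): messages sit in the exponent, so ciphertexts can be added and scaled, which is all that evaluating an $f$ of degree at most $1$ in the encrypted variables requires.

```python
# Exponential-ElGamal sketch over a Schnorr subgroup (toy, insecure sizes).
import secrets

Q, P = 1019, 2039                    # P = 2Q + 1, both prime (toy parameters)
G = 4                                # generator of the order-Q subgroup of Z_P^*

def keygen():
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)

def enc(pk, m):
    r = secrets.randbelow(Q - 1) + 1
    return pow(G, r, P), pow(G, m, P) * pow(pk, r, P) % P    # (g^r, g^m * pk^r)

def c_add(c1, c2):                   # Enc(m1) * Enc(m2) -> Enc(m1 + m2)
    return c1[0] * c2[0] % P, c1[1] * c2[1] % P

def c_scale(c, k):                   # Enc(m)^k -> Enc(k * m)
    return pow(c[0], k, P), pow(c[1], k, P)

def dec_small(sk, c, bound=1000):    # brute-force the (small) exponent
    gm = c[1] * pow(c[0], P - 1 - sk, P) % P
    return next(m for m in range(bound) if pow(G, m, P) == gm)

sk, pk = keygen()
ct = c_add(c_scale(enc(pk, 3), 5), enc(pk, 7))   # homomorphically: 5*3 + 7
assert dec_small(sk, ct) == 22
```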
Next, we show that our framework yields a dramatically improved privacy-preserving blueprint (PPB) system. Introduced by Kohlweiss, Lysyanskaya, and Nguyen (Eurocrypt'23), an $f$-PPB system allows an auditor with secret input $x$ to create a public encoding $\sf pk$ of the function $f(x,\cdot)$ that reveals nothing about $x$.
Yet, it allows a user to compute an encoding, or escrow $Z$, of the value $f(x,y)$ on input the user's private data $y$ corresponding to a commitment $C_y$; $Z$ will verifiably correspond to the commitment $C_y$. The auditor will be able to recover $f(x,y)$ from $Z$, but will learn no other information about $y$. For example, if $f$ is the watchlist function where $f(x,y)$ outputs $y$ only in the event that $y$ is on the list $x$, then an $f$-PPB allows the auditor to trace watchlisted users in an otherwise anonymous system.
Using our succinct zero-knowledge proof system for additively homomorphic computation we achieve the following results: (1) We provide efficient schemes for a bigger class of functions $f$; for example, we show how to realize $f$ that would allow the auditor to trace e-cash transactions of a criminal suspect which was previously not efficient. (2) For the watchlist and related functions, we reduce the size of the escrow $Z$ from linear in the size of the auditor's input $x$, to logarithmic.
Additionally, we define and satisfy a stronger notion of security for $f$-PPBs, where a malicious auditor cannot frame a user in a transaction in which the user was not involved.
Efficient Modular Multiplication Hardware for Number Theoretic Transform on FPGA
In this paper, we present a comprehensive analysis of various modular multiplication methods for the Number Theoretic Transform (NTT) on FPGA. NTT is a critical and time-intensive component of Fully Homomorphic Encryption (FHE) applications, while modular multiplication consumes a significant portion of the design resources in an NTT implementation. We study the existing modular reduction approaches from the literature and implement particular methods on FPGA, specifically Word-Level Montgomery (WLM) for NTT-friendly primes [1] and K2RED [2]. For improvements, we explore the trade-offs between the number of available primes in special forms and the hardware cost of the reduction methods. We develop a DSP multiplication-optimized version of WLM, which we call WLM-Mixed. We also introduce a subclass of Proth primes, referred to as Proth-l primes, characterized by a low and fixed signed Hamming weight. This special class of primes allows us to design multiplication-free shift-add versions of K2RED and naive Montgomery reduction [3], referred to as K2RED-Shift and Montgomery-Shift. We provide in-depth evaluations of these five reduction methods in an NTT architecture on FPGA. Our results indicate that WLM-Mixed is highly resource-efficient, utilizing only 3 DSP multiplications for 64-bit coefficient moduli. On the other hand, K2RED-Shift and Montgomery-Shift offer DSP-free alternatives, which can be beneficial in specific scenarios.
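For reference, the textbook one-word Montgomery reduction (REDC) underlying the word-level variants looks as follows; the WLM/WLM-Mixed and shift-add designs of the paper restructure this around special prime shapes and FPGA DSP blocks, which this plain sketch does not capture. The modulus below is the well-known Goldilocks prime, chosen by us purely for illustration:

```python
# Plain one-word Montgomery reduction (REDC) in Python.
W = 64
R = 1 << W

def redc(t: int, q: int, q_neg_inv: int) -> int:
    """Return t * R^{-1} mod q, for 0 <= t < q * R and odd q < R."""
    m = (t * q_neg_inv) & (R - 1)        # m = -t * q^{-1} mod R
    u = (t + m * q) >> W                 # t + m*q = 0 mod R, so the shift is exact
    return u - q if u >= q else u

q = (1 << 64) - (1 << 32) + 1            # "Goldilocks" NTT-friendly prime
q_neg_inv = (-pow(q, -1, R)) % R
a, b = 12345678901234567 % q, 98765432109876543 % q
assert redc(a * b, q, q_neg_inv) == a * b * pow(R, -1, q) % q
```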
IO-Optimized Design-Time Configurable Negacyclic Seven-Step NTT Architecture for FHE Applications
FHE enables computations on encrypted data, making it essential for privacy-preserving applications. However, it involves computationally demanding tasks, such as polynomial multiplication, for which NTT is the state-of-the-art solution. Most FHE schemes operate over the negacyclic ring of polynomials. We introduce a novel formulation of the hierarchical Four-Step NTT approach for the negacyclic ring, eliminating the need for the pre- and post-processing steps found in existing methods. To accelerate NTT operations, FPGAs offer flexible and powerful computing platforms. We propose an FPGA-based, parametric, and fully pipelined architecture that implements the improved Seven-Step NTT algorithm (which builds upon the four-step approach). Our design supports a wide range of parameters, including ring sizes up to $2^{16}$ and modulus sizes up to $64$-bit. We focus on achieving configurable throughput, as constrained by the available HBM bandwidth, and aim to maximize throughput through an IO-parametric design on the Alveo U280 FPGA. The implementation results demonstrate a reduction in the area-time product by $2.08\times$ and a speed-up of $10.32\times$ for a ring size of $2^{16}$ and a 32-bit width compared to the current state-of-the-art designs.
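For orientation, here is the plain cyclic four-step decomposition that the seven-step architecture builds upon, with a toy prime and naive inner NTTs of our choosing; the paper's negacyclic reformulation and its pipelined FPGA realization are not reflected in this reference sketch:

```python
# Cyclic four-step NTT: column NTTs, twiddle multiplication, transpose, row
# NTTs. Verified against a naive O(n^2) reference transform.
Q = 257                              # toy NTT-friendly prime (2^8 | Q - 1)
W16 = pow(3, (Q - 1) // 16, Q)       # primitive 16th root of unity (3 generates Z_257^*)

def ntt(a, w):                       # naive O(n^2) reference NTT
    n = len(a)
    return [sum(a[i] * pow(w, i * k, Q) for i in range(n)) % Q for k in range(n)]

def four_step_ntt(a, n1, n2, w):
    w1, w2 = pow(w, n2, Q), pow(w, n1, Q)
    # step 1: length-n1 NTT down each column of the n1 x n2 matrix
    cols = [ntt([a[i1 * n2 + i2] for i1 in range(n1)], w1) for i2 in range(n2)]
    # step 2: twiddle factors w^(i2 * k1)
    cols = [[cols[i2][k1] * pow(w, i2 * k1, Q) % Q for k1 in range(n1)]
            for i2 in range(n2)]
    # steps 3-4: transpose, then length-n2 NTT along each row
    rows = [ntt([cols[i2][k1] for i2 in range(n2)], w2) for k1 in range(n1)]
    return [rows[k % n1][k // n1] for k in range(n1 * n2)]   # k = k2*n1 + k1

a = list(range(16))
assert four_step_ntt(a, 4, 4, W16) == ntt(a, W16)
```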
Chosen-Prefix Collisions on AES-like Hashing
Chosen-prefix collision (CPC) attacks were first presented by Stevens, Lenstra and de Weger on MD5 at Eurocrypt 2007. A CPC attack finds a collision for any two chosen prefixes, which makes it a stronger variant of the collision attack. CPCs are naturally harder to construct but have larger practical impact than (identical-prefix) collisions, as seen from the series of previous works on MD5 by Stevens et al. and SHA-1 by Leurent and Peyrin. Despite its significance, resistance against CPC attacks has not been studied for AES-like hashing.
In this work, we explore CPC attacks on AES-like hashing following the framework practiced on MD5 and SHA-1. Instead of the message modification technique developed for the MD-SHA family, we opt for the related-key rebound attack to construct collisions for AES-like hashing in view of its effectiveness. We also note that the CPC attack framework can be exploited to convert a specific class of one-block free-start collisions into two-block collisions, which sheds light on the importance of free-start collisions. As a result, we present the first CPC attacks on reduced Whirlpool, Saturnin-hash and AES-MMO/MP in classic and quantum settings, and extend the collision attack on Saturnin-hash from 5 to 6 rounds in the classic setting. As an independent contribution, we improve the memoryless algorithm of solving the 3-round inbound phase by Hosoyamada and Sasaki at Eurocrypt 2020, which leads to improved quantum attacks on Whirlpool. Notably, we find the first 6-round memoryless quantum collision attack on Whirlpool better than the generic CNS collision-finding algorithm when exponential-size qRAM is not available but exponential-size classic memory is available.
Authenticated private information retrieval
This paper introduces protocols for authenticated private information retrieval. These schemes enable a client to fetch a record from a remote database server such that (a) the server does not learn which record the client reads, and (b) the client either obtains the "authentic" record or detects server misbehavior and safely aborts. Both properties are crucial for many applications. Standard private-information-retrieval schemes either do not ensure this form of output authenticity, or they require multiple database replicas with an honest majority. In contrast, we offer multi-server schemes that protect security as long as at least one server is honest. Moreover, if the client can obtain a short digest of the database out of band, then our schemes require only a single server. Performing an authenticated private PGP-public-key lookup on an OpenPGP key server's database of 3.5 million keys (3 GiB), using two non-colluding servers, takes under 1.2 core-seconds of computation, essentially matching the time taken by unauthenticated private information retrieval. Our authenticated single-server schemes are 30-100$\times$ more costly than state-of-the-art unauthenticated single-server schemes, though they achieve incomparably stronger integrity properties.
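For orientation, the classic folklore two-server XOR-based PIR, with a naive per-record digest check bolted on, looks as follows; this toy check is our simplification and is far weaker than the authenticity guarantees the paper's schemes achieve:

```python
# Two-server XOR PIR: the two query sets differ only at the target index, so
# the XOR of the answers is exactly the requested record. The client then
# checks the result against a digest obtained out of band.
import hashlib
import secrets

DB = [f"record-{i}".encode().ljust(16, b"\x00") for i in range(8)]
DIGESTS = [hashlib.sha256(r).digest() for r in DB]   # known out of band (toy)

def server_answer(db, subset):                 # each server XORs its queried subset
    out = bytes(16)
    for i in subset:
        out = bytes(x ^ y for x, y in zip(out, db[i]))
    return out

def client_fetch(i):
    s1 = {j for j in range(len(DB)) if secrets.randbelow(2)}
    s2 = s1 ^ {i}                              # symmetric difference toggles only i
    a1, a2 = server_answer(DB, s1), server_answer(DB, s2)
    rec = bytes(x ^ y for x, y in zip(a1, a2))
    assert hashlib.sha256(rec).digest() == DIGESTS[i], "server misbehavior detected"
    return rec

print(client_fetch(3))
```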
Differential MITM attacks on SLIM and LBCIoT
SLIM and LBCIoT are lightweight block ciphers proposed for IoT applications. We present differential meet-in-the-middle attacks on these ciphers and discuss several implementation variants and possible improvements of these attacks. Experimental validation also yields some results that may be of independent interest in the cryptanalysis of other ciphers, namely the problems with low-probability differentials and the questionable accuracy of standard complexity estimates when using filters.
Partially Non-Interactive Two-Round Lattice-Based Threshold Signatures
This paper gives the first two-round threshold signature based on lattice assumptions for which the first message is independent of the message being signed, without relying on fully homomorphic encryption; our construction supports arbitrary thresholds.
Our construction provides a careful instantiation of a generic threshold signature construction by Tessaro and Zhu (EUROCRYPT ’23) based on specific linear hash functions, which in turn can be seen as a generalization of the FROST scheme by Komlo and Goldberg (SAC ’20). Our reduction techniques are new in the context of lattice-based cryptography. Also, our scheme does not use any heavy tools, such as NIZKs or homomorphic trapdoor commitments.
Verifying Jolt zkVM Lookup Semantics
Lookups are a popular way to express repeated constraints in state-of-the-art SNARKs. This is especially the case for zero-knowledge virtual machines (zkVMs), which produce succinct proofs of correct execution for programs expressed as bytecode according to a specific instruction set architecture (ISA). The Jolt zkVM (Arun, Setty & Thaler, Eurocrypt 2024) for the RISC-V ISA employs Lasso (Setty, Thaler & Wahby, Eurocrypt 2024), an efficient lookup argument for massive structured tables, to prove correct execution of instructions. Internally, Lasso performs multiple lookups into smaller subtables, then combines the results.
We present an approach to formally verify Lasso-style lookup arguments against the semantics of instruction set architectures. We demonstrate our approach by formalizing and verifying all Jolt 32-bit instructions corresponding to the RISC-V base instruction set (RV32I) using the ACL2 theorem proving system. Our formal ACL2 model has undergone extensive validation against the Rust implementation of Jolt. Due to ACL2's bit-blasting, rewriting, and developer-friendly features, our formalization is highly automated.
Through formalization, we also discovered optimizations to the Jolt codebase, leading to improved efficiency without impacting correctness or soundness. In particular, we removed one unnecessary lookup each for four instructions, and reduced the sizes of three subtables by 87.5\%.
Doubly Efficient Batched Private Information Retrieval
Private information retrieval (PIR) allows a client to read data from a server, without revealing which information they are interested in. A PIR is doubly efficient if the server runtime is, after a one-time pre-processing, sublinear in the database size. A recent breakthrough result from Lin, Mook, and Wichs [STOC’23] proposed the first doubly efficient PIR, with (online) server computation poly-logarithmic in the size of the database, assuming the hardness of the standard Ring-LWE problem.
In this work, we consider the problem of doubly efficient batched PIR (DEBPIR), where the client wishes to download multiple entries. This problem arises naturally in many practical applications of PIR, or when the database contains large entries. Our main result is a construction of DEBPIR where the amortized communication and server computation overhead is $\tilde{O}(1)$, from the Ring-LWE problem. This represents an exponential improvement compared with known constructions, and it is optimal up to poly-logarithmic factors in the security parameter. Interestingly, the server’s online operations are entirely combinatorial and all algebraic computations are done in the pre-processing or delegated to the client.
Do Not Disturb a Sleeping Falcon: Floating-Point Error Sensitivity of the Falcon Sampler and Its Consequences
Falcon is one of the three post-quantum signature schemes already selected by NIST for standardization. It is the most compact among them, and offers excellent efficiency and security. However, it is based on a complex algorithm for lattice discrete Gaussian sampling, which presents a number of implementation challenges. In particular, it relies on (possibly emulated) floating-point arithmetic, which is often regarded as a cause for concern and has been leveraged in, e.g., side-channel analysis. The extent to which Falcon's use of floating-point arithmetic can cause security issues has yet to be thoroughly explored in the literature.
In this paper, we contribute to filling this gap by identifying a way in which Falcon's lattice discrete Gaussian sampler, due to specific design choices, is singularly sensitive to floating-point errors. In the presence of small floating-point discrepancies (which can occur in various ways, including the use of the two almost but not quite equivalent signing procedures ``dynamic'' and ``tree'' exposed by the Falcon API), we find that, when called twice on the same input, the Falcon sampler has a small but significant chance (on the order of once in a few thousand calls) of outputting two different lattice points with a very structured difference, that immediately reveals the secret key. This is in contrast to other lattice Gaussian sampling algorithms like Peikert's sampler and Prest's hybrid sampler, that are stable with respect to small floating-point errors.
Correctly generated Falcon signatures include a salt that should in principle prevent the sampler from ever being called on the same input twice. In that sense, our observation has little impact on the security of Falcon signatures per se (beyond echoing warnings about the dangers of repeated randomness). On the other hand, it is critical for derandomized variants of Falcon, which have been proposed for use in numerous settings. One can mention in particular identity-based encryption, SNARK-friendly signatures, and sublinear signature aggregation. For all these settings, small floating-point discrepancies have a chance of resulting in full private key exposure, even when using the slower, integer-based emulated floating-point arithmetic of Falcon's reference implementation.
Impossibility Results for Post-Compromise Security in Real-World Communication Systems
Modern secure communication systems, such as iMessage, WhatsApp, and Signal include intricate mechanisms that aim to achieve very strong security properties. These mechanisms typically involve continuously merging in new fresh secrets into the keying material, which is used to encrypt messages during communications. In the literature, these mechanisms have been proven to achieve forms of Post Compromise Security (PCS): the ability to provide communication security even if the full state of a party was compromised some time in the past. However, recent work has shown these proofs do not transfer to the end-user level, possibly because of usability concerns. This has raised the question of whether end-users can actually obtain PCS or not, and under which conditions.
Here we show and formally prove that communication systems that need to be resilient against certain types of state loss (which can occur in practice) fundamentally cannot achieve full PCS for end-users. Whereas previous work showed that the Signal messenger did not achieve this with its current session-management layer, we isolate the exact conditions that cause this failure, and why this cannot be simply solved in communication systems by implementing a different session-management layer or an entirely different protocol. Moreover, we clarify how the cap on the number of sessions between two users (40 in Signal) trades failure resilience against security.
Our results have direct consequences for the design of future secure communication systems, and could motivate either the simplification of redundant mechanisms, or the improvement of session-management designs to provide better security trade-offs with respect to state loss/failure tolerance.
Improved PIR Schemes using Matching Vectors and Derivatives
In this paper, we construct new $t$-server Private Information Retrieval (PIR) schemes with communication complexity subpolynomial in that of the previously best known schemes, for all but finitely many $t$. Our results are based on combining derivatives (in the spirit of Woodruff-Yekhanin) with the Matching Vector based PIRs of Yekhanin and Efremenko. Previously, such a combination was achieved in an ingenious way by Dvir and Gopi, using polynomials and derivatives over certain exotic rings, en route to their fundamental result giving the first 2-server PIR with subpolynomial communication.
Our improved PIRs are based on two ingredients:
• We develop a new and direct approach to combine derivatives with Matching Vector based PIRs. This approach is much simpler than that of Dvir-Gopi: it works over the same field as the original PIRs, and only uses elementary properties of polynomials and derivatives.
• A key subproblem that arises in the above approach is a higher-order polynomial interpolation problem. We show how “sparse S-decoding polynomials”, a powerful tool from the original constructions of Matching Vector PIRs, can be used to solve this higher-order polynomial interpolation problem using surprisingly few higher-order evaluations.
Using the known sparse S-decoding polynomials in combination with our ideas leads to our improved PIRs. Notably, we get a 3-server PIR scheme with communication $2^{\tilde{O}((\log n)^{1/3})}$, improving upon the previously best known communication of $2^{\tilde{O}(\sqrt{\log n})}$ due to Efremenko.
Ceno: Non-uniform, Segment and Parallel Zero-knowledge Virtual Machine
In this paper, we explore a novel Zero-knowledge Virtual Machine (zkVM) framework leveraging succinct, non-interactive zero-knowledge proofs for verifiable computation over any code. Our approach divides the proof of program execution into two stages. In the first stage, the process breaks down program execution into segments, identifying and grouping identical sections. These segments are then proved through data-parallel circuits that allow for varying amounts of duplication. In the subsequent stage, the verifier examines these segment proofs, reconstructing the program's control and data flow based on the segments' duplication number and the original program. The second stage can be further attested by a uniform recursive proof.
We propose two specific designs of this concept, where segmentation and parallelization occur at two levels: opcode and basic block. Both designs try to minimize the control flow that affects the circuit size and support dynamic copy numbers, ensuring that computational costs directly correlate with the actual code executed (i.e., you only pay as much as you use). In our second design, in particular, by proposing an innovative data-flow reconstruction technique in the second stage, we can drastically cut down on the stack operations even compared to the original program execution. Note that the two designs are complementary rather than mutually exclusive. Integrating both approaches in the same zkVM could unlock more significant potential to accommodate various program patterns.
We present an asymmetric GKR scheme to implement our designs, pairing a non-uniform prover and a uniform verifier to generate proofs for dynamic-length data-parallel circuits. The use of a GKR prover also significantly reduces the size of the commitment. GKR allows us to commit only the circuit's input and output, whereas in Plonkish-based solutions, the prover needs to commit to all the witnesses.
Age-aware Fairness in Blockchain Transaction Ordering for Reducing Tail Latency
In blockchain networks, transaction latency is crucial for determining the quality of service (QoS). The latency of a transaction is measured as the time between its issuance and its inclusion in a block in the chain. A block proposer often prioritizes transactions with higher fees or transactions from accounts it is associated with, to minimize their latencies. To maintain fairness among transactions, a block proposer is expected to select the included transactions randomly. The random selection might cause some transactions to experience high latency, owing to the variance in the time a transaction waits until it is selected. We suggest an alternative, age-aware approach towards fairness so that transaction priority is increased upon observing a large waiting time. We explain that a challenge with this approach is that the age of a transaction is not absolute due to transaction propagation. Moreover, a node might present its transactions as older to obtain priority. We describe a new technique to enforce a fair block selection while prioritizing transactions that observed high latency. The technique is based on various declaration schemes in which a node declares its pending transactions, providing the ability to validate transaction age. By evaluating the solutions on Ethereum data and synthetic data of various scenarios, we demonstrate the advantages of the approach under realistic conditions and examine its potential to maintain fairness and reduce tail latency.
Push-Button Verification for BitVM Implementations
Bitcoin, while being the most prominent blockchain with the largest market capitalization, suffers from scalability and throughput limitations that impede the development of ecosystem projects like Bitcoin Decentralized Finance (BTCFi). Recent advancements in BitVM propose a promising Layer 2 (L2) solution to enhance Bitcoin's scalability by enabling complex computations off-chain with on-chain verification. However, Bitcoin's constrained programming environment—characterized by its non-Turing-complete Script language lacking loops and recursion, and strict block size limits—makes developing complex applications labor-intensive, error-prone, and necessitates manual partitioning of scripts. Under this complex programming model, subtle mistakes could lead to irreversible damage in a trustless environment like Bitcoin. Ensuring the correctness and security of such programs becomes paramount.
To address these challenges, we introduce the first formal verifier for BitVM implementations. Our approach involves designing a register-based, higher-level domain-specific language (DSL) that abstracts away complex stack operations, allowing developers to reason about program correctness more effectively while preserving the semantics of the low-level program. We present a formal computational model capturing the semantics of BitVM execution and Bitcoin script, providing a foundation for rigorous verification. To efficiently handle large programs and complex constraints arising from unrolled computations that simulate loops, we summarize repetitive "loop-style" computations using loop invariant predicates in our DSL. We leverage a counterexample-guided inductive synthesis (CEGIS) procedure to lift low-level Bitcoin script into our DSL, facilitating efficient verification without sacrificing accuracy. Evaluated on 78 benchmarks from BitVM implementations, our tool successfully verifies 83% of the cases, taking 12.55 seconds on average, and has identified one previously unknown vulnerability, demonstrating its effectiveness in enhancing the security and reliability of BitVM.
Fine-Grained Non-Interactive Key-Exchange without Idealized Assumptions, and Lower Bounds
In this paper, we study multi-party non-interactive key exchange (NIKE) in the fine-grained setting. More precisely, we propose three multi-party NIKE schemes in three computation models, namely, the bounded parallel-time, bounded time, and bounded storage models. Their security is based on a very mild assumption (e.g., $\mathsf{NC}^1 \subsetneq \oplus\mathsf{L}/\mathsf{poly}$) or even without any complexity assumption. This improves the recent work of Afshar, Couteau, Mahmoody, and Sadeghi (EUROCRYPT 2023) that requires idealized assumptions, such as random oracles or generic groups.
Additionally, we show that all our constructions satisfy a natural desirable property that we refer to as extendability, and we give generic transformations from extendable multi-party NIKE to multi-party identity-based NIKEs in the fine-grained settings.
Furthermore, we generalize the lower bound on users’ storage consumption in the bounded storage model by Dziembowski and Maurer (Eurocrypt 2004) to encompass any multi-party NIKE with extendability. This new lower bound suggests that the users’ storage consumption of our multi-party NIKE in the bounded storage model is optimal.
A Fault Analysis on SNOVA
SNOVA is a post-quantum cryptographic signature scheme known for its efficiency and compact key sizes, making it a second-round candidate in the NIST post-quantum cryptography standardization process. This paper presents a comprehensive fault analysis of SNOVA, focusing on both permanent and transient faults during signature generation. We introduce several fault injection strategies that exploit SNOVA's structure to recover partial or complete secret keys with limited faulty signatures. Our analysis reveals that as few as $22$ to $68$ faulty signatures, depending on the security level, can suffice for key recovery. We propose a novel fault-assisted reconciliation attack, demonstrating its effectiveness in extracting the secret key space via solving a quadratic polynomial system. Simulations show transient faults in key signature generation steps can significantly compromise SNOVA’s security. To address these vulnerabilities, we propose a lightweight countermeasure to reduce the success of fault attacks without adding significant overhead. Our results highlight the importance of fault-resistant mechanisms in post-quantum cryptographic schemes like SNOVA to ensure robustness.
Single Trace Side-Channel Attack on the MPC-in-the-Head Framework
In this paper, we present the first single trace side-channel attack that targets the MPC-in-the-Head (MPCitH) framework based on threshold secret sharing, also known as Threshold Computation in the Head (TCitH) in its original version. This MPCitH framework can be found in 5 of the 14 digital signature schemes in the recent second round of the National Institute of Standards and Technology (NIST) call for digital signatures. In this work, we start by highlighting a side-channel vulnerability of the TCitH framework and show an exploitation of it on the SDitH algorithm, which is part of this NIST call. Specifically, we exploit the leakage of a multiplication function in the Galois field to make predictions about intermediate values, and we use the structure of the algorithm to combine information efficiently. This allows us to build an attack that is both the first Soft Analytical Side-Channel Attack (SASCA) targeting the MPCitH framework, as well as the first attack on SDitH. More specifically, we build a SASCA based on Belief Propagation (BP) on the evaluation of polynomials in the signature using the threshold variant structure to reconstruct the secret key. We perform simulated attacks under the Hamming Weight (HW) leakage model, enabling us to evaluate the resistance of the scheme against SASCA. We then perform our attacks in a real case scenario, more specifically on the STM32F407, and recover the secret key for all the security levels. We end this paper by discussing the various shuffling countermeasures we could use to mitigate our attacks.
Efficient Verifiable Differential Privacy with Input Authenticity in the Local and Shuffle Model
Local differential privacy (LDP) enables the efficient release of aggregate statistics without having to trust the central server (aggregator), as in the central model of differential privacy, and simultaneously protects a client's sensitive data. The shuffle model with LDP provides an additional layer of privacy, by disconnecting the link between clients and the aggregator. However, LDP has been shown to be vulnerable to malicious clients who can perform both input and output manipulation attacks, i.e., before and after applying the LDP mechanism, to skew the aggregator's results. In this work, we show how to prevent malicious clients from compromising LDP schemes. Our only realistic assumption is that the initial raw input is authenticated; the rest of the processing pipeline, e.g., formatting the input and applying the LDP mechanism, may be under adversarial control. We give several real-world examples where this assumption is justified. Our proposed schemes for verifiable LDP (VLDP) prevent both input and output manipulation attacks against generic LDP mechanisms, requiring only a one-time interaction between client and server, unlike existing alternatives [37, 43]. Most importantly, we are the first to provide an efficient scheme for VLDP in the shuffle model. We describe, and prove security of, two schemes for VLDP in the local model, and one in the shuffle model. We show that all schemes are highly practical, with client run times of less than 2 seconds, and server run times of 5-7 milliseconds per client.
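For context, a generic LDP mechanism of the kind these schemes wrap is $k$-ary randomized response with unbiased aggregation, sketched below (our illustration); what VLDP adds, namely proving that this step was honestly applied to an authenticated input, is not sketched here:

```python
# k-ary randomized response: keep the true value with probability p, else
# report a uniformly random other value; the aggregator debiases the counts.
import math
import secrets

K, EPS = 4, 1.0
P_KEEP = math.exp(EPS) / (math.exp(EPS) + K - 1)

def randomized_response(v: int) -> int:
    if secrets.randbelow(10 ** 6) < P_KEEP * 10 ** 6:
        return v
    other = secrets.randbelow(K - 1)          # uniform over the K-1 other values
    return other if other < v else other + 1

def debias(counts, n):
    q = (1 - P_KEEP) / (K - 1)                # probability of any specific wrong value
    return [(c - n * q) / (P_KEEP - q) for c in counts]

truth = [secrets.randbelow(2) for _ in range(20000)]    # true values in {0, 1}
reports = [randomized_response(v) for v in truth]
counts = [reports.count(j) for j in range(K)]
print([round(x) for x in debias(counts, len(truth))])   # roughly [~10000, ~10000, 0, 0]
```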
Compactly Committing Authenticated Encryption Using Encryptment and Tweakable Block Cipher
Facebook introduced message franking to enable users to report abusive content verifiably in end-to-end encrypted messaging. Grubbs et al. formalized the underlying primitive called compactly committing authenticated encryption with associated data (ccAEAD) and presented schemes with provable security. Dodis et al. proposed a core building block called encryptment and presented a generic construction of ccAEAD with encryptment and standard AEAD. This paper first proposes to use a tweakable block cipher instead of AEAD for the generic construction of Dodis et al. In the security analysis of the proposed construction, its ciphertext integrity is shown to require a new but feasible assumption on the ciphertext integrity of encryptment. Then, this paper formalizes remotely keyed ccAEAD (RK ccAEAD) and shows that the proposed construction works as RK ccAEAD. Finally, the confidentiality of the proposed construction as RK ccAEAD is shown to require a new variant of confidentiality for encryptment. The problem of remotely keyed encryption was posed by Blaze in 1996. It is now related to the problem of designing a cryptographic scheme using a trusted module and/or with leakage resiliency.
A Formal Treatment of Envelope Encryption
In its basic form, envelope encryption is a method to encrypt data with two distinct keys: data is first encrypted with a data-encryption key, and then the data-encryption key is encrypted with a key-encryption key. Despite its deployment in major cloud services, as far as we know, envelope encryption has not received any formal treatment. To address this issue, we first formalize the syntax and security requirements of envelope encryption in the symmetric-key setting. Then, we show that it can be constructed by combining encryptment and authenticated encryption with associated data (AEAD). Encryptment is a one-time AEAD in which a small part of the ciphertext works as a commitment to the corresponding secret key, message, and associated data. Finally, we show that the security of the generic construction reduces to the security of the underlying encryptment and AEAD.
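The basic two-key flow described above (not the paper's encryptment-based construction) can be sketched with the Python 'cryptography' package roughly as follows; key management and nonce handling are deliberately simplified.

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    def envelope_encrypt(kek: bytes, data: bytes, aad: bytes = b""):
        dek = AESGCM.generate_key(bit_length=256)    # fresh data-encryption key
        n1, n2 = os.urandom(12), os.urandom(12)
        ct = AESGCM(dek).encrypt(n1, data, aad)      # encrypt data under the DEK
        wrapped = AESGCM(kek).encrypt(n2, dek, aad)  # wrap the DEK under the KEK
        return (n1, ct, n2, wrapped)

    def envelope_decrypt(kek: bytes, blob, aad: bytes = b""):
        n1, ct, n2, wrapped = blob
        dek = AESGCM(kek).decrypt(n2, wrapped, aad)  # unwrap the DEK first
        return AESGCM(dek).decrypt(n1, ct, aad)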
THOR: Secure Transformer Inference with Homomorphic Encryption
As language models are increasingly deployed in cloud environments, privacy concerns have become a significant issue. To address this, we design THOR, a secure inference framework for transformer models on encrypted data. Specifically, we first propose new fast matrix multiplication algorithms based on diagonal-major order encoding and extend them to parallel matrix computation through a compact ciphertext packing technique. Second, we design efficient protocols for secure computation of four non-linear functions, namely softmax, LayerNorm, GELU, and Tanh, by integrating advanced underlying approximation methods with tailored optimizations. Our matrix multiplication algorithms reduce the number of key-switching operations in the linear layers of the attention block in the BERT-base model by up to 14.5x, compared to the state-of-the-art HE-based secure inference protocol (Park et al., Preprint). Combined with cryptographic optimizations, our experimental results demonstrate that THOR provides secure inference for the BERT-base model with a latency of 10.43 minutes on a single GPU, while maintaining comparable inference accuracy on the MRPC dataset.
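As background for the diagonal-order idea, a plaintext numpy sketch of the classic diagonal-packed matrix-vector product is shown below: the matrix is stored by generalized diagonals so that evaluation needs only slot-wise products and rotations, the operations HE ciphertext packing supports cheaply. THOR's actual encoding and packing are more elaborate; everything here is our simplification.

    import numpy as np

    def diag_matvec(M: np.ndarray, v: np.ndarray) -> np.ndarray:
        n = M.shape[0]
        acc = np.zeros(n)
        for d in range(n):
            # d-th generalized diagonal: diag[i] = M[i, (i+d) mod n]
            diag = np.array([M[i, (i + d) % n] for i in range(n)])
            acc += diag * np.roll(v, -d)  # slot-wise product with rotated vector
        return acc

    M = np.arange(16.0).reshape(4, 4)
    v = np.array([1.0, 2.0, 3.0, 4.0])
    assert np.allclose(diag_matvec(M, v), M @ v)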
Cryptography Experiments In Lean 4: SHA-3 Implementation
In this paper we explain how we implemented the Secure Hash Algorithm-3 (SHA-3) family of functions in Lean 4, a functional programming language and theorem prover. We describe how we used several Lean facilities, including type classes, dependent types, macros, and formal verification, and how we refined the design to provide a simple one-shot and streaming API for hashing and for extendable-output functions (XOFs), reducing the potential for misuse by users, and formally proved properties about the implementation.
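As a point of reference for the three API shapes discussed (one-shot, streaming, and XOF), the analogous usage in Python's hashlib is shown below; this is not the authors' Lean 4 interface.

    import hashlib

    # One-shot hashing.
    digest = hashlib.sha3_256(b"message").hexdigest()

    # Streaming (incremental) hashing over the same data.
    h = hashlib.sha3_256()
    h.update(b"mess")
    h.update(b"age")
    assert h.hexdigest() == digest

    # Extendable-output function (XOF): the caller chooses the output length.
    xof_out = hashlib.shake_128(b"message").hexdigest(32)  # 32 bytes of output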
A Closer Look at the Belief Propagation Algorithm in Side-Channel-Assisted Chosen-Ciphertext Attacks
The implementation security of post-quantum cryptography (PQC) algorithms has emerged as a critical concern as the PQC standardization process reaches its end. In a side-channel-assisted chosen-ciphertext attack, the attacker builds linear inequalities on secret key components and uses the belief propagation (BP) algorithm to solve them. The number of inequalities determines the query complexity of the attack, so the fewer the better. In this paper, we use the PQC standard algorithm CRYSTALS-Kyber as a case study to construct bilateral inequalities on key variables with substantially narrower intervals using a side-channel-assisted oracle. For Kyber512, Kyber768, and Kyber1024, the average Shannon entropy carried by such an inequality is improved from the previous 0.6094, 0.4734, and 0.8544 to 0.6418, 0.4777, and 1.2007, respectively. The number of such inequalities required to recover the key using the BP algorithm for Kyber512 and Kyber1024 is reduced by 5.32% and 40.53% in theory, and the experimental reduction is even larger. The query complexity is reduced by 43%, 37%, and 48% for Kyber512, 768, and 1024, assuming near-perfect oracle reliability.
Furthermore, we introduce a strategy aimed at further refining the intervals of the inequalities. Diving into the BP algorithm, we discover a metric, named the JSD (Jensen-Shannon distance) metric, that can gauge the tightness of an inequality. We then develop a machine learning-based strategy that utilizes the JSD-metrics to contract the boundaries of inequalities even when fewer inequalities are given, thus improving the entropy carried by the system of linear inequalities. This contraction strategy operates at the algorithmic level and has the potential to be employed in all attacks that endeavor to establish a system of inequalities on key variables.
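The underlying quantity is the standard Jensen-Shannon distance between two discrete distributions, available directly in scipy; the example distributions below are arbitrary stand-ins for BP marginals, not values from the paper.

    import numpy as np
    from scipy.spatial.distance import jensenshannon

    p = np.array([0.70, 0.20, 0.10])    # e.g., a BP marginal for a key variable
    q = np.array([0.40, 0.35, 0.25])    # e.g., the marginal under a tighter bound
    print(jensenshannon(p, q, base=2))  # 0 = identical, 1 = disjoint support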
Free-XOR Gate Bootstrapping
The FHEW-like gate bootstrapping framework operates in a 2-bit plaintext space, where logic gates such as NAND, XOR, and AND are implemented by adding two ciphertexts and extracting the most significant bit. However, each gate operation requires bootstrapping, whose primary cost is one blind rotation; this is expensive when processing the circuits of applications. We propose a novel Free-XOR gate bootstrapping framework based on a single-bit plaintext space, in which the XOR operation is realized by simply adding two ciphertexts, resulting in an almost free computational cost. To form a minimal complete set of logical operations, we design an algorithm for the AND gate within this framework. The AND gate in our Free-XOR gate bootstrapping costs two blind rotations. However, by utilizing the single-bit plaintext space to enhance noise tolerance and by swapping some operations of the bootstrapping process, we can adopt a more compact parameter setting, which in turn accelerates the blind rotation. We propose an instantiation of the NTRU-based AND gate operation, which requires two blind rotations. Despite the additional rotation, the overall computational cost is marginally lower than that of the state-of-the-art gate bootstrapping scheme LLW+ [TCHES24], which uses only a single blind rotation. In addition, our approach achieves a significant reduction in key size, reducing it to 3.3 times the size of LLW+ [TCHES24].
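A toy LWE-style sketch (deliberately insecure parameters, our own encoding choices) of why XOR is free in a single-bit plaintext space: a bit b is encoded as b*(q/2), adding two ciphertexts adds the encodings mod q, and q/2 + q/2 = q = 0 realizes XOR with no bootstrapping.

    import random

    q, n = 2**16, 8                      # toy modulus and dimension
    s = [random.randrange(q) for _ in range(n)]

    def enc(b: int):
        a = [random.randrange(q) for _ in range(n)]
        e = random.randrange(-4, 5)      # small noise
        return a, (sum(x * y for x, y in zip(a, s)) + e + b * (q // 2)) % q

    def dec(ct) -> int:
        a, c = ct
        phase = (c - sum(x * y for x, y in zip(a, s))) % q
        return 1 if q // 4 <= phase < 3 * q // 4 else 0

    def xor(ct1, ct2):                   # ciphertext addition only
        a1, c1 = ct1
        a2, c2 = ct2
        return [(x + y) % q for x, y in zip(a1, a2)], (c1 + c2) % q

    assert dec(xor(enc(1), enc(1))) == 0 and dec(xor(enc(1), enc(0))) == 1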
How to Redact the Bitcoin Backbone Protocol
We explain how to extend the Bitcoin backbone model of Garay et al. (Eurocrypt 2015) to accommodate redactable blockchains. Our extension captures fluid blockchain-based databases (with mutability requirements) and compliance with existing legislation, such as the GDPR right to be forgotten, or the need to erase offending data from nodes’ databases that would otherwise provoke legal shutdowns. Our redactable backbone protocol retains the essential properties of blockchains. Leveraging zero-knowledge proofs, old data can be erased without requiring trusted third parties or heuristics about past chain validation. Our solution can be implemented on Bitcoin immediately without hard forks, and it is scalable. It allows the redaction of data from UTXOs or unconfirmed transactions that have not yet flooded the network, while guaranteeing invariance of the Bitcoin state. Thus, offending data need not persist in the system, not even temporarily.
A Practical Protocol for Quantum Oblivious Transfer from One-Way Functions
We present a new simulation-secure quantum oblivious transfer (QOT) protocol based on one-way functions in the plain model. With a focus on practical implementation, our protocol surpasses prior works in efficiency, promising feasible experimental realization. We address potential experimental errors and their correction, offering analytical expressions to facilitate the analysis of the required quantum resources. Technically, we achieve simulation security for QOT through an equivocal and relaxed-extractable quantum bit commitment.
Symmetric Twin Column Parity Mixers and their Applications
The circulant twin column parity mixer (TCPM) is a type of mixing layer for the round function of cryptographic permutations designed by Hirch et al. at CRYPTO 2023. It has a bitwise differential branch number of 12 and a bitwise linear branch number of 4, which makes it competitive in applications where differential security is required. Hirch et al. gave a concrete instantiation of a permutation using such a mixing layer, named Gaston, and showed that the best 3-round differential and linear trails of Gaston have much higher weights than those of ASCON. In this paper, we first prove why the TCPM has linear branch number 4 and then show that Gaston's linear behavior is worse than ASCON's for more than 3 rounds. Motivated by these facts, we aim to enhance the linear security of the TCPM. We show that adding a specific set of row cyclic shifts to the TCPM can make its differential and linear branch numbers both 12. Notably, by setting a special relationship between the row shift parameters of the modified TCPM, we obtain a special kind of mixing layer called the symmetric circulant twin column parity mixer. The symmetric TCPM has the unique design property that its differential and linear branch histograms are identical, which makes the parameter selection process and the security analysis convenient. Using the symmetric TCPM, we present two new 320-bit cryptographic permutations, namely (1) Gaston-S, where we replace the mixing layer in Gaston with the symmetric TCPM, and (2) SBD, which uses a low-latency degree-4 S-box as the non-linear layer and the symmetric TCPM as the mixing layer. We evaluate the security of these permutations against differential, linear and algebraic analysis, and then provide a performance comparison with Gaston in both hardware and software. Our results indicate that Gaston-S and SBD are competitive with Gaston in both security and performance.
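A schematic toy of a column-parity-style mixing step, viewing the state as r rows of w bits (each row a w-bit integer): the column parity plane is computed and two rotated copies of it are XORed back into every row. This is only meant to convey the shape of such mixers; the shift values below are arbitrary and this is not Gaston's actual mixing layer.

    W = 64

    def rotl(x: int, s: int) -> int:
        return ((x << s) | (x >> (W - s))) & ((1 << W) - 1)

    def toy_twin_column_parity_mix(rows, s1=1, s2=5):
        p = 0
        for r in rows:
            p ^= r                          # column parity plane
        theta = rotl(p, s1) ^ rotl(p, s2)   # two rotated parity planes
        return [r ^ theta for r in rows]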
Practical Zero-Knowledge PIOP for Public Key and Ciphertext Generation in (Multi-Group) Homomorphic Encryption
Homomorphic encryption (HE) is a foundational technology in privacy-enhancing cryptography, enabling non-interactive computation over encrypted data. Recently, generalized HE primitives designed for multi-party applications, such as multi-group HE (MGHE), have gained significant research interest.
While constructing secure multi-party protocols from (MG)HE in the semi-honest model is straightforward, zero-knowledge techniques are essential for ensuring security against malicious adversaries.
In this work, we design practical proof systems for MGHE to guarantee the well-formedness of public keys and ciphertexts. Specifically, we develop and optimize a polynomial interactive oracle proof (PIOP) for MGHE, which can be compiled into zk-SNARKs using a polynomial commitment scheme (PCS).
We compile our PIOP using a lattice-based PCS, and our implementation achieves a 5.5x reduction in proof size, a 70x speed-up in proof generation, and a 343x improvement in verification time compared to the previous state-of-the-art construction, PELTA (ACM CCS 2023). Additionally, our PIOPs are modular, enabling the use of alternative PCSs to optimize other aspects, such as further reducing proof sizes.
On the vector subspaces of $\mathbb{F}_{2^n}$ over which the multiplicative inverse function sums to zero
We study the behavior of the multiplicative inverse function (which plays an important role in cryptography and in the study of finite fields) with respect to a recently introduced generalization of almost perfect nonlinearity (APNness), called $k$th-order sum-freedom, that extends a classic characterization of APN functions and also has some relationship with integral attacks. This generalization corresponds to the fact that a vectorial function $F:\mathbb{F}_2^n\to \mathbb{F}_2^m$ sums to a nonzero value over every $k$-dimensional affine subspace of $\mathbb{F}_2^n$, for some $k\leq n$ (APNness corresponds to $k=2$). The sum of the values of the inverse function $x\in \mathbb{F}_{2^n}\mapsto x^{2^n-2}\in \mathbb{F}_{2^n}$ over any affine subspace $A$ of $\mathbb{F}_{2^n}$ not containing 0 ({\em i.e.}, not a vector space) has been addressed previously, thanks to a simple expression of such a sum, which shows that it never vanishes. In the present paper we study the case of vector (i.e., linear) subspaces, which is much less simple to handle; the sum then depends on a coefficient in subspace polynomials.
We study for which values of $k$ the multiplicative inverse function can sum to nonzero values over all $k$-dimensional vector subspaces. We show that, for every $k$ not coprime with $n$, it sums to zero over at least one $k$-dimensional $\mathbb{F}_2$-subspace of $\mathbb{F}_{2^n}$. We study the behavior of the inverse function over direct sums of vector spaces and deduce that the inverse function is $k$th-order sum-free if and only if it is $(n-k)$th-order sum-free. We derive several other results and show that the set of values $k$ for which the inverse function is not $k$th-order sum-free is stable under adding two values of $k$ whose product is smaller than $n$ (and under subtracting two values under some conditions). We clarify the case of dimension at most 4 (equivalently, of co-dimension at most 4), and this allows us to address, for every $n$, all small enough values of $k$ of the form $3a+4b$.
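A small sanity-check sketch of the first claim in the smallest interesting case, $n=4$, $k=2$ (so $\gcd(k,n)=2\neq 1$): the subfield $\mathbb{F}_4\subset\mathbb{F}_{16}$ is a 2-dimensional $\mathbb{F}_2$-subspace over which $x\mapsto x^{2^n-2}$ sums to zero. The field implementation (modulo $x^4+x+1$) and helper names are ours.

    def gf16_mul(a: int, b: int) -> int:
        r = 0
        for _ in range(4):
            if b & 1:
                r ^= a
            b >>= 1
            a <<= 1
            if a & 0x10:
                a ^= 0x13            # reduce modulo x^4 + x + 1
        return r

    def inv_map(x: int) -> int:      # x^(2^4 - 2), with 0 -> 0
        r = 1
        for _ in range(14):
            r = gf16_mul(r, x)
        return r

    def span(basis):
        s = {0}
        for b in basis:
            s |= {x ^ b for x in s}
        return s

    # An element g of multiplicative order 3 generates GF(4)* inside GF(16).
    g = next(x for x in range(2, 16)
             if x != 1 and gf16_mul(gf16_mul(x, x), x) == 1)
    total = 0
    for x in span([1, g]):           # the subfield GF(4) = {0, 1, g, g+1}
        total ^= inv_map(x)
    print(total)  # 0: the inverse function is not 2nd-order sum-free for n=4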
Tighter Security for Group Key Agreement in the Random Oracle Model
The Messaging Layer Security (MLS) protocol, recently standardized in RFC 9420, aims to provide efficient asynchronous group key establishment with strong security guarantees. The main component of MLS, which is the source of its important efficiency and security properties, is a protocol called TreeKEM. Given that a major vision for the MLS protocol is for it to become the new standard for messaging applications like WhatsApp, Facebook Messenger, Signal, etc., it has the potential to be used by a huge number of users. Thus, it is important to better understand the security of MLS and hence also of TreeKEM. In a previous work by Klein et al., TreeKEM was proven adaptively secure in the Random Oracle Model (ROM) with a polynomial loss in security by proving a result about the security of an arbitrary IND-CPA secure public-key encryption scheme in a public-key version of the Generalized Selective Decryption (GSD) security game.
In this work, we prove a tighter bound for the security of TreeKEM. We follow the approach of the aforementioned work and first introduce a modified version of the public-key GSD game better suited for analyzing TreeKEM. We then provide a simple and detailed proof of security in this game, in the ROM, for a specific encryption scheme, the DHIES scheme (currently the only standardized scheme in MLS), and achieve a tighter bound than the result of Klein et al. We also define and describe the syntax and security of TreeKEM-like schemes and state a result linking the security of TreeKEM with security in our GSD game in the ROM.
On the Black-Box Complexity of Private-Key Inner-Product Functional Encryption
We initiate the study of the black-box complexity of private-key functional encryption (FE). Of central importance in the private-key setting is the inner-product functionality, which is currently only known from assumptions that imply public-key encryption, such as Decisional Diffie-Hellman or Learning-with-Errors. As our main result, we rule out black-box constructions of private-key inner-product FE from random oracles. This implies a black-box separation between private-key inner-product FE and all symmetric-key primitives implied by random oracles (e.g., symmetric-key encryption and collision-resistant hash functions).
Proving lower bounds for private-key functional encryption schemes introduces challenges that were absent in prior works. In particular, the combinatorial techniques developed in prior works for proving black-box lower bounds are only useful in the public-key and predicate encryption settings, and they all fail in the private-key FE case. Our work develops novel combinatorial techniques based on Fourier analysis to overcome these barriers. We expect these techniques to be widely useful in future research in this area.
On the Relationship between Public Key Primitives via Indifferentiability
Recently, Masny and Rindal [MR19] formalized a notion called Endemic Oblivious Transfer (EOT) and proposed a generic transformation from Non-Interactive Key Exchange (NIKE) to EOT with standalone security in the random oracle (RO) model. However, at the model level, the relationship between idealized NIKE and idealized EOT, and the relationships between idealized elementary public-key primitives in general, have rarely been studied.
In this work, we investigate the relationship between ideal NIKE and ideal one-round EOT, as well as the relationship between ideal public key encryption (PKE) and ideal two-round Oblivious Transfer (OT), in the indifferentiability framework proposed by Maurer et al. (MRH04). Our results are threefold: Firstly, we model ideal PKE without a public key validity test, ideal one-round EOT, and ideal two-round OT in the indifferentiability framework. Secondly, we show that ideal NIKE and ideal one-round EOT are equivalent, and that ideal PKE without a public key validity test is equivalent to ideal two-round OT. Thirdly, we show a separation between ideal two-round OT and ideal one-round EOT, which implies a separation between ideal PKE and ideal NIKE.
Randomness Bounds for Private Simultaneous Messages and Conditional Disclosure of Secrets
In cryptography, private simultaneous messages (PSM) and conditional disclosure of secrets (CDS) are closely related fundamental primitives. We consider $k$-party PSM and CDS protocols for a function $f$ with a common random string, where each party $P_i$ generates a message and sends it to a referee $P_0$. We consider bounds for the optimal length $\rho$ of the common random string among $k$ parties (the {\it randomness complexity}) in PSM and CDS protocols with perfect and statistical privacy, through combinatorial and entropic arguments. ($i$) We provide general connections from the optimal total length $\lambda$ of the messages (the {\it communication complexity}) to the randomness complexity $\rho$. ($ii$) We also prove randomness lower bounds in PSM and CDS protocols for general functions. ($iii$) We further prove randomness lower bounds for several important explicit functions. These contain the following results: For PSM protocols with perfect privacy, we prove $\lambda-1 \le \rho$ as the general connection. From the general lower bound, we prove $\rho\ge 2^{(k-1)n}-1$ for a general function $f:(\{0,1\}^n)^k\rightarrow\{0,1\}$ under universal reconstruction, in which $P_0$ is independent of $f$. This implies that the Feige-Kilian-Naor PSM protocol for a general function [Proc.~STOC '94, pp.~554--563] is optimal with respect to randomness complexity. We also provide a randomness lower bound $\rho> kn-2$ for a generalized inner product function. This implies the optimality of the $2$-party PSM protocol for the inner-product function of Liu, Vaikuntanathan, and Wee [Proc.~CRYPTO 2017, pp.~758--790]. For CDS protocols with perfect privacy, we show $\rho\ge\lambda-\sigma$ as the general connection by an argument similar to that for PSM protocols, where $\sigma$ is the length of the secrets. We also obtain randomness lower bounds $\rho\ge (k-1)\sigma$ for the XOR, AND, and generalized inner product functions. These imply the optimality of Applebaum and Arkis's $k$-party CDS protocol for a general function [Proc.~TCC 2018, pp.~317--344] up to a constant factor for large $k$.
Impossible Boomerang Attacks Revisited: Applications to Deoxys-BC, Joltik-BC and SKINNY
The impossible boomerang (IB) attack was first introduced by Lu in his doctoral thesis and subsequently published at DCC in 2011. The IB attack is a variant of the impossible differential (ID) attack obtained by incorporating the idea of the boomerang attack. In this paper, we revisit the IB attack and introduce the incompatibility of the two characteristics in a boomerang into the construction of IB distinguishers. With our methodology, all constructions of IB distinguishers are represented in a unified manner. Moreover, we show that related-(twea)key IB distinguishers possess more freedom than ID ones, so that they can cover more rounds.
We also propose a new tool based on Mixed-Integer Quadratically-Constrained Programming (MIQCP) to search for IB attacks. To illustrate the power of the IB attack, we mount attacks against three tweakable block ciphers: Deoxys-BC, Joltik-BC and SKINNY. For Deoxys-BC, we propose a related-tweakey IB attack on 14-round Deoxys-BC-384, which improves the best previous related-tweakey ID attack by 2 rounds, and we improve the data complexity of the best previous related-tweakey ID attack on 10-round Deoxys-BC-256. For Joltik-BC, we propose the best attacks against 10-round Joltik-BC-128 and 14-round Joltik-BC-192 with related-tweakey IB attacks. For SKINNY-n-3n, we propose a 27-round related-tweakey IB attack, which improves both the time and the memory complexities of the best previous ID attack. We also propose the first related-tweakey IB attack on 28-round SKINNY-n-3n, which improves the previous best ID attack by one round.
Unbounded Leakage-Resilient Encryption and Signatures
Given the devastating security compromises caused by side-channel attacks on existing classical systems, can we store our private data encoded as a quantum state so that they can be kept private in the face of arbitrary side-channel attacks?
The unclonable nature of quantum information allows us to build various quantum protection schemes for cryptographic information such as secret keys. Examples of quantum protection notions include copy-protection, secure leasing, and finally, unbounded leakage-resilience, which was recently introduced by Çakan, Goyal, Liu-Zhang and Ribeiro (TCC'24). Çakan et al. show that secrets of various cryptographic schemes (such as cryptographic keys or secret shares) can be protected by storing them as quantum states so that they satisfy LOCC (local operation and classical communication) leakage-resilience: the scheme can tolerate any unbounded amount of adaptive leakage over unbounded rounds. As a special case (dubbed $1$-round leakage), this also means that those quantum states cannot be converted to classical strings (without completely losing their functionality).
In this work, we continue the study of unbounded/LOCC leakage-resilience and consider several new primitives. In more detail, we build ciphertexts, signatures and non-interactive zero-knowledge proofs with unbounded leakage-resilience. We show the following results.
- Assuming the existence of a classical $X \in \{\text{secret-key encryption}, \text{public-key encryption}\}$ scheme, we construct an $X$ scheme with LOCC leakage-resilient ciphertexts. This guarantees that an adversary who obtains LOCC-leakage on ciphertexts cannot learn anything about their contents, even if they obtain the secret key later on.
- Assuming the existence of a classical signature scheme and indistinguishability obfuscation (iO), we construct a signature scheme with LOCC leakage-resilient signatures. This guarantees that an adversary who obtains LOCC-leakage on various signatures cannot produce any valid signatures at all other than the ones it obtained honestly!
- Assuming the existence of one-way functions and indistinguishability obfuscation (iO), we construct a NIZK proof system with LOCC leakage-resilient proofs. This guarantees that an adversary who obtains LOCC-leakage on a NIZK proof of a hard instance cannot produce a valid proof!
mUOV: Masking the Unbalanced Oil and Vinegar Digital Signature Scheme at First- and Higher-Order
The National Institute of Standards and Technology (NIST) initiated a standardization procedure for additional digital signatures and recently announced the round-2 candidates for the additional PQ digital signature schemes. The multivariate digital signature scheme Unbalanced Oil and Vinegar (UOV) is one of the oldest post-quantum schemes and has been selected by NIST for Round 2. Although UOV is mathematically secure, several side-channel attacks (SCA) have been shown on UOV and UOV-based digital signatures. We carefully analyze the sensitivity of variables and operations in the UOV scheme from the side-channel perspective and show which require protection.
To mitigate implementation-based SCA, we integrate a provably secure arbitrary-order masking technique with the key generation and signature generation algorithms of UOV. We propose efficient techniques for the masked dot-product and matrix-vector operations, which are both critical in multivariate DS schemes. We also implement and demonstrate the practical feasibility of our masking algorithms for UOV-Ip on the ARM Cortex-M4 microcontroller. Our first-order masked UOV implementations have $2.7\times$ and $3.6\times$ performance overhead compared to the unmasked scheme for the key generation and signature generation algorithms, respectively, and use $1.3\times$ and $1.9\times$ the stack memory of the unmasked versions.
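As background on what a masked dot product involves, a first-order ISW-style gadget over GF(2^8) is sketched below: every coefficient is split into two XOR shares, and no intermediate value ever combines both shares of a secret. This is the generic textbook gadget under our own names and parameters, not the paper's optimized algorithm.

    import os

    def gf_mul(a: int, b: int) -> int:
        """Carry-less multiplication in GF(2^8) modulo x^8+x^4+x^3+x+1."""
        r = 0
        for _ in range(8):
            if b & 1:
                r ^= a
            b >>= 1
            a <<= 1
            if a & 0x100:
                a ^= 0x11B
        return r

    def share(x: int):
        m = os.urandom(1)[0]
        return (x ^ m, m)                # two XOR shares of x

    def isw_mul(a, b):
        """First-order ISW multiplication of shared values a=(a0,a1), b=(b0,b1)."""
        r = os.urandom(1)[0]
        c0 = gf_mul(a[0], b[0]) ^ r
        c1 = gf_mul(a[1], b[1]) ^ r ^ gf_mul(a[0], b[1]) ^ gf_mul(a[1], b[0])
        return (c0, c1)                  # c0 ^ c1 == (a0^a1) * (b0^b1)

    def masked_dot(xs, ys):
        acc = (0, 0)
        for a, b in zip(xs, ys):
            p = isw_mul(a, b)
            acc = (acc[0] ^ p[0], acc[1] ^ p[1])  # share-wise accumulation
        return acc

    xs = [share(v) for v in (3, 5, 7)]
    ys = [share(v) for v in (2, 4, 6)]
    d0, d1 = masked_dot(xs, ys)
    assert d0 ^ d1 == gf_mul(3, 2) ^ gf_mul(5, 4) ^ gf_mul(7, 6)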
Somewhat Homomorphic Encryption from Linear Homomorphism and Sparse LPN
We construct somewhat homomorphic encryption schemes from the learning sparse parities with noise (sparse LPN) problem, along with an assumption that implies linearly homomorphic encryption (e.g., the decisional Diffie-Hellman or decisional composite residuosity assumptions). Our resulting schemes support an a-priori bounded number of homomorphic operations: $O(\log \lambda/\log \log \lambda)$ multiplications followed by poly($\lambda$) additions, where $\lambda \in \mathbb{N}$ is a security parameter. These schemes have compact ciphertexts: after homomorphic evaluation, the bit-length of each ciphertext is a fixed polynomial in the security parameter $\lambda$, independent of the number of homomorphic operations applied to it. This gives the first somewhat homomorphic encryption schemes that can evaluate the class of bounded-degree polynomials with a bounded number of monomials without relying on lattice assumptions or bilinear maps.
Much like in the Gentry-Sahai-Waters fully homomorphic encryption scheme, ciphertexts in our scheme are matrices, homomorphic addition is matrix addition, and homomorphic multiplication is matrix multiplication. Moreover, when encrypting many messages at once and performing many homomorphic evaluations at once, the bit-length of ciphertexts in some of our schemes (before and after homomorphic evaluation) can be arbitrarily close to the bit-length of the plaintexts. The main limitation of our schemes is that they require a large evaluation key, whose size scales with the complexity of the homomorphic computation performed, though this key can be re-used across any polynomial number of encryptions and evaluations.
Distributed Differential Privacy via Shuffling vs Aggregation: a Curious Study
How to achieve distributed differential privacy (DP) without a trusted central party is of great interest in both theory and practice. Recently, the shuffle model has attracted much attention. Unlike the local DP model, in which users send randomized data directly to the data collector/analyzer, in the shuffle model an intermediate untrusted shuffler is introduced to randomly permute the data, which have already been randomized by the users, before they reach the analyzer. The most appealing aspect is that while shuffling does not explicitly add more noise to the data, it improves privacy. As a consequence of this privacy amplification effect, users need to add less noise to the data than in the local DP model to achieve the same level of differential privacy. Thus, protocols in the shuffle model can provide better accuracy than those in the local DP model. Interestingly, the architecture of the shuffle model is similar to that of private aggregation, which has been studied for more than a decade. In private aggregation, locally randomized user data are aggregated by an intermediate untrusted aggregator. This raises the question of whether aggregation also exhibits some sort of privacy amplification effect and, if so, how good this ``aggregation model'' is in comparison with the shuffle model. We conducted the first comparative study between the two, covering privacy amplification, functionalities, protocol accuracy, and practicality. Our results so far suggest that the newer shuffle model does not have obvious advantages over the older aggregation model. On the contrary, protocols in the aggregation model outperform those in the shuffle model, sometimes significantly, in many aspects.
The Learning Stabilizers with Noise problem
Random classical codes have good error correcting properties, and yet they are notoriously hard to decode in practice. Despite many decades of extensive study, the fastest known algorithms still run in exponential time. The Learning Parity with Noise (LPN) problem, which can be seen as the task of decoding a random linear code in the presence of noise, has thus emerged as a prominent hardness assumption with numerous applications in both cryptography and learning theory.
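The classical problem just described is easy to state concretely; a minimal sketch of LPN sample generation (decoding a random linear code under Bernoulli noise) with illustrative parameters follows.

    import numpy as np

    rng = np.random.default_rng()
    n, m, tau = 128, 512, 0.125             # secret length, samples, noise rate
    s = rng.integers(0, 2, size=n)          # secret vector
    A = rng.integers(0, 2, size=(m, n))     # random code / query matrix
    e = (rng.random(m) < tau).astype(int)   # Bernoulli(tau) noise
    b = (A @ s + e) % 2                     # LPN instance: recover s from (A, b)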
Is there a natural quantum analog of the LPN problem? In this work, we introduce the Learning Stabilizers with Noise (LSN) problem, the task of decoding a random stabilizer code in the presence of local depolarizing noise. We give both polynomial-time and exponential-time quantum algorithms for solving LSN in various depolarizing noise regimes, ranging from extremely low noise, to low constant noise rates, and even higher noise rates up to a threshold. Next, we provide concrete evidence that LSN is hard. First, we show that LSN includes LPN as a special case, which suggests that it is at least as hard as its classical counterpart. Second, we prove a worst-case to average-case reduction for variants of LSN. We then ask: what is the computational complexity of solving LSN? Because the task features quantum inputs, its complexity cannot be characterized by traditional complexity classes. Instead, we show that the LSN problem lies in a recently introduced (distributional and oracle) unitary synthesis class. Finally, we identify several applications of our LSN assumption, ranging from the construction of quantum bit commitment schemes to the computational limitations of learning from quantum data.
Masking Gaussian Elimination at Arbitrary Order, with Application to Multivariate- and Code-Based PQC
Digital signature schemes based on multivariate- and code-based hard problems are promising alternatives to lattice-based signature schemes due to their smaller signature sizes. Hence, several candidates in the ongoing additional standardization for quantum-secure digital signature (DS) schemes by the National Institute of Standards and Technology (NIST) rely on such alternate hard problems. Gaussian Elimination (GE) is a critical component in the signing procedure of these schemes. In this paper, we provide a masking scheme for GE with back substitution to defend against first- and higher-order attacks. To the best of our knowledge, this work is the first to analyze and propose masking techniques for multivariate- or code-based DS algorithms.
We propose a masked algorithm for transforming a system of linear equations into row-echelon form. This is realized by introducing techniques for efficiently making the leading (pivot) elements one while avoiding costly conversions between Boolean and multiplicative masking at all orders. We also propose a technique for efficient masked back substitution, which eventually enables a secure unmasking of the public output. We evaluate the overhead of our countermeasure for several post-quantum candidates and their different security levels at first-, second-, and third-order, including UOV, MAYO, SNOVA, QR-UOV, and MQ-Sign. Notably, the operational cost of first-, second-, and third-order masked GE is 2.3$\times$ higher, and the randomness cost 1.2$\times$ higher, in MAYO compared to UOV for security levels III and V. In contrast, these costs are similar in UOV and MAYO for one version of level I. We also show detailed performance results for masked GE implementations for all three security versions of UOV on the Arm Cortex-M4 and compare them with unmasked results. Our first-order implementations targeting UOV parameters have overheads of 6.5$\times$, 5.9$\times$, and 5.7$\times$ compared to the unprotected implementation for NIST security levels I, III, and V.
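For orientation, an unmasked reference of the operation being protected, GE to row-echelon form with back substitution, is sketched below; the field GF(251) and all names are our stand-ins, and the point of the countermeasure is precisely to perform the pivot inversions and eliminations on masked shares instead of in the clear.

    def solve_gf_p(A, b, p=251):
        """Solve Ax = b over GF(p) via row-echelon reduction + back substitution."""
        n = len(A)
        M = [row[:] + [bi] for row, bi in zip(A, b)]      # augmented matrix mod p
        for col in range(n):
            piv = next(r for r in range(col, n) if M[r][col] % p != 0)
            M[col], M[piv] = M[piv], M[col]               # bring pivot row up
            inv = pow(M[col][col], -1, p)                 # make the pivot element 1
            M[col] = [(v * inv) % p for v in M[col]]
            for r in range(col + 1, n):                   # eliminate below the pivot
                f = M[r][col]
                M[r] = [(vr - f * vc) % p for vr, vc in zip(M[r], M[col])]
        x = [0] * n                                       # back substitution
        for r in range(n - 1, -1, -1):
            x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) % p
        return x

    A = [[2, 1], [1, 3]]
    b = [5, 10]
    x = solve_gf_p(A, b)
    assert all((sum(a * xi for a, xi in zip(row, x)) - bi) % 251 == 0
               for row, bi in zip(A, b))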