An Efficient ZK Compiler from SIMD Circuits to General Circuits

Dung Bui¹, Haotian Chu²,³, Geoffroy Couteau¹, Xiao Wang³, Chenkai Weng⁴, Kang Yang⁵, and Yu Yu²,

¹ Université Paris Cité, CNRS, IRIF
bui@couteau@irif.fr
² Shanghai Jiao Tong University
chtvii, yyuu@sjtu.edu.cn
³ Northwestern University
{haotian.chu, wangxiao}@northwestern.edu
⁴ Arizona State University
chenkai.weng@asu.edu
⁵ State Key Laboratory of Cryptology
yangk@sklc.org

Abstract. We propose a generic compiler that can convert any zero-knowledge (ZK) proof for SIMD circuits to general circuits efficiently, and an extension that can preserve the space complexity of the proof systems. Our compiler can immediately produce new results improving upon state of the art.
– By plugging in our compiler to Antman, an interactive sublinear-communication protocol, we improve the overall communication complexity for general circuits from \( O(C^{3/4}) \) to \( O(C^{1/2}) \). Our implementation shows that for a circuit of size \( 2^{27} \), it achieves up to 83.6\( \times \) improvement on communication compared to the state-of-the-art implementation. Its end-to-end running time is at least 70\% faster in a 10Mbps network.
– Using the recent results on compressed \( \Sigma \)-protocol theory, we obtain a discrete-log-based constant-round zero-knowledge argument with \( O(C^{1/2}) \) communication and common random string length, improving over the state of the art that has linear-size common random string and requires heavier computation.
– We improve the communication of a designated \( n \)-verifier zero-knowledge proof from \( O(nC/B + n^2B^2) \) to \( O(nC/B + n^2) \).

To demonstrate the scalability of our compilers, we were able to extract a commit-and-prove SIMD ZK from Ligero and cast it in our framework. We also give one instantiation derived from LegoSNARK, demonstrating that the idea of CP-SNARK also fits in our methodology.

Keywords: Zero-Knowledge Proof, General Compiler, SIMD ZK, VOLE-based ZK, \( \Sigma \)-protocol
1 Introduction

Assume that the verification of a statement is represented as a public circuit C : \{0, 1\}^n \to \{0, 1\}. A zero-knowledge proof (ZKP) allows a prover to convince a verifier that it possesses a witness w such that C(w) = 0, without the verifier learning any information beyond the circuit output. The commit-and-prove zero-knowledge (CP-ZK) paradigm is among the most flexible and modular design mechanisms for constructing ZKP. For instance, a CP-SNARK allows a prover to commit to a batch of secrets via a commitment scheme (e.g. vector commitment or polynomial commitment), then prove relations between the committed values in ZK \[18,17,38\]. A small communication footprint is achieved when the commitment is compressing and the proof is succinct. On the other hand, schemes like VOLE-based ZKPs \[5,49,52,23\] rely on efficient interactive commitment scheme that separately commits to wire values in the circuit, then prove the consistency between committed wire values with constant overhead. Though general VOLE-ZKs incur communication complexity linear to the circuit size, they achieve high throughput owing to the lightweight operations.

Generally, CP-ZK proof systems with sublinear communication involve two components after the batch commitment of witnesses: (1) Hadamard product of committed vectors, (2) equality of individual wires across different committed vectors. The former is used to demonstrate the correct computation of multiplication gates and the latter is used to show that the committed wire values are consistent with the circuit topology.

From SIMD-ZK to general ZK. From another perspective, the above approach can be viewed as a conversion from commit-and-prove SIMD-ZK to general ZK. Define \((B, C)\)-SIMD circuit which contains \(B\) identical components of the circuit \(C\). A SIMD-ZK proves that for input witnesses \((w_1,\ldots,w_B)\), \(C(w_i) = 0\) for \(i \in [B]\). By exploiting the fact that operations are identical across \(B\) components, SIMD-ZK schemes typically utilize vector commitments and batch proofs to achieve communication sublinear in \(B \cdot |C|\). In more detail, denote by \([w]\) a commitment to a vector \(w\). Define a witness matrix \(W = ([w_1] \parallel \ldots \parallel [w_B])\). Instead of viewing the \(i\)th column as the witness to the \(i\)th evaluation of \(C\), a prover commits to each row vector and lets the verifier obtain \(([w^1],\ldots,[w^{|C|}])\). In this way, for any gate \((\alpha, \beta, \gamma, \circ)\) in \(C\) and \(\circ \in \{\text{Add}, \text{Mult}\}\), the prover only needs to prove that \(w^\gamma = w^\alpha \circ w^\beta\). ZKP schemes achieve \(O(|C|)\) proof size if both the vector commitment and batch proof of additions and multiplications incur constant size.

Most of priors work on different proof systems indeed take this approach by first implementing batch commitment and proof of multiplication gates, which are followed by a wiring consistency check \[26,20,18,17,2,50,53\]. However, they take divergent paths to tackle the latter problem. A popular approach is to compile the circuit into an algebraic format via a constraint system, e.g. rank-1 constraint system (RICS) \[27\]. Define \(z := (1, x, w)\) in which \(x\) and \(w\) are the public and private inputs of the circuit. Denote \((L, R, O)\) as the matrices that represent the map from \(z\) to the vectors of the left, right and output wires of multiplication gates \(a, b, c\). Then the relation \(a \cdot b - c = 0\) can be expressed
as $(L \cdot z) \ast (R \cdot z) - (O \cdot z) = 0$. In this way, the ZKP is reduced to proving matrix-vector products on committed values. On the other hand, some ZKPs like [50] and [53] proceed differently: they individually prove that $w^\alpha[i] = w^\beta[j]$ for any $i, j \in [B]$. Although this approach yields better scalability for the ZKP, it results in worse communication complexity, usually with a $B^2$ factor.

An interesting question is whether we can design a generic compiler that translates any commit-and-prove SIMD-ZK (CP-SIMD-ZK) into a general CP-ZK with sublinear communication. It would facilitate the design of communication-efficient ZKP because it allows the focus to be shifted to the design of SIMD-ZK primitives, which are generally easier than general-purpose ZKP.

**From SIMD-ZK to scalable ZK.** It is common for ZKPs to trade off scalability against succinctness. On the one hand, although zk-SNARKs generate proofs of constant size or size sublinear to $|C|$, their memory overhead is at least $O(|C|)$. The constant factor is large when public-key operations are involved. This prevents them from being applied to large statements; prior benchmarks only focus on statements represented by less than $2^{25}$ constraints [19]. Efforts are made to distribute the zk-SNARK proof generation among a set of provers [51,40,45,33,9], however, the overall computational and memory overhead is still prohibitive. They either need to disclose secret input to all provers, or only aim to delegate computation to more powerful workers but not to reduce the computational cost of them. Another line of work focuses on recursive SNARKs [37,35,14] that allow a statement that can be divided into multiple steps to be proven step-by-step, but they require the statement to be structured, i.e., each step is represented by identical constraints. On the other hand, interactive ZKPs such as VOLE-ZK [5,49,52,23] achieve high scalability by “streaming” the circuit evaluation. They evaluate the circuit gate-by-gate and only incur memory overhead linear in the current gates that are evaluated. Neither the witness nor the circuit structure for future gates are required to be known in advance. Hence these types of ZKPs scale to large circuits with billions of gates. However, their drawback is the $O(|C|)$ communication complexity and lack of public verifiability.

Naturally, it would be interesting to study how to achieve scalability and succinctness at the same time. Specifically, can we obtain efficient ZKPs with proof size sublinear to the circuit size, without the memory overhead being lower bounded by the circuit size?

### 1.1 Our Contributions

In this work, we start from SIMD-ZK schemes and aim to obtain efficient general ZK and scalable ZK. We first extend the SIMD-ZK functionality by adding a proof of linear map, which is easily realized by most SIMD-ZK schemes. Then we design two compilers. The first one converts a wide range of extended SIMD-ZK to general ZK, and the second one further converts it to scalable ZK for memory-constrained provers to prove large statements. For both constructions, we also demonstrate the generality of the compilers, i.e., our methods promote any SIMD-ZK to general and possibly scalable ZK so that attention can be paid
only to the design of the efficient SIMD-ZK, instead of more complicated generic primitives. Our contributions are fourfold.

**Extended SIMD-ZK.** We propose a functionality that extends the SIMD-ZK functionality $\mathcal{F}_{\text{SIMDZK}}$ and denote it as $\mathcal{F}_{\text{eSIMDZK}}$. In addition to the subroutines *commit*, *open* and *prove* that are commonly supported by SIMD-ZK schemes, it also contains a proof of linear map that checks the relation $x = My$ for committed vectors $(x, y)$. The functionality $\mathcal{F}_{\text{eSIMDZK}}$ is the fundamental building block of our constructions. Additionally, we observe that special attention needs to be paid to the security of commit-and-prove procedures when designing a general framework for scalable ZK. Some commitment schemes may put a restriction on its proving phase. The security consideration will be reflected as a counter in $\mathcal{F}_{\text{SIMDZK}}$ and analyzed when such a commitment scheme is encountered.

**Compiling SIMD-ZK to general ZK.** Based on the extended SIMD-ZK, we design a SIMD compiler that allows a wide spectrum of SIMD-ZK to work for general circuits. To do so, it first converts the general circuit into a SIMD circuit by ignoring the circuit connectivity, and proves its satisfiability via a SIMD proof. This only utilizes the *commit*, *open* and *prove* thus can be handled by the underlying SIMD-ZK. Then the compiler represents the wiring as a linear mapping of committed wire values, and proves the wiring consistency by the proof of linear map from $\mathcal{F}_{\text{eSIMDZK}}$. Our compiler is a generalization of a few works including Ligero [2,6] and LegoSNARK [18], which utilize R1CS-style representations for the wiring of circuits and reduce the statement to relations that can be better handled by the extended SIMD-ZK.

**ZKP for large statements.** Except for VOLE-based ZKP, most practical ZKPs incur large RAM consumption, often linear to the circuit size. To relax the memory overhead, we propose a framework for memory bounded provers to prove the correctness of large statements. It also relies on $\mathcal{F}_{\text{SIMDZK}}$ and can easily achieve sublinear communication complexity for arbitrary large circuits by properly instantiating the underlying SIMD-ZK. Particularly, it utilizes the proving technique in our SIMD compiler to evaluate a circuit segment by segment and prove the connectivity of wires between these segments. Similar to the current scalable interactive ZK, it does not require the whole circuit structure or the witness to be known in advance, hence allowing streaming.

**Instantiation for various proof systems.** To demonstrate the generality of our compiler, we describe and analyze the detailed instantiation of our compiler with various CP-ZK that inherently work well for SIMD circuits, including VOLE-based ZK [50], constant-round sublinear ZK from Σ-protocol [3], designated multi-verifier ZK from packed Shamir sharing [53], MPC-in-the-Head [2] and zk-SNARK from pairing [18]. We show how to adapt these work for general ZK and scalable ZK by merely satisfying the minimum requirement, that is, realizing the SIMD-ZK functionality. We emphasize that the transformation may affect the security guarantee of the underlying SIMD-ZK, and extra security analysis will be provided in that case.

In many cases, applying our compiler yields concrete efficiency improvements over the state of the art in various settings. We list our results in Section 2.2.
Furthermore, we implement the SIMD compiler and evaluate the compilation of a VOLE-based ZK [50] that is previously designed for SIMD circuits. For a circuit of size $|C| = 2^{27}$, it shows up to $83.6 \times$ improvement on communication, compared to the general VOLE-ZK Quicksilver [52]. In terms of running time, it is 70% faster when bandwidth is 10Mbps and 30% faster when bandwidth increases to 25Mbps using the same set of parameters. Its running time can be further improved if sacrificing communication by reducing batch size.

### 1.2 Related Work

Previous work on complexity-preserving zero-knowledge proofs study efficient proof generation with constrained space or time budget [10,11,24,31,8,7]. Bootle et al. propose elastic SNARKs that can either achieve linear time and space complexity, or reduce the RAM consumption to $O(\log C)$ with $O(C \log^2 C)$ computational complexity [13]. Assume an NP relation that can be verified in time $T$ and space $S$ by a RAM program, Bangalore et al. [4] propose a public-coin ZKP based on collision-resistant hash functions that allows the prover to run in time $\tilde{O}(T)$ and space $\tilde{O}(S)$, with proof size $\tilde{O}(T/S)$. Their space-preserving ZKP is converted from Ligero [2].

Recent recursive zk-SNARK and incremental verifiable computation (IVC) propose succinct arguments for composed circuits, which can be evaluated step by step [37,35,36,47,14,16]. These techniques increase the scalability of the prover, who separately generates proof for each step while simultaneously proves its consistency with all previous steps without going over the history data. They can potentially support streaming proofs in a way that the input and witness for future steps are not necessary known until those steps are reached. However, many of them only support structured circuit which are divided into a sequence of components that share the same structure. More advanced IVCs cross this barrier, however they reveal the output of each step thus does not provide the zero-knowledge guarantee when they are treated as general ZK [37,35].

### 1.3 Notation and Functionalities

**Notation.** Denote $\lambda$ as the computational security parameters and $[1, m]$ as a set $\{1, 2, \ldots, m\}$. For a vector $x \in \mathbb{F}^B$ we define its $i$-th coordinate by $x_i$, and a vector $x' := (f(0), x) \in \mathbb{F}^{B+1}$ as the concatenation of a value $f(0) \in \mathbb{F}$ and the vector $x$. Given distribution ensembles $\{X_n\}, \{Y_n\}$, we write $X_n \approx Y_n$ to denote that $X_n$ is computationally indistinguishable to $Y_n$. $\text{negl}()$ is defined as a negligible function such that $\text{negl}(\lambda) = o(\lambda^{-c})$ for any positive constant $c$. A circuit $C$ over a field $\mathbb{F}$ consists of input, output, addition and multiplication gates, where input gates use circuit-input wires as their output wires and output gates use circuit-output wires as their input wires. $|C| = C$ is the number of multiplication gates in the circuit $C$. Define $(B, C)$-SIMD circuit as a circuit that contains $B$ copies of $C$.

**Zero-knowledge proof $F_{ZK}$:**
– Upon receiving \((prove, C, w)\) from prover \(P\) and \((verify, C)\) from verifier \(V\), if \(C(w) = 0\), then output \(true\) to \(V\), else output \(false\) to \(V\).

**Vector oblivious linear evaluation** \(F_{\text{VOLE}}\). This functionality works over a field \(F\), and upon receiving \((\text{init})\) from \(P\) and \(V\), if \(V\) is honest, then sample \(\Delta \leftarrow F\), else receive \(\Delta \in F\) from the adversary. Store \(\Delta\) and ignore all subsequent \((\text{init})\) commands. Upon receiving \((\text{extend}, n)\) from \(P\) and \(V\), execute:

– If \(V\) is honest, sample \(v \leftarrow F^n\). Otherwise, receive \(v \in F^n\) from the adversary.
– If \(P\) is honest, sample \(u \leftarrow F^n\) and compute \(w := v + u \cdot \Delta \in F^n\). Otherwise, receive \(u \in F^n\) and \(w \in F^n\) from the adversary, and then recompute \(v := w - u \cdot \Delta \in F^n\).
– Output \((u, w)\) to \(P\) and \(v\) to \(V\).

**Commitment** \(F_{\text{Com}}\). Similar to the functionality of \(\text{Commit}\) command in \(F_{\text{SIMDZK}}\):

– Upon receiving input \((\text{Commit}, w)\) from \(P\) and \((\text{Commit})\) from \(V\), pick a tag \([w]\) and store \(([w], w)\) in the memory. Return \([w]\) to both parties.
– Upon receiving \((\text{Open}, [w])\), if a tuple \(([w], w)\) was previously stored, output \(([w], w)\) to \(V\); otherwise abort.

The descriptions of special honest-verifier ZK argument are deferred to Supplementary Material A.1.

## 2 Technical Overview

### 2.1 From SIMD to General Circuit in ZK

Denote the prover as \(P\) and verifier as \(V\). Define \((B, C)\)-SIMD circuit as \(B\) identical repetitions of a circuit \(C\) with size \(|C| = C\). SIMD-ZK is designed for such circuits. First, we would like to focus on converting SIMD-ZK to general ZK that works for arbitrary circuits. The functionality of ZKP for SIMD circuits is shown in figure 1. \(P\) first groups and commits to the vectors of witnesses. Then it uses the underlying ZKP to prove the relation of committed values by directly operating on commitments. Since elements in each vector are committed in a batch, the operations on the commitment apply to all of the committed elements. For a SIMD-ZK to be interesting, it usually costs less than separately evaluating \(C\) for \(B\) times. For example, AntMan [50] has a complexity of \(O(B + C)\) for \((B, C)\)-SIMD circuits, which shows significant saving on communication compared with its non-SIMD opponents [52,23] that incur \(O(BC)\) complexity.

There are multiple ways to conduct the transformation from SIMD-ZK to general ZK. As discussed in Section 1, such constructions usually need a wire consistency check on top of SIMD-ZK. Taking AntMan [50] as an example, one can first arrange all gates in batches, commit to their input and output wire values, then utilize a SIMD-ZK to prove that all batches of gates are computed correctly. Then an extra protocol is invoked to prove the consistency of each
individual wire value that is repeatedly packed in multiple commitments, E.g. for batched wire values $w_1, w_2 \in \mathbb{F}^B$ and wire indices $i, j \in [B]$, it aims to check whether they satisfy $w_1[i] = w_2[j]$. AntMan requires $O(B^3)$ complexity for checking all combinations of $(i, j) \in [B] \times [B]$, which leads to a total communication complexity of $O(B^3 + C/B)$. This translates to a $O(C^{3/4})$ cost when setting $B = C^{1/4}$. The designated multi-verifier ZK from [53] also uses a similar wire consistency check, which incurs $O(n^2 B^2)$ among $n$ verifiers.

A better wire consistency check. We follow an idea similar to the above but manage to improve the complexity from $O(C^{3/4})$ to $O(C^{1/2})$. As in AntMan [50], we ignore the wiring of the circuit and pack the multiplication gates in blocks of size $B$, which results in $C/B$ batches. The SIMD proof is invoked to first commit to the input and output wires of the packed multiplication gates, then prove the SIMD circuit satisfiability. They totally incur communication complexity $O(C/B)$. Then, we manage to perform the wire consistency check with cost $O(B)$ rather than $O(B^3)$.

Instead of considering the wire consistency among each pair of commitments that contain values from the same wire as done in AntMan, we consider how they are all consistent with a global vector $w$ that contains all wire values in the circuit. Taking the left input wire of all multiplication gates as an example. Define a circuit $\mathcal{C}$ that has a total of $Bm$ wire values and $Bn$ multiplication gates. Assume global wire values $w \in \mathbb{F}^{Bm}$ and the values of left input wires across all multiplication gates $l \in \mathbb{F}^{Bn}$. For any $i \in [Bn]$, the left wire of the $i$-th multiplication gate must be associated with a wire index $\alpha_i \in [Bm]$ such that $l[i] = w[\alpha_i]$. Alternatively, one can define a mapping matrix $L \in \{0, 1\}^{Bn \times Bm}$ such that the $i$-th row $L_i$ is all-zero except at the entry $L_i[\alpha_i]$. In this way, the wire consistency check boils down to check $l = Lu$, where $L$ is public and parties have commitments $\{[u_i]_{i \in [n]}$ and $\{[w_i]_{i \in [m]}$. In the context of SIMD-ZK protocols, values in $l$ and $w$ are batch-committed, meaning that operations on them are
applied to every element in the vector. As a result, it is not straightforward to use SIMD-ZK to prove wire consistency which intuitively involves operations for separate elements.

We sketch our idea below. First, let $\mathcal{V}$ send a challenge vector $\hat{r} \in \mathbb{F}^{|B|n}$ and convert the check of $l = Lw$ to the check of $\hat{r}^\top l = \hat{r}^\top Lw$. This reduces the proof of a matrix-vector multiplication to a proof of two inner products, with an increase in soundness error depending on the distribution of $\hat{r}$. To simplify the notation, we define a public vector $v^\top = \hat{r}^\top L$, then rewrite the above relation as $\hat{r}^\top l = v^\top w$. If we define a circuit $C : \mathbb{F}^{2n+2m+1} \rightarrow \mathbb{F}$ such that $C(\hat{r}_1, \ldots, \hat{r}_n, l_1, \ldots, l_n, v_1, \ldots, v_m, w_1, \ldots, w_m, q) = \sum_{i \in [n]} \hat{r}_i \cdot l_i - \sum_{j \in [m]} v_j \cdot w_j - q$ then $\mathcal{P}$ can prove the above statement by: 1) Divide each of the vectors in $(\hat{r}, l, v, w)$ into length-$B$ segments. Compute and commit to $q := \sum_{i \in [n]} \hat{r}_i \cdot l_i - \sum_{j \in [m]} v_j \cdot w_j \in \mathbb{F}^B$. Prove the consistency between $(\hat{r}, l, v, w, q)$ by using a SIMD-ZK composed of $B$ evaluations of the circuit $C$. 2) prove that $\sum_i q[i] = 0$. This is not obvious, as it involves the computation of the sum of values in one commitment. A naive way is for $\mathcal{P}$ to open the commitment to $q$, but it compromises the zero-knowledge requirements because $q$ is the linear combination of private circuit wire values. To tackle the problem, $\mathcal{P}$ instead commits to a uniform vector $\tilde{r} \in \mathbb{F}^B$ under the constraint that $\sum_{i \in [B]} \tilde{r}[i] = 0$. It should be done before $\mathcal{V}$ samples $\hat{r}$ (else $\mathcal{P}$ can break soundness). After $\mathcal{P}$ commits to the mask vector, $\mathcal{V}$ sends the mask vector, $\mathcal{V}$ sends the challenge $\hat{r}$ and the new SIMD circuit is defined to be

$$C'(\hat{r}_1, \ldots, \hat{r}_n, l_1, \ldots, l_n, v_1, \ldots, v_m, w_1, \ldots, w_m, q, \tilde{r}) = \sum_{i \in [n]} \hat{r}_i \cdot l_i - \sum_{j \in [m]} v_j \cdot w_j - q - \tilde{r}$$

$\mathcal{P}$ computes and commits to $q \in \mathbb{F}^B$ such that

$$q = \sum_{i \in [n]} \hat{r}_i \cdot l_i - \sum_{j \in [m]} v_j \cdot w_j - \tilde{r}.$$

The parties can now use the SIMD-ZK to prove $B$ number of instances of $C'$ with committed inputs $[\hat{r}_1], \ldots, [\hat{r}_n], [l_1], \ldots, [l_n], [v_1], \ldots, [v_m], [w_1], \ldots, [w_m], [q]$ and $[\tilde{r}]$. Finally, the proof of $\sum_i q[i] = 0$ is specific to the underlying commitment schemes. The naive way is to let $\mathcal{P}$ fully open $q$ to $\mathcal{V}$ who verifies its sum locally. This would generally require $O(B)$ communication complexity.

Soundness comes from the randomness of the challenge vector $r$ that is sampled after $\mathcal{P}$ commits to $\hat{r}$. Assume that $\mathbb{F}$ is an exponentially large field and a cheating prover commits to $(l, w)$ such that $l - Lw \neq 0$ for $n$. By Schwarz-Zippel, the probability that the erroneous values happen to be corrected by $\hat{r}$ during the check of $\sum_i q[i] = 0$ where $q := \hat{r}^\top l - \hat{r}^\top Lw$ is $1/|\mathbb{F}|$, which is negligible.
Plugging in the protocol. For a general circuit with a total of $|w| = Bn$ wire values and $C = Bn$ multiplication gates, the above approach leads to a zero-knowledge proof of linear map that can be instantiated by any SIMD-ZK. The actual communication complexity depends on the cost of proving the inner product argument by the underlying SIMD-ZK, plus the opening cost of the commitment scheme. Let $(l, r, o) \in \mathbb{F}^{nb}$ be the batched wire values of left, right and output of multiplication gates in the circuit. The wire consistency can be proven by checking $(l \leftarrow LW, r \leftarrow RW, o \leftarrow Ow)$, where $(L, R, O) \in \mathbb{F}^{nB \times mB}$ are public maps that describe the circuit connectivity. Furthermore, the SIMD-ZK protocol handles the rest of the multiplicative relation check $o = l \cdot r$. This scheme is captured in the extended SIMD-ZK functionality $\mathcal{F}_{\text{SIMDZK}}$ shown in Figure 2. Compared to the common SIMD-ZK functionality shown in Figure 1, it additionally supports the proof of linear map between committed vectors. Based on this extended SIMD-ZK, we propose a compiler that compiles any SIMD-ZK into general ZK. By plugging this compiler to AntMan [50], it improves its communication complexity from $O(C^{3/4})$ to $O(C^{1/2})$. In another case, for compressed $\Sigma$-protocols [3], this yields a reduction of the CRS size from $O(C)$ to $O(\sqrt{C})$ for constant-round sublinear ZK. Eventually, the multi-verifier ZK [53] can be improved from $O(nC/B + n^2B^2)$ to $O(nC/B + n^2)$.

Memory constrained prover. The above construction can be viewed as a compiler that enables a SIMD-ZK to handle arbitrary circuits $\mathcal{C}$, where all wire values fit in a vector $w$ of size $O(C)$. Assume the linear mapping matrices use succinct representation, the proof requires memory overhead $O(C)$, which upper bounds the largest circuit that the scheme can prove. We propose a second compiler that further extends the previous idea to the streaming setting, in which the memory overhead is proportional to the plaintext evaluation of the circuit. Furthermore, the whole circuit structure and the witnesses are not required to be known until they are reached. Instead, $P$ proves the circuit segment-by-segment and only needs to evaluate the current and the previous one at a time: the circuit $\mathcal{C}$ is split into segments $\mathcal{C} = (\mathcal{C}_1, \ldots, \mathcal{C}_n)$ using existing circuit partition methods [43,1,44]. For any consecutive segments $\mathcal{C}_j$ and $\mathcal{C}_{j+1}$, let $(w_j, l_j, r_j, o_j)$ and $(w_{j+1}, l_{j+1}, r_{j+1}, o_{j+1})$ be the witness and the input and output wire values of multiplication gates for each segment. $P$ first uses a commit-and-prove SIMD-ZK to prove the internal satisfiability of $\mathcal{C}_j$ including the linear and multiplicative relations of $(w_j, l_j, r_j, o_j)$. Then $P$ proves that the output wires of $\mathcal{C}_j$ correctly
link to some input wires of $C_{j+1}$. Namely, it additionally invokes the check of linear map to prove $Mw_j = \tilde{w}_{j+1}$, in which $M$ is a map that indicates the connectivity between $C_j$ and $C_{j+1}$ and $\tilde{w}_{j+1}$ are the input wire values of $C_{j+1}$.

After this, $P$ and $V$ discard everything for segment $C_j$ and carry on with the check of internal circuit satisfiability of $C_{j+1}$. The above step incurs memory overhead $O(|w_j| + |\tilde{w}_{j+1}|)$. Based on this framework, $P$ is able to prove the satisfiability of a large circuit by separately evaluating a sequence of smaller circuits.

### 2.2 Improved Commit-and-Prove ZK via SIMD Compiler

Now we show three commit-and-prove SIMD ZK protocols that take advantage of our compilers to perform general ZKP with either reduced online communication complexity or reduced setup cost:

- The aforementioned AntMan [50] requires $O(B+C)$ communication for $(B,|C|)$-SIMD circuit and at least $O(C^{3/4})$ for a general circuit. Our compiler transforms it into a general VOLE-ZK with communication $O(C/B+B)$, which is $O(C^{1/2})$ when $B = O(C^{1/2})$.

- A constant-round SHVZK argument of knowledge for NP from the discrete logarithm assumption with sublinear communication $O(C/B+B) = O(C^{1/2})$ and a CRS of size $O(B) = O(C^{1/2})$, where the computation is dominated by $O(C^{1/2})$-sized FFT. It builds upon the techniques from Attema et al. [3] (denoted as AC20) and is combined with a 2-round SHVZK for Hadamard product of [30]. It improves upon a protocol of AC20 which has a CRS of size $O(C)$ and requires $O(1)$ $C$-sized FFT. For $(B,C)$-SIMD circuit, our protocol has $O(C+\sqrt{B}) = O(C^{1/2})$ communication.

- A non-interactive designated $n$-verifiers ZK based on the packed Shamir secret sharing [53,25]. Restricting $B < n - 2t$ where $t$ is the number of corrupted verifiers, it incurs $O(nC)$ communication overhead for $(B,C)$-SIMD circuits and $O(nC/B + n^2B^2)$ for arbitrary circuit of size $C$. The cost is optimized to $O(nC/B + n^2)$ with the help of our compiler.

Additionally, we also demonstrate that Ligero [2] and its follow-up work [4] perfectly fit our compilers. Although there is no improvement in terms of the proof size or computational complexity, casting Ligero in our framework and using it as a commit-and-prove ZK allows us to identify an important security consideration that would affect both the soundness and zero-knowledge properties.

**Compiling AntMan SIMD-ZK.** The AntMan SIMD-ZK protocol consists of the following key components: 1) a constant-size additive-homomorphic polynomial commitment scheme, 2) a proof of multiplicative relation on committed polynomials, i.e. prove that $f_0(\cdot) = f_1(\cdot) \cdot f_2(\cdot)$, and 3) a proof of degree reduction, i.e. for two polynomials $(f(\cdot),\tilde{f}(\cdot))$ with degrees $d_1 < d_2$, $f(i) = \tilde{f}(i)$ for $i \in [d_1+1]$. We write $\llbracket f \rrbracket$ for a commitment to the polynomial $f(\cdot)$. The AntMan protocol realizes $F_{\text{SIMDZK}}$ as follows:
1. For each batch of $B$ private inputs $w_{\alpha} \in \mathbb{F}^B$, $\mathcal{P}$ computes a degree-$(B-1)$ polynomial $f_{\alpha}$ such that $f_{\alpha}(i) = w_{\alpha}[i]$. $\mathcal{P}$ commits to $f_{\alpha}$ so that $\mathcal{P}$ and $\mathcal{V}$ obtain $[[f_{\alpha}]]$.

2. The parties process the circuit in topological order. For any batch of $k$ addition gates with commitments to input wires ($[[f_{\alpha}]], [[f_{\beta}]]$), $\mathcal{P}$ and $\mathcal{V}$ locally computes the commitment to output wires by $[[f_{\gamma}]] = [[f_{\alpha}]] + [[f_{\beta}]]$. For multiplication gates with input commitments $[[f_{\alpha}]]$ and $[[f_{\beta}]]$, $\mathcal{P}$ computes $w_{\gamma} = w_{\alpha} \cdot w_{\beta}$ and a degree-$(B-1)$ polynomial $f_{\gamma}$ such that $f_{\gamma}(i) = w_{\gamma}[i], i \in [B]$. $\mathcal{P}$ also computes $f_{\gamma}(\cdot) = f_{\alpha}(\cdot) \cdot f_{\beta}(\cdot)$. $\mathcal{P}$ commits to them by generating $[[f_{\gamma}]]$ and $[[f_{\gamma}]]$.

3. For each multiplication gates with input and output wires $(\alpha, \beta, \gamma)$, $\mathcal{P}$ proves that $([[f_{\alpha}]], [[f_{\beta}]], [[f_{\gamma}]]))$ is a multiplication triple and $f_{\gamma}(i) = f_{\gamma}(i)$ for $i \in [B]$.

4. When a batch of $k$ output wires $\alpha$, $\mathcal{P}$ opens the commitment to $f_{\alpha}$, from which $\mathcal{V}$ reconstructs $w_{\alpha}$.

The overhead of AntMan SIMD-ZK lies in the commitment of batch circuit intermediate wire values at Step 2, which takes $O(C)$ for a $(B, C)$-SIMD circuit. The proof of multiplication and degree reduction only incurs $O(B)$ with random linear combination.

When applying the SIMD compiler to the AntMan SIMD-ZK, it takes $O(C/B)$ to prove all multiplicative relations for a general circuit of size $C$. Namely, it checks $C$ multiplication triples $(l, r, o)$ via SIMD-ZK. Additionally, it invokes the proof of linear map to check the wire consistency between $(l, r, o)$ and $w$, which contains intermediate wire values in the circuit. This procedure incurs $O(B)$ communication overhead at the final commitment opening. Hence, it takes $O(C/B + B) \geq O(C^{1/2})$ in total to prove the satisfiability of arbitrary circuits. This protocol is referred as AntMan++. We implemented the AntMan++ and evaluate its performance on proving general circuits of size up to $C = 2^{27}$. It is compared with the prior practical VOLE-based ZK QuickSilver [52], which requires $O(C)$ communication overhead. More details are shown in Section 4.1.

**SIMD-ZK based on Pedersen commitment.** We briefly present a SHVZK argument of knowledge for $(B, C)$-SIMD circuits which relies on the techniques of AC20 [3]. The key construction of AC20 is a compression mechanism to handle ZK proof for general linear relations (the prover wants to prove the correction of evaluation of a linear form over a committed vector). We expand this technique to obtain a constant-round DLOG-based ZK proof for $(B, C)$-SIMD circuits with $O(C + \sqrt{B})$ communication. When plugged into our compiler, we get a constant-round circuit ZK with $O(C^{1/2})$ communication, $O(C^{1/2})$ CRS size, and with computation dominated by $O(C^{1/2})$ FFTs of size $O(C^{1/2})$. It improves over AC20 in both CRS size (from linear to square root) and computation time.

Specifically, for a group of $B$ multiplication gates, we encode the values over all $B$ evaluations on left wire values i.e $x \in \mathbb{F}^{B}$ into one polynomial $f$ using pack secret sharing such that $f(0) \leftarrow \mathbb{F}, f(i) = x_i$ for $i \in [1, B]$ and commit to it using Pedersen commitment to obtain $[[f]] = g^{x'} h^r$ where $x' := (f(0), x) \in \mathbb{F}^{B+1}$. The vector of right wire values $y$ is committed in the same way as $x$ to get $[[y]]$. For the
vector of output wire values $z$, we define $h(X) := f(X)g(X)$ and $[h] = g^x h^r$
where $z' := (h(0), z, h(B + 1), \ldots, h(2B)) \in \mathbb{F}_n^{2B+1}$. $\mathcal{P}$ can convince $\mathcal{V}$ that $z_i = x_i y_i$ for all $i \in [1, s]$ by revealing $f(c), g(c)$ and $h(c)$ where the challenge $c$ is randomly picked by $\mathcal{V}$. $\mathcal{V}$ now checks $f(c)g(c) \equiv h(c)$ while $\mathcal{P}$ needs to prove that the revealed values are correct evaluations of $f(X), g(X)$ and $h(X)$ at $c$. This can be handled by using a ZK proof for linear relations since by Langrage formula, $f(c), g(c)$ and $h(c)$ can be expressed as linear form on the committed vectors $x', y'$ and $z'$. Observe that this way, $\mathcal{P}$ can prove correctness of a batch of $B$-tuples of multiplication gates, by showing that the evaluation of many different committed polynomials at a given challenge $c$ is correctly computed. This can be done using an amortized check over many executions, with cost identical to that of a single execution. Using a sublinear argument for a batch of $B$-tuples of multiplications, the circuit ZK can be obtained by combing our compiler with an amortized nullity check over one commitment scheme which is used to check the consistency of output gates and also of two different commitments of two vectors of the form $x \in \mathbb{F}_n^{B+1}$ and $(x, aux) \in \mathbb{F}_n^{2B+1}$. The details of construction are shown in section 4.2.

**Compiling multi-verifier ZK.** Yang et al. [53] proposed an non-interactive designated multi-verifier zero-knowledge proof (MVZK) that allows a prover to prove the correctness of a statement to a set of $n$ honest-majority verifiers. It leverages packed Shamir secret sharing (PSS) [25] to support SIMD statements. At a high-level, $\mathcal{P}$ first distributes the witnesses to $\mathcal{V}$ in the form of PSS, then utilize a polynomial compression protocol [29,15,12] to reduce the check of all multiplications into a single multiplication triple. The PSS of witnesses serves as commitments among all $\mathcal{V}$, thus it can be viewed as a commit-and-prove SIMD ZK. Effort is made in [53] to convert its SIMD-ZK to general ZK by arranging all wire connection as a tuple $([w_1], [w_2], i, j)$, indicating that $w_1[i] = w_2[j]$. All tuples with the same $(i, j)$ can be checked in a batch with commitment-opening cost $O(n^2)$ by a random linear combination. Since $i, j \in [B]$, the total wire consistency check incurs $O(n^2 B^2)$. However, by applying our SIMD compiler, the overhead for the check is reduced to $O(n^2)$. The cost to prove multiplicative relations remains $O(n C/B)$. One caveat is that this protocol is not flexible in choosing the batch size $B$. Assume the maximum number of corrupted verifiers $t < n/2$, it requires that $2t + B < n$ to ensure that honest verifiers have enough shares to determine the result.

**SIMD-ZK from Ligero.** Our compiler is partially inspired by Ligero [2], an MPC-in-the-head based ZKP [32] that works for general circuits. At the core of Ligero, $\mathcal{P}$ batch encodes the witness using the Reed-Solomon (RS) coding scheme and commits to each entry of the codewords. $\mathcal{V}$ chooses a subset of entries in codewords, and applies the interleaved RS test, linear constraint test and quadratic constraint test to verify the correctness of encoding, wiring consistency and multiplicative consistency.

We first extract a commit-and-prove SIMD-ZK from Ligero and prove that it realizes $F_{\text{SIMDZK}}$. Applying our SIMD compiler would result in the original Ligero. We then identify a security issue when applying SIMD Ligero to
our memory-constrained framework designed for scalable-ZK. Namely, although Ligero can be turned into a commit-and-prove ZK, its commitment only supports a pre-determined limited number of invocations from the proving procedure. Following the MPC-in-the-head paradigm, the committing phase mentioned above is equivalent to emulating a $n$-party computation of such circuit, then separately commit to the view of each party among $(P_1, \ldots, P_n)$. During the proving phase, $P$ opens a subset of $t < n$ views to $V$, who applies the above mentioned tests. This is fine for one-shot proofs. However, general commit-and-prove ZK does not restrict the number of times that the proving procedure is applied to a commitment. The zero-knowledge property can be compromised if the number of opened views exceeds the degree parameter of Reed-Solomon encoding. Although refreshing the commitment solves this issue, a proof of equality across the obsolete and new commitments does not come for free. Our framework covers this issue by adding a counter in $F_{\text{SIMDZK}}$ to check the usage of each commitment, and abort the proving phase when an input commitment is overused.

### 3 Generic Compiler of ZK Proofs from SIMD Circuits to Arbitrary Circuits

In this section, we first present a construction for extended SIMD-ZK functionality $F_{\text{esSIMDZK}}$ which supports the proof of linear map, in addition to the normal SIMD-ZK functionality $F_{\text{SIMDZK}}$. Based on the extended SIMD-ZK, we describe our compiler that enables a SIMD-ZK scheme to work for general circuits. At last, we present a framework that allows SIMD-ZK schemes to prove large statements with small memory footprints.

#### 3.1 Extended SIMD-ZK

The protocol for extended SIMD-ZK is shown in Figure 3, which realizes the functionality $F_{\text{esSIMDZK}}$. It is based on the $F_{\text{SIMDZK}}$ functionality to perform the committing and opening of batched wire values, as well as prove the element-wise multiplicative relations between these batches. It takes input a public matrix $M \in \mathbb{F}^{Bn \times Bk}$ and two vectors $x = (x_1, \ldots, x_n) \in \mathbb{F}^{Bn}$ and $y = (y_1, \ldots, y_k) \in \mathbb{F}^{Bk}$ from $P$, outputs 1-bit information to $V$ indicating whether $x = My$. Essentially, it is a proof of linear map. The first step is to reduce the proof of linear map to a proof of inner products, which is achieved by a random linear combination: $V$ uniformly samples $\hat{r} \in \mathbb{F}^{Bn}$ and converts the check of $x \overset{?}{=} My$ into $\hat{r}^\top x = v^\top y$, where $v^\top = \hat{r}^\top M$. After dividing these vectors into length-$B$ segments, $P$ and $V$ invoke the $F_{\text{SIMDZK}}$ functionality of batch size $B$. $P$ inputs $q$ and proves the correctness of $q = \sum_{i=1}^n \hat{r}_i \ast x_i - \sum_{j=1}^k v_i \ast y_i \in \mathbb{F}^B$. Eventually it opens the commitment to $q$ and let $V$ check $\sum_{i=1}^B q[i] \overset{?}{=} 0$. To ensure the privacy of $P$, it needs to make sure that only opened commitment to $q$ does not reveal information of $x$ and $y$. It does so by the random mask $\tilde{r}$. The impact of this mask on soundness is negligible since it is committed before $\hat{r}$ is sampled.
We first consider the case of a malicious prover and then the case of a malicious verifier. In each case, we construct a PPT simulator \( \mathcal{S} \) given access to functionality \( \mathcal{F}_k \), and running a PPT adversary \( \mathcal{A} \) as a subroutine while emulating \( \mathcal{F}_k \) for \( \mathcal{A} \). We show that no PPT environment \( \mathcal{Z} \) can distinguish the real-world execution from the ideal-world execution.

**Theorem 1.** Protocol \( \Pi_{k \text{SIMDZK}} \) (Figure 3) securely realizes the Functionality \( \mathcal{F}_k \) in the \( \mathcal{F}_k \)-hybrid model, with soundness error \(|F|^{-1}\).

**Proof.** We first consider the case of a malicious prover and then the case of a malicious verifier. In each case, we construct a PPT simulator \( \mathcal{S} \) given access to functionality \( \mathcal{F}_k \), and running a PPT adversary \( \mathcal{A} \) as a subroutine while emulating \( \mathcal{F}_k \) for \( \mathcal{A} \). We show that no PPT environment \( \mathcal{Z} \) can distinguish the real-world execution from the ideal-world execution.

**Protocol \( \Pi_{k \text{SIMDZK}} \)**

**Inputs:** The prover \( \mathcal{P} \) and verifier \( \mathcal{V} \) hold a public matrix \( \mathbf{M} \in \mathbb{F}^{Bn \times Bk} \) for some integers \( n \) and \( k \). Commitments \([\mathbf{x}]\) and \([\mathbf{y}]\) are public, where \( \mathbf{x} \in \mathbb{F}^{Bn} \) and \( \mathbf{y} \in \mathbb{F}^{Bk} \).

**Protocol:**

1. \( \mathcal{P} \) uniformly samples a vector \( \hat{\mathbf{r}} \in \mathbb{F}^B \) such that \( \sum_{i=1}^{B} \hat{r}_i = 0 \). Then \( \mathcal{F}_k \) is invoked to obtain its commitment \([\hat{\mathbf{r}}]\).
2. \( \mathcal{V} \) uniformly samples a vector \( \hat{\mathbf{v}} \in \mathbb{F}^{Bn} \) and sends it to \( \mathcal{P} \). Everyone computes\( \mathbf{v} = \hat{\mathbf{r}}^T \mathbf{M} \in \mathbb{F}^{Bk} \). Then for \( i \in [k] \), \( \mathcal{F}_k \) is invoked to construct \([\mathbf{v}_i]\), where \( \mathbf{v}_i \) is the \( i \)-th B-sized vector of \( \mathbf{v} \). In the same way, everyone can have access to \([\hat{\mathbf{r}}, \mathbf{v}_1, \ldots, \mathbf{v}_k]\).
3. \( \mathcal{P} \) computes \( \mathbf{q} \in \mathbb{F}^B \), such that \( \mathbf{q}[i] = \sum_{j=1}^{n} \hat{r}_j \mathbf{v}_j[i] - \sum_{j=1}^{k} \mathbf{v}_j[i] \mathbf{y}_j[i] + \hat{r}_i \). \( \mathcal{P} \) invokes \( \mathcal{F}_k \) to obtain \([\mathbf{q}]\).
4. Define circuit

\[
C_{\text{Lin}}(a_1, \ldots, a_n, b_1, \ldots, b_n, c_1, \ldots, c_k, d_1, \ldots, d_k, e, f) := \sum_{i=1}^{n} a_i \cdot b_i - \sum_{i=1}^{k} c_i \cdot d_i + e - f,
\]

then call \( \mathcal{F}_k \cdot \text{Prove}(C_{\text{Lin}}, [\hat{\mathbf{r}}_1], \ldots, [\hat{\mathbf{r}}_n], [\mathbf{x}_1], \ldots, [\mathbf{x}_n], [\mathbf{v}_1], \ldots, [\mathbf{v}_k], [\mathbf{y}_1], \ldots, [\mathbf{y}_k], [\hat{\mathbf{r}}], [\mathbf{q}]) \).
5. \( \mathcal{V} \) sends \((\text{Open}, [\mathbf{q}])\) to \( \mathcal{F}_k \), which returns \( \mathbf{q} \) to \( \mathcal{V} \); \( \mathcal{V} \) checks \( \sum_{i=1}^{B} \mathbf{q}[i] = 0 \) and aborts if the check fails.

Fig. 3: The protocol for extended SIMD ZK from SIMD ZK.
By emulating the (Commit) command of $F_{\text{SIMDZK}}$, $S$ receives $\hat{r}$ from $A$ and sends a handler $[\hat{r}]$ to $A$.

2. $S$ uniformly samples $\hat{r} \in \mathbb{F}^{Bn}$ and sends to $A$. For $i \in [k]$, after receiving (Commit, $v_i$) from $A$, $S$ sends a handler $[v_i]$ to $A$. Similarly, $S$ sends $[\hat{r}_i]$ to $A$ for $i \in [n]$.

3. After receiving (Commit, $q$) from $A$, $S$ emulates $F_{\text{SIMDZK}}$ by sending $A$ another handler $[q]$.

4. $S$ receives $(\text{Prove}, C, \tau_1, \ldots, \tau_{2n+2k+2})$ from $A$, and then checks whether $\tau_i$ for all $i \in [2n+2k+2]$ match their corresponding tags. For $i \in [B]$, $S$ checks whether $\sum_{j=1}^{n} \hat{r}_j[i]x_j[i] - \sum_{j=1}^{k} y_j[i] + \hat{r}[i] - q[i]$ equals to 0 or not. If any check fails, $S$ aborts; otherwise sends Pass to $A$.

5. $S$ emulates the (Open) command of $F_{\text{SIMDZK}}$ and receives a handler $\tau$ from $A$. If $\tau$ does not match $[q]$ or the vector $q$ previously sent by $A$ does not satisfy $\sum_{i=1}^{B} q[i] = 0$, $S$ aborts.

Define $E$ to be the event that a cheating prover $A$ successfully convinces $V$ in the real world. This happens when $r$ accidentally corrects the wrong input of $A$. Define $z = My$ and

$$f(x_1, \ldots, x_{Bn}) = \sum_{i=1}^{Bn} x_i(z[i] - x[i]) + \sum_{i=1}^{B} \hat{r}[i].$$

With fixed $x, z, \hat{r}$ and uniformly sampled $\hat{r}$, we have

$$\Pr[E|x \neq My] = \Pr[f(\hat{r}) = 0|x \neq My] = |\mathbb{F}|^{-1},$$

since $f(x_1, \ldots, x_{Bn})$ is a $Bn$-variate degree-1 polynomial. Hence we conclude that $A$ cannot distinguish between the real and ideal world except with probability $|\mathbb{F}|^{-1}$.

**Malicious verifier.** Similarly in this case, $S$ interacts with $A$ as follows:

1. To emulate the (Commit) command, $S$ sends a handler $[\hat{r}]$ to $A$.

2. $S$ receives $\hat{r}$ and (Commit, $v_i$) from $A$ for $i \in [k]$. Then $S$ emulates $F_{\text{SIMDZK}}$ by sending $A$ a handler $[v_i]$ for $i \in [k]$. In the same way, $S$ sends $A$ handlers $\{[\hat{r}_i]\}_{i \in [n]}$.

3. Then, $S$ plays the role of $F_{\text{SIMDZK}}$ and sends a handler $[q]$ to $A$.

4. $S$ receives $(\text{Prove}, C, \tau_1, \ldots, \tau_{2n+2k+2})$ from $A$ and checks whether $\{\tau_i\}_{i \in [2n+2k+2]}$ match their corresponding tags. Then $S$ queries $F_{\text{SIMDZK}}$. If check fails or $F_{\text{SIMDZK}}$ aborts, $S$ aborts; otherwise sends Pass to $A$.

5. By emulating the (Open) command of $F_{\text{SIMDZK}}$, $S$ uniformly samples a vector $q \in \mathbb{F}^B$ such that $\sum_{i=1}^{B} q[i] = 0$ and sends $q$ to $A$.

The only difference between reality and the ideal world is the method of calculating vector $q$. Following the constraint $\sum_{i=1}^{B} q[i] = 0$, $S$ uniformly samples
Inputs: The prover $\mathcal{P}$ and verifier $\mathcal{V}$ hold an arbitrary circuit $\mathcal{C}$ over a large field $\mathbb{F}$, where $\mathcal{C}$ contains $N_1 = Bn_1$ addition gates, $N_2 = Bn_2$ multiplication gates and $K = Bk$ wires for some $n_1, n_2$ and $k$.

Protocol:

1. Set $c = 1$. For each gate in the form $(i, \alpha, \beta, \gamma, T)$
   - If $T = ADD$, set $A_i := I_\alpha + I_\beta - I_\gamma$; $\mathcal{P}$ sets $w[\gamma] := w[\alpha] + w[\beta]$
   - If $T = MULT$, set $(L_\alpha, R_\alpha, O_\alpha) := (I_\alpha, I_\beta, I_\gamma)$; $\mathcal{P}$ sets $(l[c], r[c], o[c]) := (w[\alpha], w[\beta], w[\alpha] \cdot w[\beta])$. Increase $c$ by 1.

After the circuit is processed, matrix $L, R, O \in \mathbb{F}^{N_2 \times K}$, and $A \in \mathbb{F}^{N_1 \times K}$ are public; $\mathcal{P}$ has $(l, r, o, w) \in \mathbb{F}^{N_2 \times 1}$ and $(l[c], r[c], o[c], w[\alpha])$ sets.

2. $\mathcal{P}$ splits wire values $(l, r, o, w)$ into chunks of size $B$, i.e., $(l_{i[B]}, r_{i[B]}, o_{i[B]}, w_{i[B]})$ such that each element is in $\mathbb{F}^B$. $\mathcal{F}_{\text{eSIMDZK}}$ is invoked to obtain commitments $(l_i, [l_i], [o_i], [w_i])_{i \in [n_2]}$ and $(l_{i[B]}, [l_{i[B]}], [o_{i[B]}], [w_{i[B]}])_{i \in [n_2]}$.

3. Then, $\mathcal{P}$ sends $(\text{LinearMap}, \{[l_i], [w_i]\}_{i \in [n_2]}, \{[o_i]\}_{i \in [n_2]}, L)$ to $\mathcal{F}_{\text{eSIMDZK}}$ to check that $l = LW$; similarly check that $r = Rw$, $o = Ow$, and that $0 = Aw$.

4. Let circuit $C_{\text{Mult}} : \mathbb{F}^3 \rightarrow \mathbb{F}$ such that $C_{\text{Mult}}(x, y, z) := xy - z$. For $i \in [n_2]$, send $(\text{Prove}, C_{\text{Mult}}, [l_i], [r_i], [o_i], [w_i])$ to $\mathcal{F}_{\text{eSIMDZK}}$.

Fig. 4: Generic ZK in the $\mathcal{F}_{\text{eSIMDZK}}$ hybrid.

vector $q$. While in reality, each entry of $q$ is masked by vector $\hat{r}$ chosen by $\mathcal{P}$. As a result, in both worlds, all entries except one of $q$ are information-theoretic secure, so no one can distinguish one from another.

Overall, any PPT environment $Z$ cannot distinguish between the real-world execution and ideal-world execution, which completes the proof.

3.2 Compiling Extended SIMD-ZK

The general approach to compile a SIMD protocol into a generic protocol is to supplement it with an additional proof of wiring consistency. Namely, denote $w$ as a vector that includes all the wire values in a circuit, then any input wire of a multiplication gate can be represented as the linear combination of a series of values in $w$, who are the wire values that connect from the circuit inputs or the output of other gates. This relation can be generally represented as a linear map $M$ between a vector of wire values $x$ and $w$, which should satisfy $x = Mw$. As shown in Figure 4, along with the vector $w$, $\mathcal{P}$ also commits to $(l, r, o)$ which are the batches of input and output wire values of multiplication gates. Showing that $o = l \ast r$ is enough to prove that all multiplication gates are computed correctly. Additionally, $\mathcal{P}$ also proves the correctness of $(l = LW, r = Rw, o = Ow)$, in which $(L, R, O)$ are the linear maps that defines the routing of wires that connects to the input and output wires of multiplication gates. Additionally, the proof of $0 = Aw$ shows the correct computation of all addition gates.

To handle a general circuit $\mathcal{C}$, our compiler fully depends on the extended SIMD-ZK functionality $\mathcal{F}_{\text{eSIMDZK}}$. Regarding the cost analysis, $\mathcal{P}$ commits to
a total of $k + 3n_2$ size-$B$ vectors to $\mathcal{V}$. They invoke the proof of linear map for 4 times to prove the wiring consistency, and the proof of element-wise multiplication to prove the correctness of $n_2$ batches of multiplication gates. An optimization to reduce the cost for the proof of linear map is to combine the 4 of them into 1. Namely, define $w'$ to be the wire values excluding the input and output wires of multiplication gates. Construct the witness vector $w = (w'[0][1, o])$ and prove the wiring consistency by proving $0 = A'w$, in which $A' \in \mathbb{F}^{K \times K}$ is a map that describes the circuit wire connectivity.

**Theorem 2.** The Protocol $\Pi_{\text{compiler}}$ (Figure 4) securely realizes the functionality $F_{\mathsf{ZK}}$ in the $\mathcal{F}_{\mathsf{SIMDZK}}$-hybrid model, with 0 soundness error.

**Proof.** Similarly, we construct a PPT simulator in two cases and argue that no PPT environment $Z$ can distinguish reality and the ideal world.

**Malicious prover.** The simulator $S$ simulates the view of adversary $A$ for the protocol execution of $\Pi_{\text{compiler}}$, as follows:

1. Following the protocol specification, $S$ obtain matrix $L, R, O$ and $A$ from circuit $C$.
2. By emulating the (Commit) command of $F_{\mathsf{SIMDZK}}$, $S$ receives $\{l_i, r_i, o_i\}_{i \in [n_2]}$ and $\{w_i\}_{i \in [k]}$ from $A$ and sends $A$ handlers $\{[l_i], [r_i], [o_i]\}_{i \in [n_2]}$ and $\{[w_i]\}_{i \in [k]}$.
3. After receiving (LinearMap, $\{\tau_i\}_{i \in [n_2+k]}, L$) from $A$, $S$ checks whether $\{\tau_i\}_{i \in [n_2]}$ match $\{[l_i]\}_{i \in [n_2]}$ and $\{\tau_i\}_{i \in [n_2+1, n_2+k]}$ matches $\{[w_i]\}_{i \in [k]}$. Then, $S$ checks whether $l = Lw$. If any check fails, $S$ aborts; otherwise, $S$ sends Pass to $A$.

   Similarly, $S$ handles other three (LinearMap) commands from $A$.
4. For $i \in [n_2]$, $S$ receives (Prove, $C, \tau_1, \tau_2, \tau_3$) from $A$ and checks whether $\{\tau_1, \tau_2, \tau_3\}$ match the tags $\{[l_i], [r_i], [o_i]\}$. In each round, $S$ also checks that $l_i[j], r_i[j] = o_i[j]$ for $j \in [B]$. If any check fails, $S$ aborts; otherwise, $S$ sends Pass to $A$.

It is trivial that $S$ is perfect, since whenever an ideal functionality is called in the protocol, $S$ acts exactly the same as the definition of the functionality. On the other hand, if the witness indeed satisfies linear as well as the multiplication constraints, we can conclude that it satisfies circuit $C$. Given the perfectness of the ideal functionality, we can conclude that the soundness error is 0.

**Malicious verifier.** The simulator $S$ simulates the view of adversary $A$ for the protocol execution of $\Pi_{\text{compiler}}$, as follows:

1. $S$ follows the protocol specification and obtain matrix $L, R, O$ and $A$ from circuit $C$.
2. By emulating the (Commit) command of functionality $F_{\mathsf{SIMDZK}}$, $S$ sends $A$ handlers $\{[l_i], [r_i], [o_i]\}_{i \in [n_2]}$ and $\{[w_i]\}_{i \in [k]}$.
3. After receiving (LinearMap, $\{\tau_i\}_{i \in [n_2+k]}, L$) from $A$, $S$ checks whether $\{\tau_i\}_{i \in [n_2]}$ match $\{[l_i]\}_{i \in [n_2]}$ and $\{\tau_i\}_{i \in [n_2+1, n_2+k]}$ matches $\{[w_i]\}_{i \in [k]}$. Then, $S$ queries $F_{\mathsf{ZK}}$. If check fails or $F_{\mathsf{ZK}}$ aborts, $S$ aborts; otherwise, $S$ sends Pass to $A$.

   Similarly, $S$ handles other three (LinearMap) command from $A$.
4. For $i \in [n_2]$, $S$ receives (Prove, $C, \tau_1, \tau_2, \tau_3$) from $A$ and checks whether $\{\tau_1, \tau_2, \tau_3\}$ match the tags $\{[l_i], [r_i], [o_i]\}$. In each round, $S$ also queries $F_{\mathsf{ZK}}$. If any check fails or $F_{\mathsf{ZK}}$ aborts, $S$ aborts; otherwise, $S$ sends Pass to $A$. 

17
Similarly, since $S$ acts according to the definition of the ideal functionality and there is no commitment opening during the protocol, the simulation is perfect.

As a result, no PPT environment $Z$ can distinguish between the real-world scenario and the ideal-world execution, which completes the proof.

### 3.3 Generic ZK for Limited-Memory

Besides a basic-version compiler, we also present another compiler that can deal with a situation where the prover’s memory is limited. Although a similar question has already been proposed before [7,27,41,37], our construction does not rely on any complicated assumption other than the realizatin of $\mathcal{F}_{eSIMDZK}$ with the parameter $\tau_{\text{max}} > 1$. The protocol is shown in Figure 5. We take the advantage of the commit-and-prove paradigm: instead of proving the whole circuit at one time, circuit can be “partially" proved. The value of wires that connect between different parts of the circuit can be reserved as commitments and used for the proof of connectivity. Specifically, prover will clarify a space threshold parameter $S$ before the proof, and the original circuit $C$ will be divided into $\lceil |C|/S \rceil$ parts (denoted as $C_1, C_2, \ldots, C_{\lceil |C|/S \rceil}$), where each part contains at most $S$ gates. In each round, $S$ gates of $C_i$ will be read and processed in the memory, and $P$ generates the proof for $C_i$. At the end of each round, $P$ commits to a vector which contains all the wire values that are still active in $C_{i+1}$, and discards those that won’t be used in the remaining circuit.

To support this pruning operation, we add a $\text{DEL}$ gate to the encoding of the circuit. $P$ reads the circuit from a stream of $(\alpha, \beta, \gamma, T)$, where $T \in \{\text{ADD, MULT, DEL}\}$. If $T \in \{\text{ADD, MULT}\}$, $P$ processes gates $\alpha, \beta, \gamma$ similarly as the previous compiler. If $T = \text{DEL}$, $P$ adds gate $\alpha$ to the set $D$, which contains all the wire values that no longer appear in the next segment of the circuit. After the proof of consistency inside $C_i$, $P$ forms a new commitment to wire values that are not in the set $D$. By applying $\mathcal{F}_{eSIMDZK}.\text{LinearMap}$, $P$ proves that the committed wire values belongs to the output wires of $C_i$, which are also the input of $C_{i+1}$. $P$ and $V$ repeat this procedure for the proof of each segment.

Now we claim that if the plaintext evaluation of circuit $C$ requires memory space $M$, then in our protocol, the prover’s space complexity is $O(M)$. Denote $\mathbf{o}_i$ as the output of subcircuit $C_i$, and circuit input $\mathbf{x}$ is denoted as $\mathbf{o}_0$. In each round, we call $\mathcal{F}_{eSIMDZK}.\text{Prove}$ to complete the proof for $C_i$ and $\mathcal{F}_{eSIMDZK}.\text{LinearMap}$ to prove the transformation between $[\mathbf{o}_{i-1}]$ and $[\mathbf{o}_i]$. As each subcircuit contains at most $S$ gates, proving $C_i$ requires $O(S)$ space. And also, using $\mathcal{F}_{eSIMDZK}.\text{LinearMap}$ to prove the consistency between $[\mathbf{o}_{i-1}]$ and $[\mathbf{o}_i]$, requires $O(|\mathbf{o}_{i-1}| + |\mathbf{o}_i|)$ space, so the space complexity of each round is $O(S + |\mathbf{o}_{i-1}| + |\mathbf{o}_i|)$. As a result, the overall space complexity is $O(S + \max\{|\mathbf{o}_{i-1}| + |\mathbf{o}_i|\})$. Since in the plaintext evaluation of $C$, only active wire value needs to be read into the memory, memory upper bound $M \geq \max\{|\mathbf{o}_0|, |\mathbf{o}_1|, |\mathbf{o}_2|, \ldots, |\mathbf{o}_{\lceil |C|/S \rceil}|\}$. By choosing $S < M$, we can conclude that the space complexity of the protocol is $O(M)$.
Inputs: The prover $P$ and verifier $V$ hold an arbitrary circuit $C$ over a large field $\mathbb{F}$, and a space threshold parameter $S = sB$ for some integer $s$. $P$ holds the secret input $x$ such that $C(x) = 0$.

Protocol:

1. Let $h() : \mathbb{Z} \to \mathbb{Z}$ be a function map wire indices to physical indices and $w$ be a dynamic list storing wire value to be dealt with in the current round. Initially, set $h(i) = i$ and $w[i] = x[i]$ for all $i \in \|x\|$. Define function $\text{Im}(i) : f \to \mathbb{Z}$, returning the maximum index that $f$ has the definition.

2. Let $D = \emptyset$ and $W = \text{Im}(h) + S$. Initialize $L, R, O, A$ to be empty matrices. Read the next $S$ gates to the memory (or until the last gate). For each in the form $(\alpha, \beta, \gamma, T)$:
   - If $T = \text{ADD}$, set $h(\gamma) := \text{Im}(h) + 1$, compute $r := I_{h(\alpha)} + I_{h(\beta)} - I_{h(\gamma)} \in \mathbb{F}^W$ and append $r$ to $A$. $P$ sets $w[h(\gamma)] := w[h(\alpha)] + w[h(\beta)]$
   - If $T = \text{MULT}$, set $h(\gamma) := \text{Im}(h) + 1$, append rows in $\mathbb{F}^W I_{h(\alpha)}, I_{h(\beta)}, I_{h(\gamma)}$ to matrices $L, R, O$ respectively. $P$ sets $w[h(\gamma)] := w[h(\alpha)] \cdot w[h(\beta)]$ and append values $w[h(\alpha)], w[h(\beta)], w[h(\gamma)]$ to vectors $l, r, o$ respectively.
   - If $T = \text{DEL}$, add $\alpha$ to $D$.

Suppose that there are $S_1 = s_1B$ addition gates and $S_2 = s_2B$ multiplication gates ($S = S_1 + S_2$), and after processing $S$ gates, $|w| = kB$. $A \in \mathbb{F}^{S_1 \times W}$ and $L, R, O \in \mathbb{F}^{S_2 \times W}$ are public.

3. $P$ splits wire values $(l, r, o, w)$ into chunks of size $B$, i.e., $(\{I_l, r_1, o_1\}, \{I_r, r_2, o_2\})$ and $(\{w_1\}, \{w_2\})$, such that each element is in $\mathbb{F}^B$. $F_{\text{SIMDZK}}$ is invoked to obtain commitments $(I_l, I_r, I_0) \in \mathbb{F}_{\text{SIMDZK}}[\{I_l, I_r, I_0\}]$ and $(\{w_1\}, \{w_2\}) \in \mathbb{F}_{\text{SIMDZK}}[\{w_1\}, \{w_2\}]$.

4. Then, $(\text{LinearMap}(\{I_l\}, \{I_r\}), \{w_1\}) \in \mathbb{F}_{\text{SIMDZK}}[\{I_l\}] \times \mathbb{F}_{\text{SIMDZK}}[\{w_1\}]$ is sent to $F_{\text{SIMDZK}}$ to check that $l = \text{Lw}$; similarly check that $r = \text{Rw}$, $o = \text{Ow}$, and that $0 = \text{Aw}$.

5. Let circuit $C_{\text{Mult}} : \mathbb{F}^2 \to \mathbb{F}$ such that $C_{\text{Mult}}(x, y, z) := xy - z$. For $i \in \{s_2, \ldots, s_1\}$, $(\text{Prove}, C_{\text{Mult}}, \{I_l\}, \{I_r\}, \{w_1\})$ is sent to $F_{\text{SIMDZK}}$.

6. Let $R = \text{Domain}(h) \setminus D$. Suppose that $|R| = k'$. For the $i$-th element in $R$, let $h'(R[i]) = i$, and set the $i$-th row of $H$ as $I_h[R[i]]$.

7. $P$ computes $w'$ such that for each $w'[h'(i)] = w[h(i)]$. Append 0 to $w'$ and 0 to $H$ until the size of $w'$ becomes a multiple of $B$. Suppose that $|w'| = kB$, and then $P$ calls Commit to obtain $(\{w'_1\}, \{w'_2\}, \{w'_3\})$. Update $(h, w) := (h', w')$.

8. Both parties call $(\text{LinearMap}, \{w'_1\}, \{w'_2\}, \{w'_3\})$ to check the consistency between $w$ and $w'$.

9. If more gates need to be processed, jump to step 2.

Fig. 5: Generic ZK in limited-memory scenario.

4 Efficient Instantiations of Our Compiler

The only assumption that our general compiler described in Section 3.2 makes is that the underlying ZKP realizes the extended SIMD-ZK functionality $F_{\text{SIMDZK}}$. The compiler for scalable ZK described in Section 3.3 only additionally requires the parameter $\tau_{\text{max}} > 1$ for $F_{\text{SIMDZK}}$. In this section, we show three instantiations of SIMD-ZK that benefits from these compilers, including
Protocol $\Pi_{\text{AntMan}}$

**Public inputs.** The prover $P$ and verifier $V$ hold a general circuit $C$ over a large field $F$, where $C$ contains $n = |C|$ multiplication gates and $m$ input gates. Let $\alpha_1,\ldots,\alpha_B \in F$ be $B$ distinct elements that are fixed for the whole protocol execution. Both parties invoke Initialize() in IT-PAC to obtain $\tau_1$.

**Private input.** $P$ holds $m$ witnesses $w_1,\ldots,w_m \in F^B$ such that $C(w_1[i],\ldots,w_m[i]) = 0$ for all $i \in [B]$.

**Commit:** On input $w \in F^B$, $P$ computes polynomial $f(\cdot) = \sum_{i=0}^{B-1} f_i \cdot X^i$ such that for $i \in [B]$, $f(\alpha_i) = w_i$. Both parties invoke $(\langle b \rangle, \tau_2) \leftarrow \text{PreGen}(f)$. $P$ obtains $\langle b \rangle$ and $V$ obtains $\tau_2$. If $\Lambda$ has been revealed, invoke $(M, K) \leftarrow \text{Gen}(\tau_1, \tau_2)$. $P$ holds $M$ and $V$ holds $K$.

**Open:** On input $(\langle f(\cdot) \rangle, f(\cdot), \Lambda)$, both parties compute that $J^\mu K := J^f(\Lambda) K - f(\Lambda)$.

Let $H : \{0,1\}^* \to \{0,1\}$ be a random oracle. $P$ sends $H(M_\mu)$ to $V$ who checks whether $H(M_\mu) = H(K_\mu)$.

Fig. 6: The protocol of SIMDZK from AntMan.

- A ZKP based on vector oblivious linear evaluation [50].
- A zk-SNARK from $\Sigma$-protocol [3].
- A designated multi-verifier ZKP based on packed Shamir secret sharing and recursive inner product check [53].

All of these works are in the form of SIMD-ZK and originally require significant extra effort to be converted into a general ZK. Our compilers are able to transform them into general ZK with decrease in their proof size or setup cost. The only exception is the AntMan [50] which is restricted by $\tau_{\text{max}} = 1$ thus does not fit into the second compiler for scalable ZK. In Supplementary Material A.5 and A.6, we additionally describe a SIMD-ZK that is extracted from an MPC-in-the-head scheme Ligero [2], and a construction from LegoSNARK [18]. Both our compilers are generalizations of them and a follow-up work [4]. We show that Ligero SIMD-ZK perfectly fits our compilers and discuss extra caution that need to take when compiling Ligero.

4.1 AntMan++: Sublinear Designated-Verifier ZK

AntMan [50] is a sublinear VOLE-based ZK proof for SIMD circuits, which only requires communicating $O(B+|C|)$ field elements to prove a $(B,C)$-SIMD circuit. It also presents a construction for proving a single execution of an arbitrary circuit, by breaking down the circuits into individual gates and batching them as SIMD circuits. The proving of SIMD circuits requires sending $O(|C|/B + B)$ field elements, and the cost to check the wire-value consistency is $O(B^3)$, which leads to $O(|C|^{3/4})$ communication complexity in optimal. It is the only sublinear-communication VOLE-ZK protocol for proving an arbitrary circuit. In AntMan [50], the information-theoretic polynomial authentication code $\Pi_{\text{IT-PAC}}$ servers as a polynomial commitment scheme. For arbitrary degree-$k$ polynomial
$f(\cdot)$ known by $P$, an IT-PAC $\langle f(\cdot) \rangle$ consists of a MAC $M \in \mathbb{F}$ known by $P$ and a tuple of keys $(K, \Delta, \Lambda) \in \mathbb{F}^3$ known by $V$, such that $M = K + f(\Lambda) \cdot \Delta$. In the following, we first detail the commitment scheme used in the AntMan protocol, then discuss how to enable AntMan to prove arbitrary circuits.

**Information-theoretic polynomial authentication code $\Pi_{IT-PAC}$.** As shown in Figure 7, the protocol is designed in the ($F_{VOLE}, F_{Com}$)-hybrid model. It adopts additively homomorphic encryption (AHE) scheme to obliviously evaluate a polynomial, where the polynomial is known by $P$ and the secret point $\Lambda$ is known by $V$. Then VOLE correlations further transform such oblivious polynomial evaluation (OPE) into IT-PACs. A critical issue is to guarantee that the HE ciphertext which encodes the evaluation point $\Lambda$ is correct. Instead of using the zero-knowledge proof of knowledge for the proof of validity (as done in several MPC protocols [34,21]), AntMan utilizes a simple commit-and-open approach. Specifically, $V$ first commits to the randomness that are used to generate the HE ciphertexts $\langle \Lambda^1 \rangle, \ldots, \langle \Lambda^k \rangle$. After receiving HE ciphertexts from $V$, $P$ performs the homomorphic evaluation and commits to all of HE ciphertexts $(b)$ that it should send to $V$ for OPE. Then $V$ opens the randomness and let $P$ check the correctness of $\langle \Lambda^1 \rangle, \ldots, \langle \Lambda^k \rangle$. If they are valid, $P$ opens $(b)$ to continue with the execution of OPE. This allows the AntMan protocol to remove the possible leakage of secret polynomials, which is incurred by homomorphically performing polynomial evaluation upon incorrect ciphertexts.

**AntMan++.** By applying our SIMD compiler to the original SIMD AntMan, we propose AntMan++, which is a more efficient VOLE-based ZK proof for arbitrary circuits. Similar to the original AntMan, we first batch arithmetic gates and prove their correctness. The generation of IT-PACs of all the wire values incurs $O(|C|/B)$ communication complexity. Additionally, checking the correctness of multiplication gates requires an opening of size $B$.

The improvement of AntMan++ lies in the proof of wire consistency. As shown in $\Pi_{compiler}$, this problem is transferred into proof of linear map. And we use a random vector to further transfer linear-mapping proof into inner-product proof. In AntMan, we observe that the proof of inner product between public and private vectors takes only $O(B)$ communication overhead. Suppose the challenge vector $r$ is public and witness $x$ is private, and the IT-PACs of two vectors are known to both parties. After the secret evaluation point $\Lambda$ is revealed, both parties can locally calculate $f_r(\Lambda)$ because $r$ is known. Via the additively homomorphic property of IT-PACs, both parties compute $f_r(\Lambda) \cdot [x]$, which is also the IT-PAC of Hadamard product of $r$ and $x$. In this way, both parties compute $n + k$ IT-PACs and add them up to obtain $[q]$. In the end, according to the protocol in figure 3, both parties open the vector of size $B$ and check whether their sum equals to 0. As a result, the communication cost of AntMan++ is $O(|C|/B + B)$. When setting $B = |C|^{1/2}$, it results in $O(|C|^{1/2})$.

The full description of SIMD AntMan is shown in Protocol 6 and 8.

**Performance evaluation.** We implement the AntMan++ protocol and benchmark its performance. Its homomorphic encryption (HE) is supported by the
Protocol $\Pi_{IT-PAC}^A$

Let AHE = (Setup, KeyGen, Enc, Dec) be an additively homomorphic encryption scheme. Suppose that two parties $P$ and $V$ have already agreed a set of public parameters $\text{par} = \text{Setup}(1^\lambda)$ and global key $\Delta \in F$. Let $G$ be a PRG. Let $k$ be the maximum degree of the polynomials committed in each IT-PAC.

**Initialize.**

1. $V$ samples $\text{seed} \leftarrow \{0, 1\}^\lambda$, and then $V$ and $P$ call the (Commit) command of $F_{\text{con}}$ with input $\text{seed}$, which returns a handle $\tau_1$ to $P$.
2. $V$ samples $\Lambda \leftarrow F$ and runs $\langle A^i \rangle \leftarrow \text{Enc}(\text{sk}, A^i; r_i)$ for all $i \in [1, k]$ where $(r_0, r_1, \ldots, r_k) = G(\text{seed})$ and $\text{sk} \leftarrow \text{KeyGen(\text{par}; r_0)}$. Then, $V$ sends the AHE ciphertexts $\langle A^1 \rangle, \ldots, \langle A^k \rangle$ to $P$.

**Pre-Gen.** On input $f$,

3. $P$ and $V$ sends (extend) to $F_{\text{VOLE}}$, which returns $u, v$ to $P$ and $v$ to $V$, such that $w = v + u \cdot \Delta$.
4. On input polynomial $f(\cdot) = \sum_{i=0}^{k} f_i \cdot X^i \in F[X]$, $P$ computes a ciphertext $\langle b \rangle$ with $u + b = f(A)$ via $(\langle b \rangle) = \sum_{i=1}^{k} f_i \cdot \langle A^i \rangle + f_0 - u$.
5. $P$ and $V$ call the (Commit) command of $F_{\text{con}}$ with inputs $\langle b \rangle$, which returns a handle $\tau_2$ to $V$.

**Gen.** On input $(\tau_1, \tau_2)$,

6. $V$ and $P$ call the (Open) command of $F_{\text{con}}$ on input $\tau_1$, which returns $(\text{seed}, \tau_1)$ to $P$. In parallel, $V$ sends $\Lambda$ to $P$. Then, $P$ computes $(r_0, r_1, \ldots, r_k) := G(\text{seed})$ and runs $\text{sk} \leftarrow \text{KeyGen(\text{par}; r_0)}$. $P$ checks that $\langle A^i \rangle = \text{Enc}(\text{sk}, A^i; r_i)$ for all $i \in [1, k]$, and aborts if the check fails. $P$ sets $M := w$.
7. $P$ and $V$ call the (Open) command of $F_{\text{con}}$ on input $\tau_2$, which returns $(\langle b_1 \rangle, \ldots, \langle b_f \rangle, \tau_2)$ to $V$. Then, $V$ runs $b \leftarrow \text{Dec}(\text{sk}, \langle b \rangle)$, and then computes $K := v + b \cdot A \in F$.
8. Two parties obtain an IT-PAC $[f(\cdot)]$, where $P$ holds $(f(\cdot), M)$ and $V$ holds $K$.

Fig. 7: Protocol for generating IT-PACs without ZK proofs in the $(F_{\text{VOLE}}, F_{\text{con}})$-hybrid model.

Microsoft SEAL [46] and other cryptographic building blocks are from EMP-toolkits [48]. Two Amazon EC2 m5.8xlarge instances located in the same region are running as $P$ and $V$. We manually throttle the network to simulate low-bandwidth settings. We use the same 59-bit FFT-friendly field as the AntMan [50]. The performance of AntMan++ is not affected by the circuit structure and we benchmark with layered circuits for convenience. In all experiments, we randomly sample a circuit with $2^{16}$ input wires, $2^{27}$ addition gates and $2^{27}$ multiplication gates distributed at $2^{12}$ layers. We compare AntMan++ with the prior general VOLE-ZK Quicksilver [52] and use its default parameter setting.
For each gate $P$ in $C$, two parties hold IT-PAC of input wire vectors $[f]$ and $[g]$:
- If $T = ADD$, both parties locally compute output IT-PAC $[h] = [f] + [g]$.
- If $T = MULT$, $P$ computes a degree-$(2B - 2)$ polynomial $h(\cdot) := f(\cdot) \cdot g(\cdot) \in \mathbb{F}[X]$ and a degree-$(B - 1)$ polynomial $H(\cdot)$ such that $h(\alpha_i) = h(\alpha_i)$ for all $i \in [B]$. Then, $P$ and $V$ run sub-protocol $P_{\text{PAC}}^{(2B-2)}$ to generate two IT-PACs $[h(\cdot)]$ and $[h(\cdot)]$.

As there are $n_2$ multiplication gates, the commitments of their outputs are denoted as $[h_1], \ldots, [h_{n_2}]$. Consequently, their degree-$(2B - 2)$ polynomials are denoted as $[\tilde{h}_1], \ldots, [\tilde{h}_{n_2}]$.

2. $P$ samples two random polynomials $r(\cdot)$ and $s(\cdot)$ of respective degrees $B - 1$ and $2B - 2$ in $\mathbb{F}[X]$ such that $r(\alpha_i) = s(\alpha_i)$ for $i \in [1, t]$. Then, $P$ and $V$ generate the corresponding IT-PACs $[r(\cdot)]$ and $[s(\cdot)]$.

3. $V$ samples $\text{seed} \leftarrow \{0, 1\}^\lambda$ and sends it to $P$. Then, two parties compute $(\chi_1, \ldots, \chi_{n_2}) := \text{Hash}([\text{seed}]) \in \mathbb{F}^{n_2}$.

4. $P$ and $V$ locally compute $\tilde{h}(\cdot) := \sum_{j=1}^{n_2} \chi_j \cdot [h_j(\cdot)]$ and $[\tilde{h}(\cdot)] := \sum_{j=1}^{n_2} \chi_j \cdot [h_j(\cdot)] + [s(\cdot)]$. Then, $P$ sends the polynomial pair $(\tilde{h}(\cdot), \tilde{h}(\cdot))$ to $V$, who checks that $h(\cdot), \tilde{h}(\cdot)$ have the degrees $B - 1$ and $2B - 2$ respectively and $h(\alpha_i) = \tilde{h}(\alpha_i)$ for all $i \in [1, t]$.

5. $P$ and $V$ run $\text{Gen}(\tau_1, \tau_2)$ to open $A$ to $P$, and then $V$ can compute the local keys on all IT-PACs.

6. $P$ and $V$ run a VOLE-based zero-knowledge proof $\text{DVZK} \left\{ ([f_j(A)], [g_j(A)], [\tilde{h}_j(A)]) \mid j \in [n_2], \tilde{h}_j(A) = f_j(A) \cdot g_j(A) \right\}$.

7. $P$ and $V$ locally compute $[\mu] := [h(A)] - h(A)$ and $[\nu] := [\tilde{h}(A)] - \tilde{h}(A)$. Then, two parties run $\text{Open}$ to check that $\mu = 0$ and $\nu = 0$.

8. Let $[\nu(\cdot)]$ be the IT-PAC associated with the output values circuit $C$. $P$ and $V$ run $\text{Open}$ to check $\nu(A) = 0$.

If any check fails, $V$ aborts.

Fig. 8: The protocol of SIMDZK from AntMan (Cont.).

in [48]. We do not compare with AntMan [50] because it only proves SIMD circuits.

We first benchmark the running time and communication overhead with variable batch size $\log_2 B \in [9, 12]$. AntMan++ is split into the input-independent setup phase and online phase, and their performance are reported separately. As shown in Table 1, the increase of $B$ leads to the significant reducing of the online communication overhead. The setup communication is dominated by HE ciphertexts and rotation keys. For the security of HE, the ciphertext size is fixed for all $\log_2 B \leq 11$ and start to increase when $B \geq 12$. The running time for both setup and online phases increase with $B$. The overhead mainly comes from the ciphertext rotation during the setup phase as well as the HE evaluation and polynomial multiplication during the online phase. Although its running time
<table>
<thead>
<tr>
<th>( \log_2 B )</th>
<th>Communication (MB) Setup</th>
<th>Online</th>
<th>Running time (s) Setup</th>
<th>Online</th>
</tr>
</thead>
<tbody>
<tr>
<td>9</td>
<td>4.6</td>
<td>60.13</td>
<td>6.84</td>
<td>377.3</td>
</tr>
<tr>
<td>10</td>
<td>4.6</td>
<td>30.54</td>
<td>14.7</td>
<td>380.7</td>
</tr>
<tr>
<td>11</td>
<td>4.6</td>
<td>15.78</td>
<td>38.72</td>
<td>407.83</td>
</tr>
<tr>
<td>12</td>
<td>6.7</td>
<td>8.82</td>
<td>144.75</td>
<td>438.19</td>
</tr>
<tr>
<td>QS [52]</td>
<td>1087.23</td>
<td></td>
<td>915.43</td>
<td></td>
</tr>
</tbody>
</table>

Table 1: **Performance of AntMan++ with variable batch size.** Benchmarked with 1 thread, 50 Mbps bandwidth and circuit size \( C = 2^{27} \).

<table>
<thead>
<tr>
<th>Scheme-Threads</th>
<th>Bandwidth (Mbps)</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>10</td>
</tr>
<tr>
<td>AM-1</td>
<td>461.71</td>
</tr>
<tr>
<td>AM-4</td>
<td>292.61</td>
</tr>
<tr>
<td>AM-8</td>
<td>263.55</td>
</tr>
<tr>
<td>QS-8 [52]</td>
<td>900.47</td>
</tr>
</tbody>
</table>

Table 2: **Performance of AntMan++ with variable threads and bandwidth.** Benchmarked with circuit size \( C = 2^{27} \) and batch size \( B = 2^{11} \). Numbers are in seconds.

is \( 2.1 \times \sim 3.2 \times \) longer than Quicksilver, the bandwidth usage is \( 17.5 \times \sim 83.6 \times \) smaller.

Then we show the running time with the variable network bandwidth and the number of threads (Table 2). The batch size is fixed to be \( B = 2^{11} \). AntMan++ is highly efficient in terms of network communication with asymptotically \( \mathcal{O}(C/B) \) overhead. Its running time does not significantly deteriorate with the decreasing of bandwidth. On the another hand, AntMan++ is computationally heavy but fully parallelizable, thus multi-threading is effective on increasing its throughput. When the number of threading is increased from 1 to 4, the running time is decreased by \( 36\% \sim 38\% \). Compared to Quicksilver, it requires \( 70\% \) less running time when bandwidth is 10Mbps and \( 30\% \) less when bandwidth is 25Mbps.

### 4.2 Constant-round SIMD-ZK in the Discrete Logarithm setting

To showcase the versatility of our framework, we present an SHVZK argument of knowledge for \((B, C)\)- SIMD circuits, and compile it to a constant-round square-root size argument for general circuit. This construction is based on the work of Attema–Cramer [3]. We note that [3] also mention (in a remark) that their techniques yield a constant-round sublinear argument; however, our approach achieves better parameters. Our SIMD-ZK has a better CRS size and computation complexity by taking advantage of our compiler and the approach of dividing circuit into smaller ones, i.e our CRS size is \( \mathcal{O}(B) \) instead of \( \mathcal{O}(|C|) \), our dominant computation cost is \( \mathcal{O}(|C|/B) \) interpolations of polynomials of degree \( \mathcal{O}(B) \) while it is \( \mathcal{O}(1) \) interpolations of polynomials of degree \( \mathcal{O}(|C|) \) in [3].
Moreover, for a special type of circuit, i.e for \((B, C)\)-SIMD circuit, our protocol has only \(O(C + \sqrt{B})\) communication. We provide a more in-depth comparison in Supplementary Material A.3.

**Sublinear ZK Argument for Linear Form Evaluation.** Given a linear form \(L : \mathbb{Z}_p^n \rightarrow \mathbb{Z}_p\), consider the relation

\[
\mathcal{R} = \{(P \in \mathbb{G}, L, y \in \mathbb{Z}_p; x \in \mathbb{Z}_p^n, r \in \mathbb{Z}_p) : P = g^x h^r, y = L(x)\}
\]

Let \(P \in \mathbb{G}\) be a commitment to \(x \in \mathbb{Z}_p^n\). The prover wants to convince the verifier that \(y = L(x)\), while keeping \(x\) secret. To achieve this, \([3]\) divides the witness vector \(x \in \mathbb{Z}_p^n\) into 2 parts then recursively run the protocol \(\log(n) - 1\) times and get an honest-verifier proof of knowledge for relation \(\mathcal{R}\) with \(O(\log n)\) bits communication in \(O(\log n)\) moves. It is known \([30]\) that there is a trade-off between communication cost and the number of rounds, and this protocol can be generalized by dividing the witness into \(k = O(\sqrt{n})\) parts, yielding a 5-round protocol with sublinear communication (this was also mentioned in \([3]\)). We denote the 5-round sublinear ZK argument for linear form evaluation by \(\Pi([x], L_1, 0; x)\). We provide a formal description of this protocol in the Supplementary Material A.3.

**Amortization.** We now overview some amortizations techniques of \([3]\) which allow the prover to prove correctness of 1) many nullity checks of different linear forms over the same committed vector \(x\) and 2) many evaluations of the same linear form over many different committed vectors in the 5-move protocols, with negligible communication overhead compared to a single check \(\Pi\).

**Compressing many nullity checks \(\Pi_{zeros}([x], L_1, L_2, \ldots, L_s, 0; x)\):** Given \(P = g^x h^r, s\) linear functions \(L_i : \mathbb{Z}_p^n \rightarrow \mathbb{Z}_p\), \(\mathcal{P}\) can show that \(L_i(x) = 0\) for all \(i \in [1, s]\) at the cost of one single check plus one \(\mathbb{Z}_p\) challenge from \(\mathcal{V}\) to \(\mathcal{P}\).

**Amortizing over many commitments \(\Pi_{Am}([x], L, y_i; x_i)_{i \in [1, s]}\):** Given \(P_i = g^{x_i} h_i^r\), for \(i \in [1, s]\), the prover wants to show that the evaluation of the same linear form \(L\) on many committed vectors is correct i.e \(y_i = L(x_i)\). Intuitively, a prover can do this batch evaluation checks of \(L\) over many committed vectors \(x_i\) at the same cost of evaluation checks of \(L\) over only one committed vector.

**Batch argument for multiplication gates.** Let’s consider \(m\) tuples of \(B\) multiplications \((x_{j,i}, y_{j,i}, z_{j,i} = x_{j,i} y_{j,i})_{i \in [1, B]}\) for each \(j \in [1, m]\). The batch argument is based on algebraic interpolation polynomial and the ZK proof \(\Pi\) which proves the correct evaluation of linear form (consider \(\Pi\) as a black box).

- For \(j \in [1, m]\), \(\mathcal{P}\) defines 2 random polynomials \(f_j, g_j\) of degree at most \(B\) such that \(f_j(i) = x_{j,i}\) and \(g_j(i) = y_{j,i}\) for all \(i \in [1, B]\). By Lagrange-interpolation \(f_j, g_j\) are well-defined from \(x_j := (f_j(0), x_{j,1}, x_{j,2}, \ldots, x_{j,B}) \in \mathbb{Z}_p^{B+1}\) and \(y_j := (g_j(0), y_{j,1}, y_{j,2}, \ldots, y_{j,B}) \in \mathbb{Z}_p^{B+1}\). Define \(h_j := f_j g_j\), observe that degree of \(h_j\) is at most \(2B\), \(h_j(i) = z_{j,i}\) for \(i \in [1, B]\) and \(h_j\) is well-defined from \(z_j := (h(0), z_{j,1}, z_{j,2}, \ldots, z_{j,B}, h(B+1), \ldots, h(2B)) \in \mathbb{Z}_p^{2B+1}\). \(\mathcal{P}\) commits \((x_j, y_j, z_j)_{i \in [1, m]}\).
- \(\mathcal{V}\) pick randomly \(c \xrightarrow{\$} \mathbb{Z}_p \setminus [1, B]\) and sends to prover.
The size of CRS is \( O(\sqrt{|C|}) \) random elements of \( \mathbb{G} \).

- \( \mathcal{P} \) reveals \((f_j(c), g_j(c), h_j(c))\) for all \( j \in [1, m] \). \( \mathcal{V} \) now checks \( h_j(c) = f_j(c)g_j(c) \).
- \( \mathcal{P} \) can cheat with probability at most \((2B)/(p-B)\). Denote \( L_c : \mathbb{Z}_p^{B+1} \rightarrow \mathbb{Z}_p, L'_c : \mathbb{Z}_p^{2B+1} \rightarrow \mathbb{Z}_p \) are public linear forms by Lagrange formula such that \( L_c(x_j) = f_j(c), L_c(y_j) = g_j(c) \) and \( L'_c(z_j) = h_j(c) \) the corresponding to the same linear form.
- \( \mathcal{P} \) runs in parallel \( \Pi_{Am}(\{x_j\}, \{y_j\}, L_c, f_j(c), g_j(c); x_j, y_j)_{j \in [1, m]} \) and \( \Pi_{Am}(\{z_j\}, L'_c, h_j(c); z_j)_{j \in [1, m]} \).

We obtain an argument for \( n \) multiplications with sublinear communication cost when choosing \( B = O(\sqrt{n}) \).

**Instantiation of SIMD ZK.** We achieve a ZK proof for \((B,C)\)-SIMD circuits with communication \( O(|C| + \sqrt{B}) \). Concretely, sending commitments of gates costs at most \( 3|C| \) elements of \( \mathbb{G} \). For checking the correction of multiplication gates and consistency of output gates, the cost is less than 3 times the instantiation of \( \Pi \) over the committed vector of length \( 2B \) which has \( O(\sqrt{B}) \) communication.

**Sublinear Circuit Satisfiability.** Since Pedersen’s commitment is homomorphic, additions are free in our system, there are two constraints needed to prove 1) that multiplication gates are correctly computed and 2) the consistency of wires between layers. The former constraints is handled by the batch multiplication argument and the latter is proven using our compiler (section 3). Note here, we apply our compiler to prove in ZK the consistency of wires between layers (this is essentially a proof of a linear map) and combine the high-level intuition underlying the analysis of our compiler with the analysis of Attema et al. to obtain a direct security proof. Concretely, we carefully combine two works ([2] and [28]) to obtain an SHVZK for SIMD circuit.

These two proofs use different commitments for two vectors which present for the same tuple of output wire values of multiplication gates so then it requires to prove the consistency. Specifically, for \( j \) group of multiplication gates, we have two commitments \( \langle o'_j \rangle, \langle o_j \rangle \) which committed of two vectors \( o'_j \) and \( o_j \). While \( o'_j := (r, o_{1,j}, o_{2,j}, \ldots, o_{B,j}) \in \mathbb{Z}_p^{B+1} \) and \( o_j := (h_j(0), o_{1,j}, o_{2,j}, \ldots, o_{B,j}, h_j(B+1), \ldots, h_j(2B)) \in \mathbb{Z}_p^{2B+1} \) where \( r \in \mathbb{Z}_p \). Prover therefore needs to prove that \( \langle o'_j \rangle/\langle o_j \rangle \) is the commitment of vector which having the power 0s of the set of generators \( \{g_2, g_3, \ldots, g_{B+1}\} \). By the method described earlier, this is handled by a \( \Pi_{Am} \) for checking many nullities. The protocol \( \Pi_{ZKPed} \) of our sub linear ZK is shown in the Figure 14 of Supplementary Material.

**Theorem 3.** There is a sublinear argument of knowledge for circuit satisfiability in constant-round with the following properties:

- Perfect completeness, computational special soundness, and special HVZK under the discrete logarithm assumption.
- The number of rounds is 7.
- The size of CRS is \( O(\sqrt{|C|}) \) random elements of \( \mathbb{G} \).
- Computation is dominated by \( O(\sqrt{|C|}) \) interpolations of polynomial of degree \( O(\sqrt{|C|}) \).
Correctness check of multiplication gates
If the group contains addition gates then

Otherwise, if the group contains multiplication gates
Consistency check of output gates

Note that in our sublinear ZK based on DLOG setting, we do not directly derive the result from the generic UC proof of security of the abstract compiler, and in particular, do not achieve UC security. This would require the commitments to be extractable. The proof of the consistency of wires follows our compiler, but there is some extra work needed to prove the consistency of commitments (described above). As for the security analysis, the analysis of our ZK based on DLOG is not directly inherited from the real-ideal security proof of instantiation of eSIMDZK,
but rather follows directly from the security analysis of the two works [3] and [30]. Note that we can define $[x]$ in functionality 2 being Perdersen commitment of $x$, but in DLOG-setting, we never need to extract the commitments, and the proofs of soundness and ZK are not in the UC model.

4.3 Multi-Verifier ZK

Yang et al. propose a non-interactive designated multi-verifier ZK (MVZK) proof [53]. In this protocol, a prover $P$ validates a statement to $n$ verifiers ($V_1, \ldots, V_n$) with its private input. Verifiers are assumed honest-majority with adversary threshold $t < n(1/2 - \varepsilon)$ for $0 < \varepsilon < 1/2$. $P$ distributes packed Shamir secret shares (PSS) of all batched circuit wire values to ($V_1, \ldots, V_n$), who jointly execute a distributed ZK to verify the correctness of the circuit evaluation [29,15,12], with the assistance of $P$. A special coin tossing protocol is designed to maintain non-interactiveness. This MVZK protocol can be viewed as a commit-and-prove ZK, in which the circuit wire values are committed by the PSS. The hiding property is ensured by the privacy property of PSS, for which a collusion of $\leq t$ parties can not reconstruct the secret values. The binding property holds by the fact that any $(n - t) > t + 1$ honest parties’ shares define the secret values.

In addition to proving the satisfiability of SIMD circuits, [53] also proposes a protocol for the check of wiring consistency, which enables the PSS-based MVZK to work for general circuits. Define a packing parameter $B$, for any indices $i, j \in [B]$ and PSS $[w_1], [w_2]$, the protocol proves that $w_1[i] = w_2[j]$. Overall, the checking procedure incurs communication complexity $O(n^2B^2)$. Our compiler reduces it to $O(n^2)$. In Supplementary Material A.4, we introduce an inner product verification protocol for the check of multiplication gates, and its functionality is denoted as $F_{\text{verifyprod}}$.

**SIMD-ZK from [53]**. The protocol is shown in Figure 10. A commitment in MVZK is a PSS for a vector of $B$ values. The opening of a commitment is done by each verifier sending its PSS share to all other verifiers, followed by all of them validating the shares and decoding the committed values. The proving procedure takes the input of commitments to batch circuit input wires and output wires of all multiplication gates. They are precomputed by $P$ and distributed to verifiers via PSS. At Step 1, verifiers locally arrange $([w_0], [w_3], [w_9])$ for the indices of all batch multiplication triples $(\alpha, \beta, \gamma)$. The PSS for the input wires of batch multiplication gates are obtained locally by linearly combining the PSS for previous batches. Next, parties invoke a Fiat-Shamir procedure at Step 2 to sample a random coin $\chi \in K$. The property of non-interactiveness forbids the verifiers from sending messages to $P$. In this protocol, $P$ computes the input to the Fiat-Shamir transformation from the shares of parties and let verifiers verify their correctness, namely, $(\text{com}_1, \ldots, \text{com}_n)$. In this way, verifiers are able to compute $\chi$ by hashing these commitments. In the end, parties convert the multiplication triples to an inner product triple, which is verified by $F_{\text{verifyprod}}$. 
**Protocol **$\Pi_{MVZK}$

**Setup:** Define the number of verifiers $n$, packing parameter $B$ and adversary threshold $t < n(1/2 - \epsilon)$ for $0 < \epsilon < 1/2$, which satisfy $t + 2B - 1 \leq n$. $\alpha_1, \ldots, \alpha_n, \gamma_1, \ldots, \gamma_B \in F$ are $n + B$ evaluation points. Let $H_1 : \{0, 1\}^* \rightarrow \{0, 1\}^n$ and $H_2 : \{0, 1\}^* \rightarrow K$ be two random oracles.

**Commit:** On input degree $d$ and a vector $w \in \mathbb{F}^B$, $\mathcal{P}$ uniformly samples $r \in \mathbb{F}^{d-B+1}$ and interpolates a degree-$d$ polynomial $f(\cdot)$ such that $f(\gamma_i) = w_i$ for $i \in [B]$ and $f(\alpha_i) = r_i$ for $i \in [d - B + 1]$. The commitment is defined as $[w] = \{f(\alpha_i)\}_{i=1}^n$, in which $f(i)$ is held by $\mathcal{V}_i$.

The communication cost can be reduced from $n|F|$ to $(n - d + B - 1)|F|$ if for $i \in [d - B + 1]$, $\mathcal{P}$ and $\mathcal{V}_i$ hold a shared PRG $\mathcal{PRG}_i$. The share can be computed locally as $f(\alpha_i) \leftarrow \mathcal{PRG}_i$.Next()

**Open:** On input degree $d$, and commitment $[w]$, each verifier in ($\mathcal{V}_1, \ldots, \mathcal{V}_n$) sends its share to all other verifiers. They reconstruct the degree-$d$ polynomial $f(\cdot)$ and derive $w = (f(\gamma_1), \ldots, f(\gamma_B))$. If the reconstruction fails, the verifier aborts.

**Prove:** On input the commitment to circuit input wires and output wires of multiplications ($[w_1], \ldots, [w_m]$),

1. $\mathcal{P}$ and $(\mathcal{V}_1, \ldots, \mathcal{V}_n)$ scan the circuit in topological order. For each batch of multiplication gates $(\alpha, \beta, \gamma)$ for which they already have $[w_i]$, compute $([w_i], [w_j])$ which are linear combinations of commitments for previous wires.

2. Denote $\tilde{w}^i$ as the share that $\mathcal{V}_i$ holds for a PSS $[w]$. For $i \in [n]$, $\mathcal{P}$ samples $r_i \leftarrow \{0, 1\}^*$, sends $r_i$ to $\mathcal{V}_i$ and computes $com_i = H_1(\tilde{w}_1, \ldots, \tilde{w}_m, r_i)$. $\mathcal{P}$ broadcasts $(com_1, \ldots, com_n)$ to all verifiers. Each verifier $\mathcal{V}_i$ checks whether $com_i$ is consistent with its local shares and $r_i$. It aborts if the check fails. Otherwise, $\mathcal{P}$ and all verifiers compute $\chi := H_2(com_1, \ldots, com_n)$.

3. Denote $\{[t], [r], [o]\}_{i \in [n]}$ to be commitments of all left, right and output wires of batched multiplication gates. Construct $[\tilde{x}] := \chi^{-1} \cdot [t]$, $[\tilde{y}] := [r]$ for $i \in [n]$ and $[\tilde{z}] := \sum_i [z_i] \cdot [o]$. Invoke $\mathcal{F}_{verifyprod}$ with input $(([\tilde{x}], [\tilde{y}], [\tilde{z}])$ and parties output reject if $\mathcal{F}_{verifyprod}$ outputs reject.

---

**Fig. 10:** The protocol of SIMDZK from designated multi-verifier ZK.

## 5 Acknowledgement

Work of Kang Yang is supported by the National Key Research and Development Program of China (Grant No. 2022YFB2702000), and by the National Natural Science Foundation of China (Grant Nos. 62102037, 61932019). Work of Xiao Wang is supported by DARPA under Contract No. HR001120C0087, NSF award #2016240, #236819, #2318975, #2310927 and research awards from Meta and Google. Work of Geoffroy Couteau is supported by the French Agence Nationale de la Recherche (ANR), under grant ANR-20-CE39-0001 (project SCENE) and the France 2030 ANR Project ANR-22-PECY-003 SecureCompute. Work of
Dung Bui is supported by Dim Math Innov funding from the Paris Mathematical Sciences Foundation (FSMP) funded by the Paris Ile-de-France Region. Work of Yu Yu is supported by the National Natural Science Foundation of China (Grant Nos. 62125204 and 61872236). Yu Yus work has also been supported by the New Cornerstone Science Foundation through the XPLORER PRIZE. The views, opinions, and/or findings expressed are those of the author(s) and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.

References


31


46. Microsoft SEAL (release 4.0). https://github.com/Microsoft/SEAL (Mar 2022), microsoft Research, Redmond, WA.


A Supplementary Material

A.1 Additional Preliminaries

The following definitions are provided for the instantiation of SHVZK argument of knowledge.

Definition 1 (Generalized Pedersen Commitment [42]). Given an Abelian group \(G\) of prime order \(p\). The Pedersen commitment works as follows:

- Set up a commitment key \(ck = (g, h) = \{g_1, g_2, \ldots, g_n, h\} \leftarrow \mathbb{G}^{n+1}\).
- Commitment of a vector \(x \in \mathbb{Z}_p^n\) is defined by \(com_{ck}(x, r) = h^r g^x = h^r \prod_{i=1}^n g_i^x\) where \(r \leftarrow \mathbb{Z}_p\).

Pedersen commitment is perfectly hiding, computationally binding under the DDH assumption. It satisfies homomorphic property, for all \(x_1, x_2 \in \mathbb{Z}_p^n\), \(r_1, r_2 \leftarrow \mathbb{Z}_p\):

\[
com_{ck}(x_1, r_1).com_{ck}(x_2, r_2) = com_{ck}(x_1 + x_2, r_1 + r_2)
\]

Let \(\mathcal{R}\) be an efficiently decidable binary relation for an \(\mathbf{NP}\) language \(\mathcal{L}\). If \(x \in \mathcal{L}\) and \((x, w) \in \mathcal{R}\) then \(x\) is a statement and \(w\) is a witness. An interactive argument for \(\mathcal{R}\) is a tuple of three probabilistic polynomial time interactive algorithms \(\Pi = (\text{Gen}, P, V)\) called the common reference string generator, the prover and the verifier, with the following properties:

- \(\text{crs} \leftarrow \text{Gen}(1^\lambda)\). On input \(1^\lambda\) generates public parameters \(\text{par}\) (such as group parameters), a \(\text{crs}\). For simplicity of notation, we assume that any group parameters are implicitly included in the \(\text{crs}\).
- We write \(tr \leftarrow \langle P(x), V(y) \rangle\) for the public transcript produced by \(P\) and \(V\) when interacting on inputs \(x\) and \(y\). This transcript ends with \(V\) either accepting or rejecting. We sometimes shorten the notation by saying \(\langle P(x), V(y) \rangle = b\), where \(b = 0\) corresponds to \(V\) rejecting and \(b = 1\) corresponds to \(V\) accepting.

Definition 2 (Perfect completeness). A proof system \(\Pi = (\text{Gen}, P, V)\) for \(\mathcal{R}\) is perfectly complete, if

\[
\Pr \left[ (P(\text{crs}, x), V(\text{crs}, x)) = 1 \left| \text{crs} \leftarrow \text{Gen}(1^\lambda), (x, w) \in \mathcal{R} \right. \right] = 1
\]

Definition 3 (Computationally soundness). A proof system \(\Pi = (\text{Gen}, P, V)\) is computationally sound if for every efficient adversary \(A\)

\[
\Pr \left[ \langle A, V(\text{crs}, x) \rangle = 1 \left| x \notin \mathcal{L}, \text{crs} \leftarrow \text{Gen}(1^\lambda) \right. \right] = \text{negl}(\lambda)
\]
An argument $\Pi = (\text{Gen}, P, V)$ is public coin if the verifiers messages are chosen uniformly at random independently of the messages sent by the prover, i.e the challenges correspond to the verifiers randomness $\rho$. If there is a polynomial time algorithm that, given a statement $x$ and a $(k_1, \ldots, k_\mu)$-tree of accepted transcripts, produces a witness $w$ for $x$, then the public coin protocol is said to be (unconditionally) $(k_1 \ldots, k_\mu)$-special sound. Under the DL assumption, we state that the protocol is computationally special sound if there exists an efficient algorithm that either extracts a witness or finds a non-trivial DL relation between $g_1, \ldots, g_n, h$.

**Definition 4 (Special honest-verifier zero-knowledge (SHVZK)).** A public coin argument $\Pi$ is a SHVZK if if there exists a probabilistic polynomial time simulator $S$ such that for all non-uniform polynomial time adversaries $A$ we have

$\Pr \left[ A(tr) = 1 \mid (x, w) \in R \right] \approx \Pr \left[ A(tr) = 1 \mid (x, w) \in R \right]$

\[
\approx \Pr \left[ A(tr) = 1 \mid (x, w, \rho) \leftarrow \text{Gen}(1^\lambda) \right. \right. \\
\left. \left. (x, w) \in R \mid (x, w, \rho) \leftarrow A(\text{crs}, x, w) \right) \right] \\
\approx \Pr \left[ A(tr) = 1 \mid (x, w, \rho) \leftarrow \text{Gen}(1^\lambda) \right. \right. \\
\left. \left. (x, w) \in R \mid (x, w, \rho) \leftarrow A(\text{crs}, x, w) \right) \right]
\]

where $\rho$ is the public coin randomness used by the verifier.

The following description of functionalities are provided for the instantiation of AntMan++.

**Vector oblivious linear evaluation $F_{\text{VOLE}}$.** This functionality works over a field $\mathbb{F}$, and upon receiving (init) from $P$ and $V$, if $V$ is honest, then sample $\Delta \leftarrow \mathbb{F}$, else receive $\Delta \in \mathbb{F}$ from the adversary. Store $\Delta$ and ignore all subsequent (init) commands. Upon receiving (extend, $n$) from $P$ and $V$, execute:

- If $V$ is honest, sample $v \leftarrow \mathbb{F}^n$. Otherwise, receive $v \in \mathbb{F}^n$ from the adversary.
- If $P$ is honest, sample $u \leftarrow \mathbb{F}^n$ and compute $w := v + u \cdot \Delta \in \mathbb{F}^n$. Otherwise, receive $u \in \mathbb{F}^n$ and $w \in \mathbb{F}^n$ from the adversary, and then recompute $v := w - u \cdot \Delta \in \mathbb{F}^n$.
- Output $(u, w)$ to $P$ and $v$ to $V$.

**Commitment $F_{\text{Com}}$.** Similar to the functionality of Commit command in $F_{\text{SIMDZK}}$:

- Upon receiving input (Commit, $w$) from $P$ and (Commit) from $V$, pick a tag $\|w\|$ and store $(\|w\|, w)$ in the memory. Return $\|w\|$ to both parties.
- Upon receiving (Open, $\|w\|$), if a tuple $(\|w\|, w)$ was previously stored, output $(\|w\|, w)$ to $V$; otherwise abort.

### A.2 AntMan

We discuss how to enable AntMan to prove arbitrary circuits.

**Compiling AntMan.** The AntMan protocol that realizes $F_{\text{SIMDZK}}$ is shown in Figure 6 (derived from the original AntMan [50]). Below we show how to compile
it into a ZK protocol that proves the satisfiability of a single general circuit with sublinear communication, following the protocol of our compiler $Π_{\text{compiler}}$.

Given a public circuit $C$, both parties scan the circuit following the procedure in figure 4 to calculate wire rearrangement matrices $L, R, O$, as well as matrix $A$ that describes addition gates. Then $P$ splits wire values $(l, r, o, w)$ into chunks of size $B$, and calls $\text{Commit}$ to obtain $\{[l_i], [r_i], [o_i], [w_i]\}_{i\in[n_2]}$ and $\{[w_i]\}_{i\in[k]}$. To prove $l = Lw$:

1. $P$ uniformly samples a vector $\vec{r} \in \mathbb{F}^B$ such that $\sum_{i=1}^{B} \vec{r}[i] = 0$, and calls $\text{Commit}$ to obtain $\{\vec{r}\}$.
2. $V$ uniformly samples a vector $\vec{r} \in \mathbb{F}^{Bn_2}$ and sends it to $P$. Both parties compute $\hat{v} = \vec{r}^tL \in \mathbb{F}^{Bk}$ and call $\text{Commit}$ to obtain $\{[v_i]\}_{i\in[k]}$ and $\{[\vec{r}_i]\}_{i\in[n_2]}$.
3. $P$ computes $\vec{q} \in \mathbb{F}^B$, such that

   \[ q[i] = \sum_{j=1}^{n_2} \vec{r}_j[i]l_j[i] - \sum_{j=1}^{k} v_j[i]w_j[i] + \vec{r}[i]. \]

   $P$ calls $\text{Commit}$ to obtain $\{\vec{q}\}$.
4. Define circuit $C_{\text{lin}}(a_1, \ldots, a_{n_2}, b_1, \ldots, b_{n_2}, c_1, \ldots, c_k, d_1, \ldots, d_k, e, f)$

   \[ := \sum_{i=1}^{n_2} a_i \cdot b_i - \sum_{i=1}^{k} c_i \cdot d_i + e - f, \]

   then call $\text{Prove}(C_{\text{lin}}, [\vec{r}_1], \ldots, [\vec{r}_{n_2}], [l_1], \ldots, [l_{n_2}], [v_1], \ldots, [v_k], [w_1], \ldots, [w_k], [\vec{r}], [\vec{q}])$.

In the same way, check that $r = Rw, o = Ow$, and that $0 = Aw$. After wire consistency check, we check the correctness of multiplication gates. Define circuit $C_{\text{mult}}([x_1, y_1, z_1]_{i\in[n_2]}):= \bigvee_{i\in[n_2]} (x_iy_i - z_i)$, and call $\text{Prove}(C_{\text{mult}}, ([l_1], [r_1], [o_1]],[w_1], \ldots, [w_k],[\vec{r}], [\vec{q}])$.

Then, $V$ calls (Open, $\{\vec{q}\}$) and checks $\sum_{i=1}^{B} q[i] = 0$. $V$ aborts if any check fails.

### A.3 Compressed Sigma protocol of Attema–Cramer

**Sublinear ZK Argument for Linear Form Evaluation.**

Without the loss of generality, for a dimension $m$ and a vector $g \in G^m$, given $k|m$ where $k$ is the number of parts to divide $g$ into, if this is not the case the vector $g$ can be appended with zeros, then $g := g_1, g_2, \ldots, g_k \in \mathbb{Z}_p^m$ and $g_i \in \mathbb{Z}_p^{m/k}$ for $i \in [1, k]$. Given a linear form $L : \mathbb{Z}_p^m \rightarrow \mathbb{Z}_p$, let us define $k$ sub-linear forms $L_i : \mathbb{Z}_p^{m/k} \rightarrow \mathbb{Z}_p$ of $L$ as $x \rightarrow L(x')$ where vector $x' \in \mathbb{Z}_p^m$ such that block $i$ of $x'$ equals to $x$ and other $k-1$ blocks equal to 0s. As a result, $L(x) = \sum_{i=1}^{k} L_i(x_i)$. The notation $[x]$ is a Pedersen commitment of vector $x$.

**Theorem 4.** $Π$ (Figure 11) is a 5-move protocol for relation $R$. This argument is perfect completeness, special HVZK, and computationally special soundness, under the discrete logarithm assumption. The total communication costs include:
Protocol II([x], L, y; x)

Public parameters. \((g, h, k) \in \mathbb{G}^{n+2}, L : \mathbb{Z}_p^n \rightarrow \mathbb{Z}_p, P = g^s h^r, y = L(x)\).

Protocol.

1. \(P \rightarrow V\). Prover picks randomly \(\alpha \in \mathbb{Z}_p^*, \beta \in \mathbb{Z}_p\). Prover sends \(t = L(\alpha), A = g^s h^\beta\) to verifier.

2. \(V \rightarrow P\). Verifier chosen randomly \(c_0, c_1 \in \mathbb{Z}_p\) and sends them to prover.

   Prover defines \(z = cx + \alpha\), \(\gamma = c_0r + \beta\) and \(\tilde{z} := (z, \gamma)\).

   Define \(\tilde{g} := (g, h) \in \mathbb{G}^{n+1}; Q := AP^{c_0}k^{c_1}(c_0y + t)\) and \(L(z, \gamma) := c_1L(z)\). Note that \(Q := \tilde{g}^k\).

3. \(P \rightarrow V\). For \(l \in [0, 2k - 2]\), prover computes:

   \[
   m_l := \prod_{i,j,l=k+i-j-1} g^{\hat{z}_i}k^L(z)
   \]

   Observe by construction \(m_{k-1} = Q\).

   Prover sends \(m_l\) to verifier for \(i \in [0, 2k - 2]\).

4. \(V \rightarrow P\). Verifier picks randomly \(c \in \mathbb{Z}_p\) and sends \(c\) to prover.

   Define \(g' := \tilde{g}^c\) \(\in \mathbb{G}^{n+1/2}\) (* is denoted as component-wise product), \(Q' := \prod_{l=0}^{2k-2} m_{l}^c\), \(L' := \sum_{i=1}^{k} c^{i-1} \tilde{L}_i\).

5. \(P \rightarrow V\). Prover defines \(e' := c_{k-1}z_{k-1} k_{k-1}L'(z')\) to verifier.

   Verifier checks \(g^{e'} \cdot k^{L'(z')} = Q'\).

Fig. 11: HVZK argument \(\Pi([x], L, y; x)\) for relation \(R\).

- \(P \rightarrow V\): \(2k - 1\) elements of \(\mathbb{G}\) and \((n + 1)/k + 1\) element of \(\mathbb{Z}_p\).
- \(V \rightarrow P\): \(3\) elements of \(\mathbb{Z}_p\).

The completeness comes from:

\[
Q' := \prod_{l=0}^{2k-2} m_{l}^c = \prod_{l=0}^{2k-2} \left( \prod_{i,j,l=k+i-j-1} g^{\hat{z}_i}k^L(z) \right) c^l
\]

\[
= \prod_{l=0}^{2k-2} \left( \prod_{i,j,l=k+i-j-1} g^{\hat{z}_i}k^L(z) \right) c^{k+j-1}
\]

\[
= \prod_{i=1}^{k} \left( g_i^j \right)^{\sum_{j=1}^{k} c^{j-1} \hat{z}_j} k^{\sum_{i=1}^{k} c^{j-1} \hat{z}_j} \prod_{i=1}^{k} (L(\sum_{j=1}^{k} c^{j-1} \hat{z}_j))
\]

\[
= \left( g_1^c \cdot g_2^c \cdots \cdot g_{k-1}^c \cdot g_k^{c_{k-1}} \right) z' \cdot \prod_{i=1}^{k} c^{j-1} \tilde{L}_i = g^{e'} k^{L'(z')}
\]

The remaining proof of this theorem is directly obtained from [3], [30].
Amortizations.

Compressed many nullity checks as one. Given \( P = g^x h^r \), \( s \) linear functions \( L_i : \mathbb{Z}_p^{n} \rightarrow \mathbb{Z}_p \), the prover want to prove that \( L_i(x) = 0 \) for all \( i \in [1, s] \). By the well-known Schwartz-Zippel lemma, the prover can do many nullity checks at the cost of one single check plus one more element of \( \mathbb{Z}_p \) (challenge) from the verifier to prover. Especially, the verifier sends a challenge \( \varphi \sim \mathbb{Z}_p \) to prover and both of them then define the new linear form \( L(x) := \sum_{i=1}^{s} L_i(x) \varphi^{i-1} \). Observe that \( L(x) = 0 \) implies \( L_i(x) = 0 \) for all \( i \in [1, s] \) with the probability at least \( 1 - (s-1)/p \approx 1 \) when \( s \) is polynomial of \( \lambda \) and \( p \) is exponential. So, it requires only one single nullity check plus sending one more challenge \( \varphi \) to do \( s \) nullity checks, instead of running sequentially \( s - \Pi([x], L_i, 0; x) \) for \( i \in [1, s] \).

![Fig. 12: Many nullity checks \( \Pi_{\text{zeros}}([x], L_1, L_2, \ldots, L_s, 0; x) \).](image)

Amortized over many commitments. Given \( P_i = g^{x_i} h^{r_i} \) for \( i \in [1, s] \), the prover wants to convince that the evaluation of the same linear form \( L \) on committed vectors is correct i.e \( y_i = L(x_i) \). Intuitively, a prover can do this batch evaluation checks of \( L \) over many committed vectors \( x_i \) at the same cost of evaluation checks of \( L \) over only one committed vector. We can see that \( \hat{P} := A \prod_{i=1}^{s} P_i^{c_i} \) is the Pedersen commitment of vector \( z = \sum_{i=1}^{s} x_i c_i + \alpha \) where \( A \) is commitment of \( \alpha \) then \( L(z) := L(\alpha) + \sum_{i=1}^{s} c_i y_i \). Note that, assume \( c_0 \sim \mathbb{Z}_p \), if the prover knows the open of \( \hat{P} \), it means the prover has to know the open of each \( P_j \) with a probability of almost 1. Indeed, if there exists an \( P_j \) such that prover does not know the opening \( x_j \) then prover can cheat when having a correct opening of \( \hat{P} \) with at most probability of \( s/p \) (a cheating prover succeeds when \( c_0 \) is the zero of some polynomial of degree at most \( s \)).

Following the protocol \( \Pi \) (Figure 11), we modify the definition as \( z = \sum_{i=1}^{s} x_i c_i + \alpha, \gamma = \sum_{i=1}^{s} r_i c_0 + \beta \) and \( Q := A \prod_{i=1}^{s} P_i^{c_i} k^{\gamma} (\sum_{i=1}^{s} c_i y_i + t) \). So, prover and verifier now can interact following the same last 3-move in \( \Pi \).
**Protocol $\Pi_{\text{Am}}([x_i], L, y_i; x_i)_{i \in [1,s]}$**

Public parameters. $(g, h, k) \in \mathbb{G}^{n+2}, L : \mathbb{Z}_p^n \rightarrow \mathbb{Z}_p, P_i = g^{x_i}h^{r_i}, y_i = L(x_i)$ for $i \in [1,s]$. Protocol.

1. $P \rightarrow V$. Prover picks randomly $\alpha \xleftarrow{} \mathbb{Z}_p, \beta \xleftarrow{} \mathbb{Z}_p$. Prover sends $t = L(\alpha), A = g^\alpha h^\beta$ to verifier.

2. $V \rightarrow P$. Verifier chosen randomly $c_0, c_1 \xleftarrow{} \mathbb{Z}_p$ and sends them to prover.

   Prover defines $z = \sum_{i=1}^{s} x_i c_0^i + \alpha, \gamma = \sum_{i=1}^{s} r_i c_0^i + \beta$ and $\hat{z} := (z, \gamma)$.

   Define $\hat{g} := (g, h) \in \mathbb{G}^{n+1}, Q := A \prod_{i=1}^{s} P_i^{c_0^i k^i (\sum_{i=1}^{s} c_0^i y_i + t)}$ and $\hat{L}(z, \gamma) := c_1 L(z)$. Note that $Q := \hat{g}^{k \hat{L}(\hat{z})}$.

3. The last 3-move is the same as the protocol in the Figure 11.

**Fig. 13:** Amortized over many commitments $\Pi_{\text{Am}}([x_i], L, y_i; x_i)_{i \in [1,s]}$.

**Sublinear circuit satisfiability.** Given an arbitrary circuit $C$, we present a ZK proof for circuit satisfiability with *sublinear* communication cost based on generalized Pedersen commitments. Intuitively, the circuit $C$ is divided into $B$ smaller sub-circuits having the same number of input gates, addition gates, and multiplication gates then we do ZK proof in the batch of tuple $B$ elements corresponding with the same type of gate in $B$ sub-circuits. Without loss of generality, assume that the number of input gates, addition gates, and multiplication gates of circuit $C$ are multiple of $B$. If not the compiler to transfer general circuits in detail is explained in [50].

**Comparison with the Work of Attema–Cramer.** The work of [3] described a logarithmic-round and logarithmic communication protocol from the discrete logarithm assumption, and mentioned in a remark that their protocol can be made constant-round, at the cost of increasing the communication to $O(\sqrt{C})$. This builds upon a “direct” reduction from proving satisfiability of an arithmetic circuit to batch Hadamard arguments and proofs for linear relations. Working out the details, the protocol of Attema–Cramer enjoys

- A constant number of rounds,
- $O(\sqrt{C})$ communication,
- $O(C)$ CRS size,
- A computation dominated by $O(1)$ executions of an FFT on degree-$C$ polynomials (as well as $O(C)$ exponentiations).

In contrast, when using our approach (which proceeds by first building a ZK proof for SIMD circuits via the techniques of Attema–Cramer combined with a careful batch checking argument, then applying our general compile) also has constant round complexity and $O(\sqrt{C})$ communication, but additionally achieves a sublinear CRS size $O(\sqrt{C})$, and the computation is dominated by...
**Protocol $Π_{ZKPer}$**

- **Preprocessing circuit** Every $B$ same-type gate is divided into a group, the prover commits all left wire and right wire values of $B$ gates as in the SIDM-ZK framework. Then for every $j$-th group, we have 2 commitments $[l_j], [r_j]$.

  - If the $j$-th group includes addition gates, prover commits output wire values as in SIDM, i.e $[o_j] := [l_j][r_j]$.
  - Otherwise, if this group contains multiplication gates, prover computes two commitments $[o_j]$ and $[o'_j]$ while $[o_j]$ is defined in SIMDZK for output wire values of multiplication gates and $[o'_j]$ is defined as the same of $[l_j], [r_j]$.

  Note that $[o'_j], [o_j]$ are used for showing the correction of routing and multiplications respectively.

All the wires in circuit $C$ are also divided into groups of $B$ wires and each $j$-th group is committed to getting the commitments $[w_j]$ as $[l_j], [r_j]$.

- **Correctness check of multiplication gates** is presented in section 4.2.

- **Consistency check of wire routing** We use high-level intuition of the compiler described in the section 3 to prove the corresponding left wire, right wire, and output wire values of all gates to the wire values of circuit. As described, the core idea of our compiler is that we transfer the proof of consistency into a SIMD ZK which can be achieved with sublinear communication.

- **Consistency of output wire values of multiplication gates** Observe that for each group of multiplication gates, we have two commitments $[o'_j], [o_j]$ which committed of two vectors $o'_j$ and $o_j$. While $o'_j := (r, o_{1,j}, o_{2,j}, \ldots, o_{B,j}) \in \mathbb{Z}_p^{B+1}$ and $o_j := (b_j(0), o_{1,j}, o_{2,j}, \ldots, o_{B,j}, b_j(B+1), \ldots, b_j(2B)) \in \mathbb{Z}_p^{2(B+1)}$ where $r \in \mathbb{Z}_p$. Prover therefore need to prove that $[o'_j]/[o_j]$ is the commitment of vector which having the power $0$s of the set of generators $\{g_2, g_3, \ldots, g_{B+1}\}$. By the method described earlier, this is handled by a $Π_{Am}$ for checking many nullities.

If any check fails, $V$ aborts.

---

Fig. 14: The protocol of SHVZK for circuit satisfiability from Pedersen commitment.

$O(\sqrt{C})$ executions of an FFT on degree-$\sqrt{C}$ polynomials (as well as $O(C)$ exponentiations). For large circuits, due to the polylogarithmic overhead of FFTs, this translates to a lower computational overhead.

**A.4 Multi-Verifier ZK**

We describe the realization of the functionality $F_{\text{verifyprod}}$, which recursively compress the check of an inner product to a single multiplication triple. $F_{\text{verifyprod}}$ follows the fully linear PCP framework proposed by Boneh et al. [12] and uses the optimization proposed in [28,29]. Similar technique is also used in MPC-in-the-head ZKPs such as Limbo [22].
Inner product verification $F_{\text{verifyprod}}$. The functionality $F_{\text{verifyprod}}$ takes input $m\ell$ multiplication triples and first convert it into a dimension-$m\ell$ inner product triple $\{(\{x_{i,1}\}, \ldots, \{x_{i,\ell}\}), (\{y_{i,1}\}, \ldots, \{y_{i,\ell}\}), \{z_i\}\}_{i=1}^m$. Then it verifies whether $z = \sum_{i=1}^m x_i y_i$. It incurs only logarithmic communication and round-trip complexity via a compression procedure. Define $\mathbb{K}$ to be a field extension of $\mathbb{F}$ and is exponentially large to achieve negligible soundness error. The compression is done in the following steps.

1. Divide the triple into $m$ inner product triples of dimension-$\ell$
   $$\{(\{x_{i,1}\}, \ldots, \{x_{i,\ell}\}), (\{y_{i,1}\}, \ldots, \{y_{i,\ell}\}), \{z_i\}\}_{i=1}^m$$
   The shares $\{\{z_i\}\}_{i=1}^{m-1}$ are directly distributed by $\mathcal{P}$, and $\{z_m\} = [z] - \sum_{i=1}^{m-1} [z_i]$.
2. Interpolate $2\ell$ degree-$(m-1)$ polynomials
   $$([f_1(\cdot)], \ldots, [f_\ell(\cdot)]), ([g_1(\cdot)], \ldots, [g_\ell(\cdot)])$$
   such that $[f_j(i)] = [x_{i,j}]$ and $[g_j(i)] = [y_{i,j}]$, for $i \in [m], j \in [\ell]$.
3. $\mathcal{P}$ computes $z_i = \sum_{j=1}^\ell f_j(i) * g_j(i)$ for $i \in [m+1, 2m-1]$. It distributes $\{[z_i]\}_{i=m+1}^{2m-1}$ to parties.
4. Interpolate a degree-2$(m-1)$ polynomial $[h(\cdot)]$ such that $[h(i)] = [z_i]$ for $i \in [2m-1]$.
5. Jointly sample a random element $\beta \in \mathbb{K}$, and evaluate these $2\ell+1$ polynomials to $\beta$. Output a dimension-$\ell$ inner product triple
   $$([f_1(\beta)], \ldots, [f_\ell(\beta)]), ([g_1(\beta)], \ldots, [g_\ell(\beta)]), [h(\beta)]$$

The above compression is executed recursively for logarithmic rounds to reduce the dimension of inner product triple to constant. The last round of compression outputs a single multiplication triple $(f(\beta), g(\beta), h(\beta))$, whose correctness will be checked by all parties. Two facts are omitted in the above description, and we refer the readers to [53].

- The sampling of random point $\beta \in \mathbb{K}$ is done by Fiat-Shamir heuristic. It requires a public message known by all parties. At step 3, instead of $\mathcal{P}$ distributing a PSS $[z]$, parties first hold a random PSS $[r]$, then $\mathcal{P}$ broadcasts $u = z + r$. Then parties not only derive $[z] = [r] - u$, but also attain the public message $u$.
- To achieve zero-knowledge, the multiplication triple output at the final round should not reveal any private information. Some independent randomness will be added into the final round of compression to mask the information computed from the witness.

A.5 SIMD ZK from Ligero

We present the SIMD ZK protocol modified from Ligero [2] with its security analysis. As shown in Figure 17, the Protocol $\Pi_{\text{LigeroSIMD}}$ is an instantiation of
**Protocol II_{RSEncode}**

**Parameters.** Define parameters $n, B$ such that $B < n$. $L^d_{RS}$ is the set of all valid Reed-Solomon codeword. $\alpha_1, \ldots, \alpha_n, \gamma_1, \ldots, \gamma_B \in \mathbb{F}$ are $n + B$ evaluation points. A pseudorandom generator $\text{PRG} : \{0, 1\}^n \rightarrow \{0, 1\}^*$. 

**Input.** Parameter $d$ satisfying $B \leq d + 1 < n$, vector $x \in \mathbb{F}^B$ and a randomness $r \in \{0, 1\}^*$. 

**Encode.** If $B < d + 1$, compute $(\bar{x}_1, \ldots, \bar{x}_{d-B+1}) \leftarrow \text{PRG}(r)$. Interpolate a degree-$d$ polynomial $f_x(\cdot)$ such that $f_x(\gamma_i) = x_i$ for $i \in [B]$ and set $f_x(\alpha_i) = \bar{x}_i$ for $i \in [d-B+1]$. Output $(f_x(\alpha_1), \ldots, f_x(\alpha_n))$.

**Protocol II_{MerkleCdP}**

Define $\tau_{\text{max}}$ to be the maximum time a commitment is used in the MerkleProve procedure.

**MerkleCommit.** On input a vector $x$ of size $|x| = n$, Build a Merkle tree with $2^{\lceil \log_2 n \rceil}$ leaf nodes, where the first $n$ leaf nodes are elements in $x$ and the rest are dummy. Output the root node $\tau \in \{0, 1\}^{3n}$. Initiate a counter $\text{ctr} = 0$ and associate it with $\tau$.

**MerkleProve.** On input $(x, Q) \in \mathbb{F}^n \times \mathbb{N}'$, first check the counter $\text{ctr}$ that associates with the commitment to $x$. If $\text{ctr} \geq \tau_{\text{max}}$, abort. Recompute the Merkle tree as described in MerkleCommit. Construct $\sigma$ which contains all sibling nodes that are on the path to the leaves in set $\{x[i]\}_{i \in Q}$. Output $\sigma$. Set $\text{ctx} = \text{ctr} + 1$.

**MerkleVerify.** On input $(\tau, \{x[i]\}_{i \in Q}, \sigma)$, reconstruct the Merkle tree path from $\{x[i]\}_{i \in Q}, \sigma$. Define the Merkle tree root $\tau'$. If $\tau \neq \tau'$, output fail.

Fig. 15: Reed-Solomon Encoding.

Fig. 16: Merkle tree vector commitment.

$\mathcal{FSIMDZK}$. Its two building blocks are the Reed-Solomon (RS) encoding described in Figure 15 and the Merkle commitment scheme described in Figure 16.

*Commit and open procedures.* As described in Figure 15, to commit to a vector of field elements $w \in \mathbb{F}^B$, the prover first pad it into a length-$(d + 1)$ vector with randomly sampled $d - B + 1$ elements, then encode them into a Reed-Solomon codeword $u \in \mathbb{F}^n$. Eventually, the codeword is committed by the Merkle commitment scheme specified in Figure 16. Assume a set of $n + B$ distinct evaluation points $\alpha_1, \ldots, \alpha_n, \gamma_1, \ldots, \gamma_B \in \mathbb{F}$. The $(n, d + 1)$-RS codes naturally determines a degree-$d$ polynomial $f(\cdot)$ where $f(\alpha_i) = u_i$ for $i \in [n]$ and $f(\gamma_i) = w_j$ for $i \in [B]$. To open a commitment, the prover simply reveal the vector $w$ and the randomness used for padding.

Assume that during the commitment phase, $\mathcal{P}$ commits to $m$ batches of wire values $(w_1, \ldots, w_m)$ to $\mathcal{V}$, which are taken as inputs during the proving phase.
to convince $\mathcal{V}$ of the relation $C(\|w_1\|, \ldots, \|w_m\|) = 0$. The proving phase (in Figure 17) for SIMD circuits inherits the tests of interleaved linear codes and quadratic constraints over interleaved linear codes from the Ligero [2] protocol. Throughout these steps, the challenges sampled and sent by $\mathcal{V}$ include the coefficients $(r, \bar{r})$ for random linear combination, and the set $Q$ indicating the subset of $t$ elements to be opened among a length-$n$ codeword.

Testing interleaved Reed-Solomon Codes. $\mathcal{P}$ additionally samples and commits to a random vector $w_{m+1}$ as a mask. Upon receiving the coefficients $\bar{r} \in \mathbb{F}^{m+1}$, $\mathcal{P}$ responds with the polynomial $h(\cdot) := \sum_{j=1}^{m+1} \bar{r}_j f_j(\cdot)$, in which $\{f_j(\cdot)\}_{j \in [m]}$ are RS polynomials for committed wire values $\{w_i\}_{j \in [m]}$ and $f_{m+1}(\cdot)$ is for $w_{m+1}$. $\mathcal{V}$ first checks whether $h(\cdot)$ is a degree-$(k-1)$ polynomial. This is equivalent to check whether $(h(\alpha_1), \ldots, h(\alpha_n))$ is a valid Reed-Solomon codeword. After $\mathcal{P}$ reveals $\{u_j[i]\}_{i \in Q}$ and proves the correctness of Merkle opening, $\mathcal{V}$ checks whether $h(\alpha_i) := \sum_{j=1}^{m+1} \bar{r}_j u_{j}[i]$ for $i \in Q$. This step validates whether the openings of $t$ views are consistent with the previously claimed $h(\cdot)$. Since $h(\cdot)$ is the linear combination of committed codewords, the passing of above steps validates the correctness of RS encoding for all committed wire values. Intuitively, the zero-knowledge property is preserved by the mask $f_{t+1}(\cdot)$ corresponding to the randomly sampled $w_{m+1}$. The soundness requires that $\bar{r}$ is uniformly sampled after $\mathcal{P}$ committing to $w_{m+1}$ and it is hard for $\mathcal{P}$ to guess correctly the subset $Q$ of size $t$.

Testing quadratic constraints over interleaved Reed-Solomon Codes. Assume that there are $\ell$ batches of multiplication gates in the SIMD circuit. $\mathcal{P}$ constructs and commits to $w_{m+2} := 0^\ell$. Unlike the $(n, k)$-RS encoding used during previous the commitment phase, $\mathcal{P}$ needs to perform a $(n, 2k-1)$-RS encoding for $w_{m+2}$, which results in a degree-$(2k-2)$ polynomial $f_{m+2}$. Upon receiving the challenge $r \in \mathbb{F}^m$, $\mathcal{P}$ responses with the degree-$(2k-2)$ polynomial $g(\cdot) := f_{m+2}(\cdot) + \sum_{j=1}^{\ell} r_j (f_{\alpha_j}(\cdot)f_{\beta_j}(\cdot) - f_{\gamma_j}(\cdot))$ where $(\alpha_j, \beta_j, \gamma_j)$ are the indices of input and output wires of a batch of multiplication gates. It becomes clear at this point that the reason why $f_{m+2}$ has degree $2(k-1)$ instead of $k-1$ is to disguise the combination of circuit wire encodings, which is a degree-$(2k-2)$ polynomial. If all multiplication gates are computed correctly, it should satisfy that $g(\gamma_i) = 0$ for $i \in [B]$. After $\mathcal{P}$ reveals $\{u_j[i]\}_{i \in Q}$ and proves the correctness of Merkle opening, $\mathcal{V}$ checks their consistency with $g(\cdot)$. In terms of the security, the zero-knowledge is guaranteed by the mask $f_{m+2}$. Similar to the previous test, the soundness requires that $r$ is uniformly sampled after $\mathcal{P}$ committing to $w_{m+2}$ and it is hard for $\mathcal{P}$ to guess correctly the subset $Q$ of size $t$.

Security Analysis. We provide formal security analysis of the SIMD Ligero and pay attention to the case then compiling SIMD Ligero to general ZK and scalable ZK. We claim that the protocol $P_{\text{LigeroSIMD}}$ shown in Figure 17 securely realizes the functionality $F_{\text{SIMDZK}}$. The proof separately considers the case of corrupted verifier and prover. In each case, a PPT simulator $S$ is constructed to simulate the view of adversaries.

Malicious Verifier $\mathcal{V}^\ast$. We construct a PPT simulator $S$ to simulate the view of $\mathcal{V}^\ast$ executing the Protocol 17. $S$ interacts with $A$ in the following way.
For any batches of addition gates indexed by $V$ such that $k < n$ and $B + t - 1 \leq k$, $\alpha_1, \ldots, \alpha_n, \gamma_1, \ldots, \gamma_B \in \mathbb{F}$ are $n + B$ evaluation points.

**Private inputs.** $P$ holds witness $(w_1, \ldots, w_m)$ such that $C(w_1, \ldots, w_m) = 0^B$.

**Commit.** On input $w \in \mathbb{F}^B$ and degree $d$ from $P$, parties proceed as follows: (By default, $d = k - 1$, unless otherwise mentioned.)

1. $P$ samples a uniform $r \in \{0, 1\}^n$ and invokes $(u, f(\cdot)) \leftarrow \text{RSEncode}(w, r, d + 1)$. The codeword $u$ and degree-$d$ polynomial $f(\cdot)$ satisfy that $u \in L_{RS}^d$ and $f(\gamma_i) = w[i]$ for $i \in [B]$.
2. $P$ invokes $\tau \leftarrow \text{MerkleCommit}(u)$ and sends $\tau$ to $V$.
3. Output $[w] := ((w, r, u, f(\cdot)), \tau)$ such that $P$ holds $(w, r, u, f(\cdot))$ and $V$ holds $\tau$.

**Open.** On input $[w] := ((w, r, u, f(\cdot)), \tau)$ from $P$ and $V$, $P$ sends $(w, r)$ to $V$. $V$ recomputes $(u', f'(\cdot)) \leftarrow \text{RSEncode}(w, r)$ then $\tau' \leftarrow \text{MerkleCommit}(u')$. If $\tau \neq \tau'$, $V$ aborts.

**Prove.** On input $([w_1], \ldots, [w_m])$, parties proceed as follows:

1. $P$ uniformly samples $w_{m+1} \in \mathbb{F}^B$ and constructs $w_{m+2} := 0^B$, $P$ and $V$ invoke the above Commit procedure to obtain $[w_{m+1}]$. They also invoke Commit with input $d = 2(k - 1)$ to obtain $[w_{m+2}]$.
2. $V$ uniformly samples $r \in \mathbb{F}^m, \tilde{r} \in \mathbb{F}^{m+1}$ and sends them to $P$.
3. $P$ holds $\{[w_j]\}_{j \in [m+2]} := \{(w_j, r_j, u_j, f_j(\cdot))\}_{j \in [m+2]}$. $P$ defines:
   (a) $g(\cdot) := f_{m+2}(\cdot) + \sum_{j=1}^m r_j (f_{\alpha_j}(\cdot) f_{\beta_j}(\cdot) - f_{\gamma_j}(\cdot))$ where $(\alpha_j, \beta_j, \gamma_j)$ are the indices of input and output wires of $j$-th batch of multiplication gates.
   (b) $h(\cdot) := \sum_{j=1}^{m+2} \tilde{r}_j f_j(\cdot)$.
   $P$ sends the degree-$(2k - 2)$ polynomial $g(\cdot)$ and degree-$(k - 1)$ polynomial $h(\cdot)$ to $V$. $V$ checks whether $g(\gamma_i) = 0$ for $i \in [B]$. It also checks whether the codeword $(h(\alpha_1), \ldots, h(\alpha_n)) \in L_{RS}$. If not, it aborts.
4. $V$ uniformly samples a set $Q \subset [n]$ of size $|Q| = t$, and sends it to $P$. For $j \in [m + 2]$, $P$ invokes $\sigma_j \leftarrow \text{MerkleProve}([w_j], u, Q)$. It sends $\{u_j[i]\}_{i \in Q}$ and the proof of opening $\sigma_j$ to $V$. $V$ invokes $\text{MerkleVerify}([w_j], \tau, \{u_j[i]\}_{i \in Q}, \sigma_j)$. If the verification fails, $V$ aborts.
5. $V$ checks the followings:
   (a) For any batches of addition gates indexed by $(\alpha, \beta, \gamma)$, it satisfies that $u_\alpha[i] + u_\beta[i] - u_\gamma[i] = 0$ for $i \in Q$.
   (b) $g(\alpha_1) := u_{m+2}[i] + \sum_{j=1}^t r_j (u_{\alpha_j}[i] \cdot u_{\beta_j}[i] - u_{\gamma_j}[i])$ for $i \in Q$.
   (c) $h(\alpha_1) := \sum_{j=1}^{m+2} \tilde{r}_j u_\gamma[i]$ for $i \in Q$.

If any check fails, $V$ aborts.

Fig. 17: The protocol of SIMD ZK from Ligero.
1. $S$ emulates a random oracle $O_H$.
2. $S$ simulates the procedure MerkleCommit as an honest prover do. It stores the commitment $\tau_j$ for each committed vector $\mathbf{w}_j$.
3. On simulating the Prove, $S$ first commits to random $(\mathbf{w}_{m+1}, \mathbf{w}_{m+2})$. Upon receiving $r, \bar{r}$ from $A$, $S$ samples a random degree-$(2k-2)$ polynomial $\bar{g}(\cdot)$ such that $\bar{g}(\gamma_i) = 0$ for $i \in [B]$. It also samples a random degree-$(k-1)$ polynomial $\bar{h}(\cdot)$ it sends $(\bar{g}(\cdot), \bar{h}(\cdot))$ to $A$.
4. On receiving $Q$ from $A$, $S$ constructs $\{\{\mathbf{u}_j[i]\}_{i \in Q}\}_{j=1}^{m+2}$ in the following ways:
   a. Construct random $\{\{\mathbf{u}_j[i]\}_{i \in Q}\}_{j=1}^m$ that can pass the check at step 5a.
   b. Randomly sample the rest of the values except for $\mathbf{u}_{\ell+1}$ and $\mathbf{u}_{\ell+2}$.
   c. For $i \in Q$, set
   \[
   \mathbf{u}_{m+2}[i] := \bar{g}(\alpha_i) - \sum_{j=1}^\ell r_j (\mathbf{u}_{\alpha_j}[i] \cdot \mathbf{u}_{\beta_j}[i] - \mathbf{u}_{\tau_j}[i])
   \]
   d. For $i \in Q$, set $\mathbf{u}_{m+1}[i] := \bar{h}(\alpha_i) - \sum_{j=1}^m \mathbf{u}_j[i]$.
5. $S$ simulates the procedure MerkleProve. It aborts if the counter of a commitment reaches the maximum. Otherwise it samples random sibling nodes $\{\mathbf{\sigma}_j\}_{j \in [m+2]}$ and sends $\{\mathbf{\sigma}_j\}_{j \in [m+2]}$ to $A$. $S$ acts as an honest prover to compute Merkle trees from the above constructed codewords. It constructs a list $L$ which records all inputs to the random oracle $O_H$ when computing the root node.
6. $S$ simulates the procedure MerkleVerify by monitoring the random oracle query to $O_H$. For any query $q = L[j]$ where $j \in [m]$, answer the query with $\tau_j$ (stored when simulating MerkleCommit). Otherwise sample a uniform answer $\bar{\tau} \in \{0, 1\}^{2n}$.

We argue the indistinguishability between the real and ideal world from the view of $A$. $S$ emulates a programmable random oracle $O_H$ to handle oracle queries and program the oracle output for certain inputs. At step 2, $S$ simulates the Merkle tree commitment by a random value $\tau_j$ of which the length is the same as a hash output. The polynomials $g(\cdot)$ and $h(\cdot)$ sent by $S$ is indistinguishable from their distribution in real-world because: (i) In the real world, degree-$(2k-2)$ polynomial satisfies $g(\gamma_i) = 0$ for $i \in [B]$ and $g(\cdot)$ is masked by a random polynomial $f_{m+2}(\cdot)$ such that $f_{m+2}(\gamma_i) = 0$ for $i \in [B]$; (ii) In the real world, $h(\cdot)$ is masked by a random polynomial $f_{m+1}(\cdot)$. It is simulated by the random $\bar{h}(\cdot)$ of the same degree. At step 4, $\{\{\mathbf{u}_j[i]\}_{i \in Q}\}_{j \in [\ell+2]}$ constructed by $S$ follows the real-world distribution as long as the Reed-Solomon encoding has degree $k > B + t - 1$. In this case any $t$-out-of-$n$ views are random and independent of the encoded vector, thus indistinguishable from what are sampled by $S$. At last, $S$ monitors the random oracle and programs any oracle query in $L$ for the computing of the Merkle tree root. $\Pi_{LigerosSIMD}$ requires that the randomness of the commitment contains enough entropy so that there is negligible probability for $A$ to make queries whose results match elements in $L$. 

45
Notes on achieving zero-knowledge. The protocol $\Pi_{\text{LigeroSIMD}}$ shown in Figure 17 turns Ligero into a commit-and-prove SIMD ZK. It comes with restriction such that a commitment can only be used in a limited number of proofs. Also, this maximum number of usage $\tau_{\text{max}}$ is determined upon the committing phase. This has been reflected in the functionality $F_{\text{SIMDZK}}$: a counter $\text{ctr}_w$ is attached to the commitment to vector $w$. Each time a $J_w K$ contributes to a proof, $\text{ctr}_w$ increments. A commitment should never be used again unless in the opening phase, if its counter reaches $\tau_{\text{max}}$. In Ligero, it is related to the fact that each proving phase exposes $t$-out-of-$n$ committed views, which are $t$ elements in each of size-$n$ codeword. Once $(k+1)$-out-of-$n$ elements in the codeword is disclosed, the commitment will be fully opened. Hence, the parameters chosen during the initial committing phase fully determine $\tau_{\text{max}}$ and it should guarantee $t \cdot \tau_{\text{max}} \leq k$.

Malicious Prover $P^*$. We analyze the soundness error when $A$ (as a cheating prover) causes $S$ to abort in the ideal world, but successfully cheats in the real world. Ligero follows the framework of MPC-in-the-Head in a way that $P$ first commits to $n$ views to $V$ by the Reed-Solomon code with length-$n$ codeword. Then $V$ randomly chooses $t$ of them to open and check. A cheating prover $P^*$ is not caught if the opened $t$ parties are among the honest parties that $P^*$ emulates. The probability of such event is required to be negligible. Define $\Pr[\text{succ}]$ to be the probability that the cheating prover $A$ succeeds. We adapt the optimized soundness analysis in the full version of Ligero. We have

$$\Pr[\text{succ}] \leq \frac{(k+e)}{t} + \frac{(2^k-2)}{t} + n + 2 \frac{n}{|\mathbb{F}|}$$

in which the error $e < (n-k+1)/2$.

A.6 Succinct Non-Interactive Arguments from Pairing

Campanelli et al. propose LegoSNARK [18], a framework for commit-and-prove zk-SNARKs (CP-SNARKs). In Figure 18, we present one of the detailed construction mentioned in the paper, which naturally fits the intuition of $F_{\text{SIMDZK}}$. It has to be mentioned that in SNARK paradigm, to achieve succinctness, batch size $B$ is fixed as circuit size $C$. Basically what we are doing is to show that SNARK can also be constructed via our methodology: parallel proof of multiplication gate ($F_{\text{SIMDZK.Prove}}$) and wire consistency check ($F_{\text{SIMDZK.LinearMap}}$).

In this construction, $P$ commits vector to its multilinear extension (MLE), which is the (unique) multilinear polynomial $f_w : \mathbb{F}^d \rightarrow \mathbb{F}$ such that $w(b) = f_w(b)$ for all $b \in \{0,1\}^d$. Formally, it is defined as:

$$f_w(X_1, \ldots, X_d) = \sum_{b \in \{0,1\}^d} \chi_b(X_1, \ldots, X_d) \cdot w(b).$$

where $\chi_b(X_1, \ldots, X_d) = \prod_{i=1}^d \chi_{b_i}(X_i), \chi_1(X) = X$ and $\chi_0(X) = 1 - X$.

Besides MLE, the protocol realizes our functionality based on two gadgets $F_{\text{poly}}$ and $F_{\text{sc}}$. $F_{\text{poly}}$ checks the relation $R_{\text{poly}}$ over $\mathbb{F}^d \times \mathbb{F} \times \mathbb{F}$, where
**Protocol II\textit{LegoSNARK}**

**Setup:** Upon receiving maximum degree $d$, sample a bilinear group and define $(\text{group}) := (G_1, G_2, q, g, e)$ where $e : G_1 \times G_1 \to G_2$ and $g$ is a generator of $G_1$. Sample random elements $s_1, \ldots, s_{d+1} \in F_q$ and compute vector

$$\Sigma := \{g^{\ell_i}w^{s_i}\}_{w \in \mathbb{Z}^{d+1}}.$$  

and then output the public parameters $(\text{group}), \Sigma, g^{s_{d+1}}$.

**Commit:** On input $w$, $P$ computes its multilinear extension $f_w$ and uniformly samples $\tilde{r} \in F$, and then sends $V$ the commitment $[f_w] = g^{[w(1 + \ldots + d) + r]s_{d+1}}$. To commit a single value $w$, the commitment $[w]$ is simply $g^{w + rs_{d+1}}$.

**Open:** On input $([w], w, \tilde{r})$, $V$ checks whether $[w] = g^{w + rs_{d+1}}$.

**Prove:** For each gate $(\alpha, \beta, \gamma, T)$ in the public circuit $C$, commitments to input wire vectors $[f_{\alpha}]$ and $[f_{\beta}]$ are also public:

- If $T = ADD$, every party locally computes $[f_{\gamma}] = [f_{\alpha}] \cdot [f_{\beta}]$.
- If $T = MULT$:
  1. $P$ computes $f_{\gamma}$ such that $f_{\gamma}(i) = f_{\alpha}(i) \cdot f_{\beta}(i)$ for $i \in \{0, 1\}^d$. Then $P$ calls Commit to obtain $[f_{\gamma}]$.
  2. $V$ picks a random point $r$ and sends it to $P$.
  3. $P$ computes $t = f_{\gamma}(r)$ and calls Commit to obtain $[t]$. Then $F_{\text{poly}}$ is invoked to check that $f_{\gamma}(r) = t$.
  4. $P$ calls $F_{\text{sc}}$ to prove that $t = \sum_{b \in \{0, 1\}^d} \sigma q(r, b) : f_{\alpha}(b) \cdot f_{\beta}(b)$, where $\sigma q(r, b) = 1$ when $r = b$, otherwise 0.

If any check fails, $V$ aborts.

**Linear map:** Upon receiving $([f_x], [f_y], M)$, to prove that $x = My$:

1. Trusted parties compute the MLE of matrix $M$, and form its commitment $[f_M]$, as well as corresponding proving and verification key.
2. $V$ samples a random point $r$ and sends to $P$. Then $P$ uses Commit to obtain $[g]$, where $g(S) = f_M(r, S)$.
3. $V$ picks another random point $\sigma$ and sends to $P$. $P$ computes $t = g(\sigma) = f_M(r, \sigma)$ and commit $t$.
4. $F_{\text{poly}}$ is invoked to check that $g(\sigma) = t$ and $f_M(r, \sigma) = t$, and $V$ calls Open to check the correctness of $[t]$.
5. $P$ computes $f_x(r) = k$ and calls Commit to obtain $[k]$, and $V$ calls Open to check its correctness.
6. $P$ calls $F_{\text{sc}}$ to prove that $k = \sum_{b \in \{0, 1\}^d} g(b) \cdot f_y(b)$.

If any check fails, $V$ aborts.

Fig. 18: The protocol of eSIMDZK from LegoSNARK.

$R_{\text{poly}}(x, f, y) := y \equiv f(x)$. $F_{\text{sc}}$ enables a prover to convince a verifier of the validity of a statement of the form $t = \sum_{b \in \{0, 1\}^d} g(b)$ where $g : \mathbb{F}^d \to \mathbb{F}$. It has already been realized in paper [39]. To achieve zero-knowledge, $g$ is defined in the
form $\prod_{i=0}^2 g_i(S)$, that all the $g_i$’s, except $g_0$, are committed. Namely, $F_{sc}$ checks the relation $R_{sc}(x, u)$, with $x \in F$ and $u \in F \times F^2$, that is formally defined as:

$$R_{sc}(g_0, (t, g_1, g_2)) = 1 \iff g(S) = \prod_{i=0}^2 g_i(S) \land t = \sum_{b \in \{0, 1\}^d} g(b).$$

The construction in Figure 18 is secure in the GGM and random oracle model, and it has only linear proving time. The size of the proof in this protocol is $\mathcal{O}(\log n)$, while it can be shrink to $\mathcal{O}(1)$ by applying another method of proving Hadamard product [38], with a $\log n$ blow-up in prover’s computation cost.