Abstract

We optimize Zero Knowledge (ZK) proofs of statements expressed as RAM programs over arithmetic values. Our arithmetic-circuit-based read/write memory uses only 4 input gates and 6 multiplication gates per memory access. This is an almost $3 \times$ total gate improvement over prior state of the art (Delpech de Saint Guilhem et al., SCN’22).

We implemented our memory in the context of ZK proofs based on vector oblivious linear evaluation (VOLE), and we further optimized based on techniques available in the VOLE setting. Our experiments show that (1) our total runtime improves over that of the prior best VOLE-ZK RAM (Franzese et al., CCS’21) by $2-20 \times$ and (2) on a typical hardware setup, we can achieve $\approx 600K$ RAM accesses per second.

We also develop improved read-only memory and set ZK data structures. These are used internally in our read/write memory and improve over prior work.
## Contents

1 Introduction 1
   1.1 Our Contribution 1
   1.2 Intuition 2

2 Preliminaries and Notation 3
   2.1 Universal Composability 3
   2.2 Checking Permutations via Polynomials 3
   2.3 VOLE-based Zero Knowledge 4
   2.4 Miscellaneous Notation 5
   2.5 Models 5

3 Related Work 7

4 Technical Overview 9
   4.1 Read-Only Memory 9
   4.2 Sets with Membership Queries 11
   4.3 Read/Write Memory 12

5 Approach 14
   5.1 Formal Protocol and Security 14
   5.2 Optimizations 17

6 Evaluation 18

A Universal Composability 28

B VOLE Functionality 28

C Proof of Security 28
   C.1 Soundness Lemmas 28
   C.2 Proof of Lemma 1 32
   C.3 Proof of Theorem 1 33

D Additional Optimizations 34

E Additional Evaluation 35

F Commit-and-Prove ZK 35
1 Introduction

Zero Knowledge (ZK) proofs [GMR85] allow a prover $P$ to demonstrate to a verifier $V$ the truth of some statement while revealing nothing additional. ZK has many applications, e.g., private blockchain [SCG+14], private bug-bounty [HYDK21], private software analysis [FDNZ21, LAH+22], ZKML [WYX+21], private network auditability [GAZ+22], and many more.

General-purpose ZK systems often express proof statements as circuits or constraint systems over Boolean or arithmetic values. However, the past decade has shown increased interest in expressing statements as random access machine (RAM) programs [BCG+13, BFR+13, BSCTV14, WSR+15, HMR15, MRS17, BCG+18, BHR+20, HK20, HK21, BHR+21, FKL+21, DdSGOTV22]. The RAM-ZK paradigm is exciting both because some statements are made more efficient by the random access capability and because RAM-ZK opens the door to handling statements written in commodity programming languages (see e.g. [BCG+13, HYDK21]). Thus, RAM-ZK promises to allow a broad audience to use widely available tools to build efficient proofs of arbitrary statements. If we hope to deliver on this promise, it is crucial to develop RAM-ZK that is as efficient as possible.

The cost of ZK proofs. General purpose ZK systems incur three primary costs: proof size (communication), $P$ computation, and $V$ computation. The literature has generated many approaches to ZK, and these approaches achieve varying levels of efficiency with respect to these metrics.

Seeking RAM improvement for a variety of circuit-based ZK systems, we instead focus on the total circuit size metric, the improvement of which implies improvement to all basic costs. Specifically, we reduce the size of circuit needed to implement ZK RAM, and our results accordingly imply general improvement to ZK RAM.

Constant overhead ZK RAM. Recently, [DdSGOTV22] demonstrated a reduction from ZK RAM to arithmetic circuits where the size of the arithmetic circuit scales linearly with the number of memory accesses. I.e., the number of gates needed for each RAM access is only a constant.

While this asymptotic scaling is ideal, the underlying constants remain relatively high. In most arithmetic ZK systems, we are most interested in reducing the number of non-linear gates, which include multiplication gates and input gates (linear gates are typically far cheaper). The [DdSGOTV22] construction uses 27 non-linear gates per access.

1.1 Our Contribution

We construct constant overhead ZK RAM with state-of-the-art circuit size. Given standard arithmetic circuits, our construction uses only 4 input gates and 6 multiplication gates per RAM access (and a constant number of linear gates).

The gates in our RAM are highly structured: viewing the circuit globally, five of six multiplication gates are each part of a large product computation (i.e., a high fan-in gate). This structure is important because in some ZK systems, it is possible to exploit such structure to further improve performance.

We implemented our RAM in the context of ZK based on vector oblivious linear evaluation (VOLE) [WYKW21, DIO21, BMRS21, YSWW21, WYX+21, BMH+21, BBMHS22, DILO22, WYY+22], a ZK paradigm noted for its fast end-to-end proofs (see discussion of VOLE-based ZK in Section 2.3). We use an idea suggested by [FKL+21] to leverage QuickSilver’s [YSWW21] support...
polynomial evaluation to take advantage of circuit structure and significantly decrease total cost. Our implementation is publicly available at https://github.com/gconeice/improved-zk-ram.

Prior work [FKL+21] developed a ZK RAM specialized for VOLE-based techniques. We compare to their publicly available implementation [WMK16]; our experiments (see Section 6) show that our end-to-end proof runtime performance outperforms that of [FKL+21] by 2-20×, depending on the network bandwidth and the RAM size.

We also present related ZK data structures: a read-only memory (ROM) and a set membership structure whose performance substantially improves the state of the art. Our ZK ROM outperforms that of [FKL+21] by 3-34×, depending on the network bandwidth and the RAM size. On a standard setup, it can execute each access in under 1µs.

While our constructions leverage novel insights, their designs are conceptually simple and easy to implement. The results are also highly generic, and are suited to any ZK system (even those that use, e.g., R1CS instead of circuits) with support for (1) arithmetic gates over a large field and (2) public-coin random challenges. Indeed, our results are applicable to arithmetic ZK systems in the commit-and-prove paradigm [CLOS02] (see Appendix F), including VOLE-based systems, proof systems based on MPC-in-the-head, e.g. [IKOS07, AHIV17, dOT21], and zkSNARKs, e.g. [BBB+18, MBKM19, CHM+20, ABC+22].

1.2 Intuition

A key insight behind ZK RAM is that the ZK prover $P$ can act as an oracle, providing as input the correct data corresponding to each memory access. However, $P$ might try to cheat, on each access providing arbitrary values that lead to an erroneous “proof”. The challenge is thus to check that malicious $P$ inputs memory values correctly.

Prior state-of-the-art ZK RAM [FKL+21, DdSGOTV22] build such checks from permutation proofs (see Section 2.2), which allow $P$ to convince the verifier $V$ that one vector of wire values is a permutation of another. Based on this, [FKL+21, DdSGOTV22] allow $P$ to introduce two vectors. One vector is a list of memory access metadata arranged in the order accesses occur in the program; the second vector is the same metadata arranged in order of memory address. The first vector is used to satisfy each memory access; the second is used to prove $P$ did not cheat. $P$ proves that the two vectors are related by a permutation, then uses a circuit to scan the second vector, proving global consistency properties of the vector.

While conceptually simple, this global check introduces inefficiency in the form of large numbers of inputs from $P$ and relatively costly comparisons (see Section 3 for details).

Our construction also leverages permutation proofs, but its permuted vectors are (1) the list of elements written to RAM (with metadata) and (2) the list of elements read from RAM (with metadata). We discard [FKL+21, DdSGOTV22]’s global consistency check in favor of a local check performed each time an element is accessed:

Check 1. The memory element read from RAM at time $t$ was also written to RAM at some time $t'$, and in particular $t' < t$.

Our key insight is that – with careful design – locally performing Check 1 enforces a global consistency property sufficient to achieve a sound protocol.

Our approach requires fewer gates than prior work because (1) we can assemble our two vectors with fewer inputs from $P$ and (2) our local checks can exploit static information available at the time of each access. For instance, [FKL+21, DdSGOTV22]’s consistency check must explicitly check
whether two compared addresses are equal to each other or not; for us, this check is irrelevant, as from our permutation check we \textit{implicitly} know that the read value is equal to some past write value to the same address.

In total our entire proof uses:

- A permutation proof (reads are a permutation of writes).
- A per-access set membership proof (the corresponding write was made in the past).
- A per-access multiplication (used to multiplex the effect of \texttt{load} versus \texttt{store}).

We give a set data structure that implements our membership proofs and that can be achieved from a single permutation proof. Thus, the entire RAM essentially reduces to two permutation proofs (and one auxiliary multiplication). This yields a small circuit that is fast to execute and easy to implement.

2 Preliminaries and Notation

2.1 Universal Composability

We formalize security in the universal composability (UC) framework [Can01]. We review UC framework in Appendix A.

2.2 Checking Permutations via Polynomials

Our approach relies on efficient permutation proofs. Namely, given two vectors $\mathbf{x}$ and $\mathbf{y}$ stored on arithmetic circuit wires, we wish for $\mathcal{P}$ to prove to $\mathcal{V}$ that there exists some permutation $\pi$ such that $\mathbf{x} = \pi(\mathbf{y})$. When $\mathbf{x}$ and $\mathbf{y}$ are indeed related by a permutation, we write $\mathbf{x} \sim \mathbf{y}$:

$$\mathbf{x} \sim \mathbf{y} \iff \exists \pi. \mathbf{x} = \pi(\mathbf{y})$$

One approach to proving $\mathbf{x} \sim \mathbf{y}$ uses circuit structures called permutation networks [Ben64, Wak68]. Such networks can permute $n$ wire values using $O(n \cdot \log n)$ gates. Several works use such networks in the context of ZK RAM [BCG+13, BSCTV14, HK20, HK21]. The inherent $\log n$ overhead is relatively expensive in practice.

A more efficient approach observes that to prove $\mathbf{x} \sim \mathbf{y}$ it is not necessary to compute a permutation $\pi$. Instead, $\mathcal{P}$ can prove \textit{non-constructively} that such a permutation exists with high probability. To achieve this, prior work [Nef01] suggested interpreting the two vectors $\mathbf{x}$ and $\mathbf{y}$ as polynomials, and then testing those polynomials for equality. Namely, we can view $\mathbf{x}$ (resp. $\mathbf{y}$) as a polynomial $p$ (resp. $q$) with formal parameter $X$ where each element $\mathbf{x}[i]$ (resp. $\mathbf{y}[i]$) is a polynomial root:

$$p(X) = \prod_i (X - \mathbf{x}[i]) \quad q(X) = \prod_i (X - \mathbf{y}[i])$$

If the two vectors $\mathbf{x}$ and $\mathbf{y}$ are indeed related, then they contain the same roots, and hence they encode the same polynomial. Thus, to prove that $\mathbf{x}$ and $\mathbf{y}$ are related, it suffices to prove that $p$ and $q$ are the same polynomial:

$$\mathbf{x} \sim \mathbf{y} \iff p = q$$
Checking equality of two polynomials can be achieved with high probability using the well known polynomial identity test. In this test, we uniformly sample a value \( r \in \mathbb{F} \), and then check if \( p(r) - q(r) = 0 \). The well known fact is that if this check passes, then \( p \) and \( q \) are indeed the same polynomial with very high probability. Specifically, if \( p \neq q \), then the check will succeed with probability only \( \frac{d}{2^d} \), where \( d \) is the degree of the polynomial [DL78, Sch80, Zip89].

In the ZK setting, \( \mathcal{V} \) can send \( r \) as a random challenge, then \( \mathcal{P} \) and \( \mathcal{V} \) can evaluate \( p(r) \) and \( q(r) \) via an arithmetic circuit that grows linearly with low constants in the length of \( x \) and \( y \). Indeed, if \( x \) and \( y \) are length \( n \), then the check requires only \( 2n - 2 \) multiplications (and \( 2n + 1 \) subtractions).

The above definitions of \( p, q \) work when \( x \) and \( y \) are vectors of individual field elements. However, we will need to consider vectors of tuples of field elements. [DdSGOTV22] points out that the above polynomial checks can be extended to vectors of tuples by taking a random linear combination of the content of each tuple. Suppose \( x \) and \( y \) are vectors of \( \ell \)-tuples, and let \( \langle \cdot, \cdot \rangle \) denote the vector inner product operation; we can redefine \( p \) and \( q \) as follows:

\[
p(X, Y) = \prod_i (X - \langle Y \cdot x[i] \rangle) \quad q(X, Y) = \prod_i (X - \langle Y \cdot y[i] \rangle)
\]

\( Y \) is a length-\( \ell \) vector. Thus, \( p \) and \( q \) now denote multivariate polynomials over \( \ell + 1 \) arguments, but otherwise [DdSGOTV22] proves that the reasoning is the same.

In the ZK setting, we can check \( p = q \) by (1) allowing \( \mathcal{V} \) to choose two random challenges \( r \in \mathbb{F} \) and \( s \in \mathbb{F}^\ell \), (2) computing \( p(r, s) - q(r, s) \) inside ZK, and (3) checking that the result is indeed zero. Note, the challenge vector \( s \) is a public value known to both \( \mathcal{P} \) and \( \mathcal{V} \), so evaluating \( p(r, s) - q(r, s) \) still requires only \( 2n - 2 \) private-input multiplication gates.

### 2.3 VOLE-based Zero Knowledge

The literature has generated many powerful ZK proof systems suited to various settings. Our ZK data structures are highly generic, relying only on typical features of ZK systems for arithmetic circuits. Still, we find it instructive to implement our approach in the context of a particular ZK system, and to demonstrate clear concrete improvement.

We implement our approach in the ZK paradigm based on vector oblivious linear evaluation (VOLE) [WYKW21, DIO21, BMRS21, YSWW21, WYX+21, BBMH+21, BBMHS22, DILO22, WYY+22]. VOLE-based ZK is an interactive ZK technique notable for its linear scaling (with low constants) in all cost metrics. This scaling makes VOLE-based ZK useful when the goal is to complete a ZK proof as fast as possible; prior work has shown that VOLE-based ZK can handle \( \approx 5 \times 10^6 \) arithmetic multiplications per second [YSWW21].

VOLE-based ZK is particularly interesting for large, complex statements, for instance those that demonstrate the existence or non-existence of software vulnerabilities in a complex code-base (see e.g. [HYDK21, LAH+22]). In the context of large, general-purpose statements, RAM becomes a crucial ingredient, as it supports programs with large data and complex branching/looping control flow.

We choose to implement in the VOLE-based ZK paradigm because VOLE-based ZK is well suited to the complex proofs supported by RAM. As an additional convenience, prior work [FKL+21] is also based on VOLE, allowing a clean comparison. Our VOLE-based implementation builds on the publicly available EMP Toolkit [WMK16].

Appendix B specifies a standard VOLE functionality.
input_ℓ: 1 → F^ℓ
\sim: (F^ℓ)^n × (F^ℓ)^n → 1
0, 1, 2, ... : F
\sim: F × F → F
\sim: F × F → F
\sim: F × F → F

Figure 1: Syntax on which we build our RAM. We consider circuits over a field F with support for permutation checks. We include gates that accept \ell inputs from \mathcal{P}, gates that check two vectors of \ell-tuples are related by a permutation, gates that add/subtract/multiply wires, and wires that hold constants. Section 2.2 discusses how \sim can be implemented.

\begin{align*}
\text{input}_\ell: 1 & \rightarrow F^\ell \\
\sim: F^\ell & \rightarrow F^\ell \\
0, 1, 2, ... : F & \\
\text{make-mem} : (F^\ell)^n & \rightarrow \text{mem-id} \\
\text{access} : \text{mem-id} \times F \times F \times F^\ell & \rightarrow F^\ell
\end{align*}

Figure 2: Our target syntax supports arithmetic gates and gates that manipulate random access memories. \text{make-mem} generates a fresh memory and \text{access} loads/stores to a memory. Figure 4 specifies semantics.

2.4 Miscellaneous Notation

We consider read-only and read/write memories. We use \( n \) to denote the number of memory elements stored and \( T \) to denote the number of accesses. In our analysis, we often assume there are a large number of accesses, i.e. that \( T = \omega(n) \). We consider memories that store tuples of \( \ell \) field elements. All operations are over a prime field \( F = \mathbb{Z}_p \) for prime \( p \); we require that \( |F| \geq 2T \), sufficient to embed memory-specific values in the field without overflow.

We index a vector \( \mathbf{x} \) at index \( i \) using bracket notation: \( \mathbf{x}[i] \). Indexing starts at zero.

We use \( \sim \) to denote an operation that checks its two vector arguments are related by a permutation.

We denote logical memory operations by \text{load} and \text{store}, and disambiguate these from physical operations \text{read} and \text{write}. Each of our logical operations requires one read and one write. Our circuits use constants \text{load} = 0 and \text{store} = 1.

We refer to the ZK prover \( \mathcal{P} \) by she/her/hers and to the ZK verifier \( \mathcal{V} \) by he/him/his.

We refer to ZK operations that multiply two private values or require \( \mathcal{P} \) that provide input as \textit{non-linear}. Operations that add/subtract two private values or scale a private value by a public value are \textit{linear}. Typically, non-linear operations are \textit{far} more expensive than linear operations. Our cost accounting precisely enumerates the number of non-linear operations only.

2.5 Models

Our goal is to UC-realize the functionality \( F_{\text{ram}}^{\text{ZK}} \) (Figure 4). This functionality allows arbitrary arithmetic circuits augmented with random access memory. We achieve this functionality given access to circuits with arithmetic gates and support for permutation proofs (Figures 1 and 3). As discussed in Section 2.2, such permutation proofs can be achieved from standard arithmetic circuits.
Functionality $F_{perm}^{ZK}$

Let $C$ be an arithmetic circuit with permutation check gates (i.e., $C$ is written in the syntax of Figure 1). $F_{perm}^{ZK}$ interacts with $P$, $V$, and the adversary $S$. Upon receiving $(prove, C, w)$ from $P$ and $(verify, C)$ from $V$:

- If when running $C(w)$ it holds that (1) the circuit outputs 0 and (2) for each internal call to $x \sim y$, $x$ and $y$ are indeed related by a permutation, then output $(C, true)$ to $S$ and $V$; else output $(C, false)$ to $S$ and $V$.

Figure 3: The functionality for Zero Knowledge proofs over arithmetic circuits with access to a permutation check gate. Figure 1 defines syntax of such circuits.

Functionality $F_{ram}^{ZK}$

Let $C$ be an arithmetic circuit with RAMs (i.e., $C$ is written in the syntax of Figure 2). $F_{ram}^{ZK}$ interacts with $P$, $V$, and the adversary $S$. Upon receiving $(prove, C, w)$ from $P$ and $(verify, C)$ from $V$, execute $C(w)$:

- For each arithmetic gate, run the gate normally.
- For each make-mem gate with input vector $x$, store $x$ as a memory, initialize a fresh memory identifier, associate the identifier with the stored memory, and place the identifier on the gate output wire.
- For each access gate with $(id, op, addr, w)$, look up the memory associated with $id$, read address $addr$, and save the result on the gate output wire. If $op = store$, then write $w$ to memory address $addr$.

If the circuit output wire holds 0, then output $(C, true)$ to $S$ and $V$; else output $(C, false)$ to $S$ and $V$.

Figure 4: Our target functionality $F_{ram}^{ZK}$ supports arithmetic circuits with random access memories. We achieve this functionality in the $F_{perm}^{ZK}$ (Figure 3) hybrid model.

with support for public coin random challenges, so our approach can be implemented on top of standard arithmetic ZK techniques.

Our functionality $F_{ram}^{ZK}$ can serve as the basis for arbitrary ZK CPU architectures, as have been explored by prior work, e.g. [BCG+13, HYDK21, FKL+21, YHKD22].

Boolean vs. arithmetic circuits. We develop ZK RAM compatible with arithmetic gates. It is well known that Boolean and arithmetic circuits trade off in efficiency. For instance, arithmetic circuits are better at multiplication; Boolean circuits are better at comparisons.

We believe that in the ZK setting and for large proofs, it is becoming clear that arithmetic circuits are a superior model of computation. Indeed, the introduction of fast arithmetic-circuit-based read-only memory (ROM) – such as the state-of-the-art ROM introduced in this work – can significantly mitigate the downside of arithmetic circuits.

As an example, consider the problem of comparing two, say, 16-bit values. This problem can be
solved by subtracting one value from the other and then looking up the difference in a ROM with 2^{17} entries. For negative indices, the ROM stores a 0; for positive indices it stores a 1. To compare larger numbers, we can first break the numbers into chunks, element-wise compare chunks, and combine the results. As long as enough comparisons are performed across the proof, this approach yields comparison operations for only a small constant number of arithmetic gates.

3 Related Work

Approach of Franzese et al. [FKL+21]. [FKL+21] constructed the first concretely efficient ZK RAM with constant overhead. Their approach works in the VOLE-based ZK setting for mixed circuits containing both arithmetic and Boolean gates. [FKL+21] leverages permutation proofs (Section 2.2). We compare our performance to that of [FKL+21] in Section 6.

The [FKL+21] approach is as follows. On each memory access to address \( addr \), \( P \) provides the loaded value \( val \) as part of her input. Then, the system records a 4-tuple of (1) the address \( addr \), (2) the time \( t \) of the access (according to a monotonically increasing clock), (3) a bit \( op \) indicating if this access is a load or a store, and (4) the value \( val \) that is loaded or stored.

Of course, on any particular load, \( P \) might cheat and provide an incorrect value. Hence, after all \( T \) accesses are completed, [FKL+21] performs a global consistency check. Note that after \( T \) accesses, the system holds a vector of \( T \) 4-tuples.

In the first step of the consistency check, \( P \) inputs a new version of this length-\( T \) vector, where this second version is sorted first by memory address, then by time. Thus, the accesses are grouped into memory addresses such that one can locally scan from left to right, checking consistency. \( P \) proves that this second vector is indeed a permutation of the first via techniques discussed in Section 2.2.

The consistency check scans the second vector from left to right, ensuring that (1) the vector is indeed sorted correctly and (2) each load operation reads a value consistent with the most recent store to the same memory address. In particular, for each index \( i \), [FKL+21] checks:

\[
((addr_i < addr_{i+1}) \lor ((addr_i = addr_{i+1}) \land (t_i < t_{i+1})))
\land ((addr_i \neq addr_{i+1}) \lor (val_i = val_{i+1}) \lor (op_{i+1} = \text{store}))
\land ((addr_i = addr_{i+1}) \lor (op_{i+1} = \text{store}))
\]

With this done, \( V \) is confident that values provided by \( P \) are correct, so the approach achieves ZK RAM.

There are several efficiency problems in this approach. First, the above consistency check requires circuits that compare addresses/values. A comparator on length-\( \ell \) values requires \( \Theta(\ell) \) gates, and RAM addresses/values have size \( \Theta(\log T) \). Hence, the number of gates is super-constant.

While [FKL+21] are still able to claim constant ZK communication overhead\(^2\), the super-constant number of gates is problematic. In particular, the technique is a poor fit for arithmetic circuits, because we must simulate each Boolean gate with arithmetic gates, incurring unacceptable cost. Many ZK systems are best suited to arithmetic circuits, so RAM over arithmetic values is arguably preferable.

---

\(^1\) [FKL+21] targets Boolean computation, but they must pack Boolean values into a larger field to achieve efficient permutation proofs.

\(^2\) The [FKL+21] notion of ‘overhead’ is the total communication cost divided by the number of bits read from RAM. Since the number of gates grows linearly in the size of RAM words, [FKL+21] can claim constant overhead.
Moreover, even [FKL+21]'s own performance suffers from the large number of gates, not in communication, but in computation. Specifically, each of [FKL+21]'s AND gate operations requires only \( \approx 1 \) bit of communication, but it also requires that both \( \mathcal{V} \) and \( \mathcal{P} \) perform operations over the field \( \mathbb{F}_{2^\kappa} \) where \( \kappa \) is a computational security parameter (e.g. 128). Together, these operations are expensive.

In our approach, we construct arithmetic ZK RAM where the total number of gates is a small constant. This allows us to build VOLE-based ZK where all costs have constant overhead, allowing us to significantly improve over [FKL+21] in terms of total wall-clock time performance. Indeed, while we achieve only a small communication improvement over [FKL+21], we improve in wall-clock time by up to 20x.

**Approach of Delpech de Saint Guilhem et al. [DdSGOTV22].** [DdSGOTV22] were the first to achieve ZK RAM whose arithmetic circuit size is linear in the number of RAM accesses \( T \). Hence, unlike [FKL+21], [DdSGOTV22] achieves constant overhead in all costs.

In short, [DdSGOTV22] leverages the same approach as [FKL+21], except that they carefully optimize comparison/equality operations. [DdSGOTV22]'s key ingredient is an elegant *bounds check*, and this check is used to implement comparison.

The [DdSGOTV22]'s bounds check proceeds as follows. Suppose the arithmetic circuit holds a length-\( T \) vector \( \mathbf{x} \in \mathbb{F}^T \), and \( \mathcal{P} \) would like prove in batch that each element of \( \mathbf{x} \) is in \( 1, ..., T \). To achieve this, [DdSGOTV22] first constructs the following vector:

\[
\mathbf{y} = \mathbf{x}[0], \mathbf{x}[1], ..., \mathbf{x}[T - 2], \mathbf{x}[T - 1], 1, 2, 3, ..., T - 1, T
\]

[DdSGOTV22] then requires that \( \mathcal{P} \) inputs a new vector \( \mathbf{z} \) and proves that \( \mathbf{z} \) is a permutation of \( \mathbf{y} \): \( \mathbf{y} \sim \mathbf{z} \). If \( \mathcal{P} \) is honest, then she will input \( \mathbf{z} \) in sorted order. With this done, \( \mathcal{P} \) can prove that all elements \( \mathbf{x}[i] \) are in the range \( 1, ..., T \) by proving that (1) the first element of \( \mathbf{z} \) is 1, (2) the last element of \( \mathbf{z} \) is \( T \), and (3) the difference between any two consecutive elements \( \mathbf{z}[i + 1] - \mathbf{z}[i] \) is either 0 or 1. It is easy to perform this check with a linear-size arithmetic circuit.

[DdSGOTV22] uses this linear-sized circuit to handle all comparisons of the [FKL+21] consistency check in batch. Equality computations can similarly be handled by a constant overhead circuit. Thus, [DdSGOTV22] removes the logarithmically-sized circuit components from [FKL+21], achieving true constant overhead in circuit size.

While [DdSGOTV22] indeed improves over [FKL+21], it retains inefficiencies of [FKL+21]'s consistency check, so the approach still requires 27 non-linear gates per access.

[DdSGOTV22] specialized their approach for the ZK paradigm based on `MPC-in-the-head' [IKOS07, dOT21], but since it is an arithmetic circuit, it can be used in other paradigms as well. Indeed, we implemented [DdSGOTV22]'s approach in the VOLE-based ZK paradigm so that we can compare to it (see Section 6).

Our ZK RAM leverages a fundamentally different approach to consistency, allowing us to achieve RAM from fewer gates.

**Other related work.** A number of other works [BSCG+13, BSCTV14, WSR+15, HK20, HK21, HYDK21] achieved proof system RAM. These works are based on permutation networks [Ben64, Wak68], requiring polylogarithmic overhead for each access.

[BBCG+18] was the first to achieve only super-constant prover cost based on shuffle proofs, e.g. [Nef01]. Their approach is not practical, as explicitly stated by the authors.
developed proofs for programs in the Map-Reduce model; they do not achieve general RAM.

We note that our ZK ROM shares some similarities with Plonk’s lookup table literature (starting from [GW20]). Similar intuition to our ZK RAM has been applied in the memory checking literature (e.g., [BEG+94, DNRV09, ZGK+18]), which focuses on providing a correct outsourced memory (without ZK). Our work ensures memory-checker-like techniques are efficiently handled in the ZK setting, where we are strictly limited in that all operations must be handled by black-box field arithmetic. Achieving this efficiently requires careful handling of ROM/RAM.

4 Technical Overview

4.1 Read-Only Memory

To begin, let us elide writes, and suppose that we wish to implement read-only memory (ROM). Namely, $P$ and $V$ agree on a length-$n$ vector of wires $x$, and we encode $x$ as a ROM data structure. Now, given an address wire $i$, the circuit can query $\text{lookup}(i)$, which returns a wire holding $x[i]$. We present an efficient instantiation of $\text{lookup}$.

As a first insight, observe that each call to $\text{lookup}$ can leverage honest $P$ as an oracle; honest $P$ knows the cleartext value of $x$ and $i$, and hence she can simply provide $x[i]$ as input. Of course, $P$ might be cheating, so we must check that each such input is indeed correct.

Our ROM data structure handles all such checks in batch by maintaining two vectors of tuples, $\text{reads}$ and $\text{writes}$. On each call to $\text{lookup}$, we append one element to $\text{reads}$ and one to $\text{writes}$. Our key insight is that with minimal added metadata, we can ensure that each call to $\text{lookup}$ is correct if and only if $\text{reads}$ is a permutation of $\text{writes}$.

Each element of $\text{reads}$ (resp. $\text{writes}$) is a three-tuple of (1) the element’s address, (2) the element’s value, and (3) a metadata value called the version. Versions act as address-specific counters, and each call to $\text{lookup}$ increments a version.

To set up our ROM data structure, for each address $i \in [n]$ we append to $\text{writes}$ the following tuple:

$$(i, x[i], 0)$$

I.e., we initialize each element with version 0. Then, on each call to $\text{lookup}(i)$, $P$ provides two inputs: (1) the value $x[i]$ and (2) the latest version $v$ associated with address $i$. We then update both $\text{reads}$ and $\text{writes}$ by appending the old version $(i, x[i], v)$ to $\text{reads}$ and a fresh version $(i, x[i], v+1)$ to $\text{writes}$.

When we are finished with ROM, we perform a simple cleanup step that we call a teardown. For each address $i \in [n]$, honest $P$ inputs the latest version $v$, and we append $(i, x[i], v)$ to $\text{reads}$. Finally, $P$ demonstrates that the reads and writes are related by a permutation: $\text{reads} \sim \text{writes}$. Perhaps surprisingly, this simple approach is already correct and sound.

Correctness. Correctness of our ROM holds by the following key invariant:

**Invariant 1** (ROM Invariant). On each call to $\text{lookup}$ and for each address $i$, the vector $\text{writes}$ contains (at least) one tuple of the form $(i, \text{val}, v)$ that does not appear in $\text{reads}$.

The setup phase establishes Invariant 1, because it writes version 0 of each element. Then, on each call to $\text{lookup}$, honest $P$ propagates Invariant 1 by reading the latest version $v$ of $x[i]$,
Soundness. The key insight behind the soundness of our ROM is that even a malicious $\mathcal{P}$ must read each version of an element that is written, or else reads will not be a permutation of writes. Moreover, on each call to $\texttt{lookup}$, the circuit structure ensures that for whatever version $v$ $\mathcal{P}$ chooses to read, a fresh version $v + 1$ is written.

The above two facts bind $\mathcal{P}$ into building per-element chains of reads and writes (see Figure 5), where (1) each read is made to a version written at some different point in time, (2) the initial link of the chain is written at setup, and (3) the final link is read at teardown. Because each element $x[i]$ written at setup is stored on circuit wires (i.e., is not a prover input), and because on each $\texttt{lookup}$ we write back the same element that was read, it must be the case that all reads made to the same address $i$ match the initial value $x[i]$.

Interestingly, our ROM does allow malicious $\mathcal{P}$ to “read from the future”. Namely, on a particular call to $\texttt{lookup}(i)$, malicious $\mathcal{P}$ can read a version of $x[i]$ that is only written at a later call to $\texttt{lookup}(i)$. This remains sound because the value $x[i]$ does not change over the lifetime of the ROM. The crucial point is that it is not possible for $\mathcal{P}$ to form cycles, where two calls to $\texttt{lookup}$ each read a version written by the other call. It is impossible to convincingly construct such a cycle because versions monotonically increases on each call to $\texttt{lookup}$, so $\mathcal{P}$ cannot arrange two versions such that each version is greater than the other.

The ability for malicious $\mathcal{P}$ to read from the future is the key distinction between ROM and full-fledged read/write RAM: in full RAM we must ensure that each of $\mathcal{P}$’s read values comes from the past.

Remark 1 (Looking up an illegal address). If a circuit calls $\texttt{lookup}(i)$ where $i$ is not written at setup, then the proof will fail. This is because $\mathcal{P}$ cannot construct a chain, as there is no first link...
written at setup. This fact will be useful later.

Generalizing to tuples. So far, the ROM stores individual wire values. It is easy to generalize the ROM such that each address stores an $\ell$-tuple of wires for arbitrary $\ell$. This simply requires changing the number of inputs provided by $\mathcal{P}$.

Generalizing to read-only key-value stores. So far, our ROM stores elements at addresses $0, 1, \ldots, n-1$. It is easy to generalize to a key-value store where the key space is an arbitrary set. In particular, at setup we initialize the key-value store from a list of pairs $(i, x[i])$. So long as each address $i$ is unique, this generalization incurs no extra cost. This is useful when using key-value stores to encode sets (Section 4.2).

Gate cost. Consider a ROM of size $n$ where $\text{lookup}$ is called $T$ times. For a ROM storing $\ell$-tuples, where $\ell$ is an arbitrary constant, each call to $\text{lookup}$ requires $\ell + 1$ inputs from $\mathcal{P}$ (the $\ell$ entries of the tuple and the version), for a total of $T \cdot (\ell + 1)$ input gates. The teardown phase similarly requires that $\mathcal{P}$ reads each of the $n$ slots, requiring $n \cdot (\ell + 1)$ additional input gates. Finally, the permutation proof reads $\sim$ writes inspects vectors of length $n + T$; this can be implemented using two fan-in $n + T$ multiplication gates (see Section 2.2).

In sum, the full construction requires:

- $(n + T)(\ell + 1)$ input gates.
- Two fan-in $n + T$ multiplication gates.
- $O(n + T)$ linear gates.

4.2 Sets with Membership Queries

As mentioned in Section 4.1, the key difference between ROM and full-fledged RAM is that in RAM we must check that each read value was written at some time in the past. As we will see, we achieve this by checking that on each access a particular timing value is in a public set $\{1, \ldots, T\}$, where $T$ is the total number of RAM accesses.

We observe that our read-only key-value store (RO-KVS, Section 4.1) is well suited to this set membership proof. In particular, consider an RO-KVS where we set the size of tuples to $\ell = 0$. We initialize an RO-KVS with addresses $1, \ldots, T$. Then, to check that a particular value $t$ is a member of a set, we call $\text{lookup}(t)$. This call does not return any data, but if $t$ is not in the set $\{1, \ldots, T\}$, then the proof will fail, as was observed in Remark 1. Thus this call to $\text{lookup}$ serves as a proof that $t$ is in the set.

Gate cost. The cost of our set structure is inherited from the cost of ROM. For a set with $T$ values and $T$ membership queries (these parameters are used in our RAM), total cost is:

- $2T$ input gates.
- Two fan-in $2T$ multiplication gates.
- $O(T)$ linear gates.
4.3 Read/Write Memory

Consider a random access memory (RAM) equipped with a single access instruction. Each call to access accepts as input three arguments:

- \( op \in \{0, 1\} \) is 0 on a load and 1 on a store.
- \( i \in \mathbb{F} \) is an address.
- \( w \) is the value to store, if this operation is indeed a store.

Each RAM access both reads and writes data, whether the operation is a load or a store. On a store, we read the old data and write the new data; on a load, we read the old data and write back a fresh copy of the same data. Just like our ROM, our RAM maintains two vectors reads and writes, and on each access we update each vector. Also like our ROM, our RAM checks that reads are a permutation of writes: \( \text{reads} \sim \text{writes} \).

As mentioned in Section 4.1, the main challenge in RAM as compared to ROM is that we must prevent malicious \( \mathcal{P} \) from reading a value written in the future. Our approach here is to maintain a monotonically increasing public value clock which is set to 0 at setup and then incremented on each access.

When we write a value \( val \) to address \( i \), we append the following tuple to writes:

\[(i, val, \text{clock})\]

I.e., we mark each write with the time it occurred. To read a value and ensure \( \text{reads} \sim \text{writes} \), we must also mark each read value with the time at which that value was written. Accordingly, each call to access at address \( i \) requests that \( \mathcal{P} \) inputs two values: (1) the value \( val \) that was last written to address \( i \) and (2) the time \( t \) when the write occurred.
Of course, $P$ might be malicious, so we must check that the input $(val, t)$ is consistent with the semantics of read/write RAM. The permutation proof $\text{reads} \sim \text{writes}$ ensures that each such triple $(i, val, t)$ is indeed written at some point in time. The remaining challenge is to ensure that this point in time is in the past, which can be achieved by checking that $\text{clock} - t$ is a strictly positive value. Said another way, we can check that $\text{clock} - t$ is in the set $\{1, \ldots, T\}$, where $T$ is the total number of RAM accesses. Each such check is achieved by our set data structure (Section 4.2).

This simple set membership proof is sufficient, and we do not need to explicitly prove that each read corresponds to the most recent write. Indeed, this stronger property (each read comes from the latest write) is implied by the local checks combined with the permutation proof $\text{reads} \sim \text{writes}$.

To see that this is true, observe that our RAM maintains the following key invariant:

**Invariant 2 (RAM Invariant).** Before each call to access and for each address $i$, writes contains exactly one tuple of the form $(i, val, t)$ that does not appear in reads. Moreover, the value $t$ is highest amongst all such tuples in writes; i.e., this tuple records the most recent write to address $i$.

When we set up RAM with initial content $x$, we append the following write for each address $i$: $(i, x[i], 0)$. Since we write each address once, our setup trivially establishes Invariant 2.

On each call to access, $P$ is forced to input values $(val, t)$ such that $\text{clock} - t$ is positive. By Invariant 2 there is only one such choice that will pass the permutation proof, and this choice must be the most recent version of address $i$, so this access is sound. Moreover, this access propagates the invariant because we increment the clock and write a fresh copy of address $i$; see also Figure 6.

Once all $T$ accesses are complete, note that we have written $n + T$ values, but we have only read $T$ values. We allow $P$ to complete the read history in a teardown step, where we read (without writing back) each RAM address. In particular, for each address $i$, $P$ again inputs a pair $(val, t)$, and we append $(i, val, t)$ to reads. Note that on this step and by Invariant 2, we do not need to check that $\text{clock} - t$ is positive; it must be, or else the permutation proof will fail.

**Generalizing to tuples.** As in our ROM, it is straightforward to generalize the RAM such that each address stores an $\ell$-tuple of wires for arbitrary constant $\ell$. This simply requires changing the number of inputs provided by $P$.

**Gate cost.** Our RAM has four relevant costs:

- We maintain the set $\{1, \ldots, T\}$ (Section 4.2) and perform one membership proof per access.
- We use a permutation proof (Section 2.2) on two length-$(T + n)$ vectors: $\text{reads} \sim \text{writes}$.
- $P$ inputs two values $(val, t)$ on each of the $T$ accesses, and for each of $n$ addresses at teardown.
- We include a single multiplexer that disambiguates load operations from store operations. In particular, after reading old from RAM, we compute:

  $$\text{new} \leftarrow \text{old} + \text{op} \cdot (w - \text{old})$$

  If the operation type is public, this is not needed. For completeness, we assume operation types are private.

In total, the RAM incurs the following gate cost:
• $4T + 2n$ input gates.

• Two fan-in $2T$ multiplications (from the set) and two fan-in $T + n$ multiplications (from the permutation proof). We optimize one of the fan-in $2T$ multiplications to be only fan-in $T$; see Section 5.2.

• $T$ fan-in two multiplications (from multiplexers).

• $O(n + T)$ miscellaneous linear operations.

When we assume that the number of accesses $T$ is significantly greater than the RAM size $n$, we obtain the per-access costs listed in our contribution.

5 Approach

5.1 Formal Protocol and Security

![Code](image)

Figure 7: Our ZK read-only key-value store holds $\ell$-word values at each key. We highlight uses of non-linear gates.

We formalize our ZK RAM as a protocol in the $F_{ZK}^{perm}$-hybrid model. This protocol formalizes ideas explained in Section 4. Our protocol is defined with respect to Figures 7 to 9.

Protocol 1. $P$ and $V$ agree on an arithmetic circuit with RAM access $C$; i.e., $C$ is written in the syntax of Figure 2. $P$ holds as input a witness $w$. $P$ and $V$ compile $C$ to an arithmetic circuit with permutations (Figure 1) in the following manner:

• For each input gate, addition gate, subtraction gate, multiplication gate, and constant wire, they trivially compile the circuit element.
type set ::= ro-kvs₀

setup-set(content : F*) → set {
  return setup-ro-kvs₀(content)
}

prove-member(s : set, val : F) {
  lookup(s, val)
}

teardown-set(s : set) {
  teardown-ro-kvs(s)
}

Figure 8: Our ZK set data structure is a specialization of our read-only key-value store (Figure 7) where each key holds 0 words. An element is considered a member of the set if it is a key in the key-value store. We highlight uses of non-linear gates.

- For each make-mem gate, the parties invoke setup (Figure 9) which produces a circuit-based RAM data structure. P locally maintains a cleartext copy of this RAM (and its internal set structure), allowing P to track which value is stored at each address.

- For each access gate, the parties invoke the access procedure of Figure 9. P extends her witness w by using her local copy of the RAM to recall the last time t when the queried address was accessed and what value val was stored there (and to find the set version of clock − t) so that these values can be passed to input gates. P updates her local copy of RAM.

- Once all gates are compiled, the parties invoke teardown on each RAM data structure created by a call to setup. P uses her local copy of RAM (and its internal set) to appropriately extend her witness w.

P and V thus agree on a circuit C' with arithmetic gates and permutation proofs. Next, (1) P sends (prove, C', w) to F^perm ZK, (2) V sends (verify, C') to F^perm ZK, (3) P outputs ⊥, and (4) V outputs whatever he receives from F^perm ZK.

Our key security property is that Protocol 1 achieves the Figure 4 functionality, allowing us to securely implement ZK proofs with RAM.

Theorem 1. Protocol 1 UC-realizes F^ram ZK (Figures 2 and 4) in the F^perm ZK-hybrid model (Figures 1 and 3) with no soundness error.

Appendix C.3 provides a detailed proof of Theorem 1. For now, we sketch the argument, focusing on soundness.

Proof Sketch. By construction of a simulator S.

Our protocol can be viewed as a circuit compiler that maps arithmetic circuits with RAM (Figures 2 and 4) to arithmetic circuits with permutations (Figures 1 and 3). Accordingly, S's only task is to tediously convert between RAM circuits and permutation circuits, and to collect P's witness w before sending it to the F^ram ZK functionality.

However, we still must demonstrate that S is a valid simulator, and that the UC environment cannot distinguish the simulation from the real world protocol. The challenge in showing this is that our compilation generates non-trivial input gates, allowing P extra degrees of freedom in choosing
type RAM_\ell\{  
reads : record_\ell\{  
value : F  
time : F  
\}\  
writes : record_\ell\{  
\}\  
valid-diffs : set  
clock : F  
size : N  
\}\  
figure 9: Our ZK read/write memory. Each memory slot holds an \ell-tuple of field elements. The memory is defined in terms of our set data structure (Figure 8). We define three procedures: setup, access, and teardown. Valid usage consists of one call to setup, an arbitrary (polynomial) number of calls to access, and one call to teardown. We highlight uses of non-linear gates.

a convincing witness. Therefore, we must argue that our construction is sound. All compiled gates in Figure 2 are trivially sound, save for make-mem and access gates.

In short, these gates are sound because our RAM construction includes a permutation proof on its reads and writes, and because on each access we check that \( P \)'s chosen time \( t \) is less than \( \text{clock} \) (this latter property is checked by a set membership proof). As per discussion near Invariant 2, these local checks enforce a global ordering of reads/writes on each address, on each access forcing \( P \) to input the most recent value written to the same address.

It remains to show that the underlying set data structure is sound. As discussed in Section 4.1, our set (as inherited from our ROM) on each access forces \( P \) to input a version. Because we monotonically increase versions and because we use a permutation proof on set reads/writes, we force \( P \) to construct a linked list of read/write pairs, connecting each read value to the setup-time write on the same address.

Together, this means that \( P \)'s extra inputs must respect the semantics of access gates, so the
protocol is sound.

Prior work on VOLE-based ZK gave powerful constant round protocols for proving arithmetic statements. Such prior works, together with the permutation proof method discussed in Section 2.2 and an optimization discussed in Section 5.2, imply the following (see Appendix C.2 for more details):

Lemma 1. There exists a 5-round protocol in the VOLE-hybrid model that UC-realizes $F_{ZK}^{perm}$. For each use of input and of $\cdot \cdot \cdot$, $P$ and $V$ use 1 VOLE correlation; for each use of $\cdot \sim \cdot$ on two length-$n$ vectors, $P$ and $V$ each evaluate a linear number of field operations and consume $\frac{2m}{\epsilon} + o(1)$ VOLE correlations, for any constant $\epsilon \geq 2$. The protocol achieves soundness error $O(\frac{n+m}{|F|})$, where $n$ is the size of the largest vector passed to $\cdot \sim \cdot$ and $m$ is the number of multiplications.

Together, Theorem 1 and Lemma 1 allow us to instantiate ZK RAM from VOLE-based ZK:

Corollary 1 (RAM from VOLE). There exists a 5-round protocol in the VOLE-hybrid model that UC-realizes $F_{ZK}^{ram}$. For each call to access on a size-$n$ memory that is accessed $T = \omega(n)$ times, $P$ and $V$ each evaluate a constant number of field operations and consume $5 + \frac{5}{\epsilon} + o(1)$ VOLE correlations, for any constant $\epsilon \geq 2$. The protocol achieves soundness error $O(\frac{T+m}{|F|})$, where $T$ is the largest number of accesses used by any memory and $m$ is the number of multiplications.

Formalizing ROM (including our set data structure). For simplicity and because RAM is more interesting than ROM, we chose to not formalize ROM as part of our top level functionality (Figure 4). Such a formalism can be derived from our RAM formalism by substituting RAM-related gates to initialize ROM and access ROM, and correspondingly modifying Protocol 1 according to Figure 7. Soundness of ROM gates follows from our proof of Theorem 1. We include security games crucial for the soundness proof of ROM in Appendix C.1.

5.2 Optimizations

Accelerating permutation proofs. The main cost of our protocol comes from permutation proofs, and permutation proofs can be achieved by high fan-in multiplications (Section 2.2). We note two important available optimizations.

The first optimization is generally applicable for any ZK system and saves $T$ multiplications. Consider the permutation proof performed on the publicly known set $\{1, \ldots, T\}$ (Section 4.2). Here, the first $T$ writes are publicly known to be $(1,0), (2,0), \ldots, (T-1,0), (T,0)$. Hence, in the permutation proof, once $V$ chooses a random challenge, $P$ and $V$ can locally compute the product of the first $T$ components of writes. Again, this saves $T$ ZK multiplications.

The second optimization leverages a VOLE-specific trick to reduce cost of high fan-in multiplications. QuickSilver [YSWW21] supports polynomial proofs. Namely, [YSWW21] can efficiently prove a batch sub-statements of the following form:

$$p(x_0) = 0, \ldots, p(x_{n-1}) = 0$$

Here, $p$ is a degree-$d$ polynomial over arithmetic wires. The batch of $n$ sub-statements requires only $d$ VOLE correlations (VOLE correlations are the primary cost in VOLE-based ZK). This trick reduces VOLE correlations, but can increase computation, as $P$’s compute scales with $O(d^2)$. 17
Polynomial proofs can optimize fan-in-\(n\) multiplications as follows. Consider a fan-in-\(\epsilon\) multiplication for any constant \(\epsilon \geq 2\). We can implement such a multiplication by allowing \(\mathcal{P}\) to input a value \(prod\), and then defining a degree-\(\epsilon\) polynomial that multiplies the \(\epsilon\) values and subtracts \(prod\). This is a degree \(\epsilon\) polynomial, and we can use [YSWW21] to prove it equals zero (if it does, \(\mathcal{P}\) input \(prod\) correctly). It is straightforward to use \(\lceil \frac{n}{\epsilon-1} \rceil\) fan-in-\(\epsilon\) multiplications to implement a fan-in-\(n\) multiplication, so we can use \(\lceil \frac{n}{\epsilon-1} \rceil\) polynomial proofs to implement a fan-in-\(n\) multiplication.

Prior work [FKL+21] also used this trick, but their corresponding improvement was limited, as their cost is dominated by comparisons, not by permutation proofs. In our approach, almost all multiplication cost comes from high fan-in multiplications, so we enjoy substantial improvement.

By choosing arbitrary constant \(\epsilon \geq 2\), we reduce the number of VOLE correlations from \(\approx 10T\) to \(\approx (5+\frac{5}{\epsilon-1})T\). This reduces the communication used by \(\mathcal{P}\) and \(\mathcal{V}\), and it also reduces the amount of computation needed to construct the correlations. However, as stated above, \(\mathcal{P}\)’s proof computation increases as we increase \(\epsilon\), so we limit \(\epsilon\) to constant values to avoid asymptotic problems. In Section 6 we find it is best to set \(\epsilon\) between 8 and 32, depending on the network.

To our knowledge, this polynomial proof optimization is currently specific to VOLE-based ZK. However, it is possible similar optimization will emerge in other ZK paradigms, particularly because ZK RAM provides compelling motivation.

**Other optimizations.** Appendix D presents other optimizations that are available in our RAM, but that we did not implement or account for in our cost. In particular, our underlying set \(\{1, \ldots, T\}\) can be re-used in other parts of a large proof, and it is possible to periodically refresh RAM to slightly increase performance or soundness.

## 6 Evaluation

**Our implementation.** We implemented our ROM, our set, and our RAM ZK data structures from VOLE in C++ available at https://github.com/gconeice/improved-zk-ram. We use the learning-parity-with-noise-based VOLE protocol of [WYKW21] which is implemented as part of the EMP Toolkit [WMK16]. While our VOLE-based approach will extend to any large field, we use the field \(\mathbb{F}_{2^{61}-1}\) because this is the only arithmetic field currently implemented by EMP.

**[FKL+21] and [DdSGOTV22] implementation.** We compare our RAM to that of [FKL+21] and of [DdSGOTV22]. We also compare our ROM to that of [FKL+21]. [FKL+21]’s RAM/ROM implementation is publicly available as part of the EMP Toolkit [WMK16]. [DdSGOTV22]’s RAM implementation is not publicly available. We implemented a VOLE-based version of their RAM\(^3\) in C++. Other ZK RAMs do not have constant overhead per access.\(^4\)

As discussed in Section 3, [DdSGOTV22]’s RAM leverages permutation proofs. Recall, VOLE-based ZK offer optimizations for permutation proofs (see Section 5.2). We also note that several gates in [DdSGOTV22]’s protocol can be further optimized in VOLE-based ZK. In particular, these gates require checking that the product of two wires is some constant; such checks can be done in VOLE-based ZK without communication. For fair comparison, we incorporate all such VOLE-based optimizations in our implementation of [DdSGOTV22].

\(^3\)[DdSGOTV22]’s RAM was designed for the MPC-in-the-head paradigm, but, like our RAM, it is a pure arithmetic circuit with random challenges, so it can also be implemented for VOLE-ZK.

\(^4\)[FKL+21] reported clear concrete improvements over earlier ZK RAMs.
Experimental setup. We ran all three RAMs on a benchmark where we instantiate a RAM of sizes $n$ ranging from $2^{11}$ to $2^{20}$ and perform $T = 2^{23}$ accesses. We run our ROM, [FKL+21]'s ROM, and our set data structure with the same $n$ and $T$.

In all cases, $\mathcal{P}$ and $\mathcal{V}$ run single threaded on a separate Amazon EC2 m5.2xlarge machine\textsuperscript{5}. We measure total (and per-access) communication in bytes and per-access runtime in microseconds. For the latter, we experiment with different network bandwidth settings, varying between a WAN-like 25Mbps connection and a LAN-like 1Gbps connection.

<table>
<thead>
<tr>
<th>$n = 2^{20}$</th>
<th>Bandwidth</th>
<th>$\epsilon$</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td>2</td>
</tr>
<tr>
<td></td>
<td>100 Mbps</td>
<td>2.39</td>
</tr>
<tr>
<td></td>
<td>500 Mbps</td>
<td>0.80</td>
</tr>
<tr>
<td></td>
<td>1 Gbps</td>
<td>0.66</td>
</tr>
<tr>
<td>Comm. (byte)</td>
<td></td>
<td>27.1</td>
</tr>
<tr>
<td>ROM</td>
<td>25 Mbps</td>
<td>12.56</td>
</tr>
<tr>
<td></td>
<td>100 Mbps</td>
<td>3.37</td>
</tr>
<tr>
<td></td>
<td>500 Mbps</td>
<td>1.10</td>
</tr>
<tr>
<td></td>
<td>1 Gbps</td>
<td>0.92</td>
</tr>
<tr>
<td>Comm. (byte)</td>
<td></td>
<td>38.4</td>
</tr>
<tr>
<td></td>
<td>100 Mbps</td>
<td>7.80</td>
</tr>
<tr>
<td></td>
<td>500 Mbps</td>
<td>2.60</td>
</tr>
<tr>
<td></td>
<td>1 Gbps</td>
<td>2.14</td>
</tr>
<tr>
<td>Comm. (byte)</td>
<td></td>
<td>88.3</td>
</tr>
</tbody>
</table>

Figure 10: Per-access runtime ($\mu$s) and communication (bytes) of our data structures as a function of $\epsilon$, the degree of polynomials used in our handling of high fan-in multiplications; see Section 5.2. As $\epsilon$ increases, we use fewer VOLE correlations, but $\mathcal{P}$ computes more field operations.

Parameter $\epsilon$ for accelerating permutation proofs. Our RAM, ROM, and set data structures (and our implemented [DdSGOTV22]'s RAM) all benefit from VOLE-based polynomial proofs used to optimize to high-fan-in multiplications (see Section 5.2). In particular, recall that parameter $\epsilon$ trades off between number of required VOLE correlations and $\mathcal{P}$ computation. Figure 10 tabulates the impact of $\epsilon$ on performance (we set $n = 2^{20}$ and $T = 2^{23}$).

For simplicity of implementation, we organize our high fan-in-$n$ multiplications as trees of multiplications, and we only use fan-in-$\epsilon$ gates at the widest tree level. The remaining multiplications are achieved with fan-in-two gates. This means our high fan-in gates use $\approx \frac{2n}{\epsilon}$ VOLEs, rather than $\left\lceil \frac{n}{\epsilon-1} \right\rceil$. In practice, the change in performance is not noticeable.

In subsequent experiments, we set $\epsilon = 16$. Increasing $\epsilon$ beyond this point yields greatly diminished communication improvement, as input gates dominate cost and are not improved by increasing $\epsilon$. 

\textsuperscript{5}Intel Xeon Platinum 8175 CPU @ 3.10GHz, 8 vCPUs, 32GiB Memory, 10Gbps Network
Our RAM performance. Figure 11 plots per-access runtime of our RAM in microseconds. On a slow 25Mbps WAN-like connection, our RAM requires $\sim 15\mu s$ per access; On a faster 100Mbps WAN-like connection, our RAM requires $< 5\mu s$ per access; On a 1Gbps LAN-like connection, our RAM requires only $\sim 1.5\mu s$ per access, achieving access at $\approx 600$ KHz. As $n$ ranges from $2^{11}$ to $2^{20}$, the per-access communication increases only from 47 to 50 bytes; runtime remains essentially unchanged (caching begins to impact performance at high $n$). We defer the detailed data to Appendix E.

Our ROM and set performance. On a 25Mbps WAN-like connection, our ROM (resp. set) requires $< 8\mu s$ (resp. $< 4\mu s$) per access; on a 1Gbps LAN-like connection, our ROM (resp. set) requires $\sim 1\mu s$ (resp. $< 1\mu s$) per access. As $n$ ranges from $2^{11}$ to $2^{20}$, the per-access communication is 19-22 (resp. 10-12) bytes; runtime remains essentially unchanged (caching effects begin to impact performance at high $n$). We defer plots and detailed data to Appendix E.

Our RAM/ROM vs. [FKL+21]’s RAM/ROM. Figures 12 and 13 plot our RAM/ROM’s speedup over [FKL+21]’s RAM/ROM. [FKL+21]’s implementation is over Boolean circuits with $\mathbb{Z}_{2^{32}}$ ring algebraic structure. Our ROM improves over [FKL+21]’s ROM by 3-34×, depending on bandwidth. Our RAM improves over [FKL+21]’s RAM by 2-20×. Figure 14 tabulates communication required by our RAM/ROM as compared to [FKL+21].

Our speedup comes primarily from improved computation, resulting from a greatly reduced number of gates. Note the significant communication reduction from $\mathcal{V}$ to $\mathcal{P}$, reflecting our reduction in the number of required VOLE correlations.
Figure 12: Speedup of our RAM over [FKL+21]'s RAM.

Figure 13: Our ROM's speedup over [FKL+21]'s ROM.

Figure 15: Our RAM's speedup over [DdSGOTV22]'s RAM (we optimize [DdSGOTV22]'s RAM).
<table>
<thead>
<tr>
<th>Direction</th>
<th>Benchmark</th>
<th>log n</th>
</tr>
</thead>
<tbody>
<tr>
<td>( P \rightarrow V )</td>
<td>([FKL^{+}21])</td>
<td>12 14 16 18 20</td>
</tr>
<tr>
<td>Ours</td>
<td></td>
<td>28.4 29.2 30.1 31.5 34.7</td>
</tr>
<tr>
<td>( V \rightarrow P )</td>
<td>([FKL^{+}21])</td>
<td>12.6 12.9 13.4 14.0 15.3</td>
</tr>
<tr>
<td>Ours</td>
<td></td>
<td>0.7 0.7 0.7 0.7 0.9</td>
</tr>
<tr>
<td>Total</td>
<td>([FKL^{+}21])</td>
<td>41.0 42.1 43.5 45.6 50.1</td>
</tr>
<tr>
<td>Ours</td>
<td></td>
<td>19.1 19.2 19.3 19.9 22.7</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Direction</th>
<th>Benchmark</th>
<th>log n</th>
</tr>
</thead>
<tbody>
<tr>
<td>( P \rightarrow V )</td>
<td>([FKL^{+}21])</td>
<td>32.3 33.1 34.1 35.5 39.2</td>
</tr>
<tr>
<td>Ours</td>
<td></td>
<td>45.7 45.8 45.9 46.5 49.0</td>
</tr>
<tr>
<td>( V \rightarrow P )</td>
<td>([FKL^{+}21])</td>
<td>14.5 14.8 15.2 15.9 17.5</td>
</tr>
<tr>
<td>Ours</td>
<td></td>
<td>1.3 1.3 1.3 1.3 1.3</td>
</tr>
<tr>
<td>Total</td>
<td>([FKL^{+}21])</td>
<td>46.9 48.0 49.3 51.4 56.8</td>
</tr>
<tr>
<td>Ours</td>
<td></td>
<td>47.1 47.1 47.3 47.9 50.3</td>
</tr>
</tbody>
</table>

Figure 14: Per-access communication (bytes) used by our ROM (resp. RAM) versus \([FKL^{+}21]\)’s ROM (resp. RAM).

**Our RAM vs. \([DdSGOTV22]\)’s RAM.** Figure 15 plots our RAM speedup over our optimized implementation of \([DdSGOTV22]\)’s RAM.

Even though we heavily optimize \([DdSGOTV22]\)’s RAM, our RAM still improves per-access runtime by 2-2.9×. As \( n \) ranges from \( 2^{11} \) to \( 2^{20} \), \([DdSGOTV22]\)’s per-access communication goes from 129 to 145 bytes, which is \( \approx 3 \times \) more than our RAM.

As one additional data-point, we disabled all VOLE-based optimizations for both our RAM and our implementation of \([DdSGOTV22]\), then ran with \( n = 2^{20} \). Here, our RAM requires 88 bytes per access while \([DdSGOTV22]\) requires 242 bytes. These numbers reflect our raw advantage in total number of non-linear gates.

![Table showing ROM and RAM runtime cost breakdown for \( n = 2^{20} \).](image)

**Figure 16: Runtime cost breakdown of our ROM/RAM (\( \mu s \)).**

**Microbenchmarks.** Figure 16 breaks down the per-access runtime of our RAM and ROM. The main components include setup, access, and teardown (see Figures 7 and 9). Access clearly dominates cost, and setup is by far the cheapest step.

**More evaluation.** Appendix E includes further evaluation, including tabulated data underlying our plots.
Acknowledgements

This research was developed with support of funding under NSF grant CNS-2246353 and CNS-2246354.

References


SUPPLEMENTARY MATERIAL

Appendix A  Universal Composability

The Universal Composability (UC) framework [Can01] models protocols as interactions between probabilistic polynomial-time (PPT) Interactive Turing Machines (ITMs). We use UC to formalize security of our protocols in the presence of a malicious, static adversary.

In UC, we compare real protocol executions with ideal executions. An execution of protocol Π with hybrid functionality G is an interaction between parties (ITMs) and the hybrid functionality G. An ideal execution is an interaction between dummy parties that simply forward messages to the ideal functionality F. Security of the real world protocol is defined with respect to an additional ITM called the environment E. E provides inputs to parties and can corrupt a subset of parties by replacing their ITMs with arbitrary ITMs A. The security of protocol Π is ensured when no E can distinguish the protocol from the ideal execution. In this case, Π is “as secure as the ideal”.

Let Π be a protocol in the G-hybrid model. The output of E interacting with Π in the presence of an adversary A on input 1λ and auxiliary input z is denoted EXECGΠ,A,E(1λ, z). Consider another (trivial) protocol with ideal functionality F, dummy parties, and a simulator S. The output of an environment E interacting with this trivial protocol in the presence of a simulator S on input 1λ and auxiliary input z is denoted as IDEALF,S,E(1λ, z).

Definition 1. Protocol Π UC-emulates an ideal functionality F in the G-hybrid model when for any PPT adversary A there exists a simulator S such that for any environment E:

\[ \{ \text{EXEC}_{G, A, E}(1^\lambda, z) \}_{\lambda \in \mathbb{N}, \quad z \in \{0, 1\}^*} \approx \{ \text{IDEAL}_{F, S, E}(1^\lambda, z) \}_{\lambda \in \mathbb{N}, \quad z \in \{0, 1\}^*} \]

where “≈” denotes computational indistinguishability of distribution ensembles, see [Gol09].

Appendix B  VOLE Functionality

Figure 17 presents the standard Vector Oblivious Linear Evaluation (VOLE) functionality FVOLE. FVOLE allows P and V to construct two vectors of random field elements, entries of which are correlated by a fixed field element ∆. ZK proofs can be efficiently constructed in the VOLE hybrid model, see e.g. [YSWW21, DIO21]. We implement our ZK RAM from VOLE.

Appendix C  Proof of Security

C.1 Soundness Lemmas

Protocol 1 requires that P input extra field elements. The soundness of our protocol is based on the fact that even a malicious P cannot input values that disagree with the semantics of ROM/RAM. This section formalizes P’s inability to cheat as security games and lemmas. Our main proof of security relies on these lemmas.

Lemma 2. Let F be a prime field, and let m a positive integer s.t. m < |F|. Let \( P \triangleq \{ t_1, \ldots, t_m \} \) be an ordered set of any m field elements. Let \( Q \triangleq \{ t_1 + 1, \ldots, t_m + 1 \} \) be an ordered set. P and Q cannot form a permutation: \( P \not\sim Q \).
Functionality $F_{\text{VOLE}}$

Consider a base field $\mathbb{F}$. $F_{\text{VOLE}}$ interacts with $P$, $V$, and the adversary $A$ as follows:

**Initialize.** Upon receiving (init) from $P$ and $V$, if $V$ is honest, sample $\Delta \in \mathbb{F}$, else receive $\Delta$ from $A$. Store $\Delta$ and send it to $V$. Ignore subsequent (init).

**Extend.** Upon receiving (extend, $n$) from $P$ and $V$, do the following:

- If $V$ is honest, sample $v \in \mathbb{F}^n$, else get $v \in \mathbb{F}^n$ from $A$.
- If $P$ is honest, sample $u \in \mathbb{F}^n$ and compute $w \triangleq v - u \cdot \Delta \in \mathbb{F}^n$, else receive $u \in \mathbb{F}^n$ and $w \in \mathbb{F}^n$ from $A$ and compute $v \triangleq w + u \cdot \Delta \in \mathbb{F}^n$.
- Send $(u, w)$ to $P$ and $v$ to $V$.

Figure 17: The VOLE correlation functionality.

Proof. By induction on the entries of $P$ and $Q$. Consider $t_1$ in $P$. By construction, we ensure $t_1 + 1$ is in $Q$. If $P \sim Q$, then $t_1 + 1$ is also in $P$. By repeating this argument, $P$ must have $t_1, t_1 + 1, t_1 + 2, \ldots, t_1 + m - 1$. Accordingly, $Q$ must have $t_1 + 1, t_1 + 2, t_1 + 3, \ldots, t_1 + m$. However, since $m < |\mathbb{F}|$, $t_1 \notin Q$, contradicting $P \sim Q$.

Game 1 (Single Element ROM). Let $\mathbb{F}$ be a prime field, and let $T \in \mathbb{N}$ s.t. $T < |\mathbb{F}|$. Consider lists $R$ and $W$, both initialized as empty. Let $c_{\text{begin}}, t_{\text{begin}} \in \mathbb{F}$ be arbitrary values. The $T$-round game with $A$ proceeds as follows:

1. Append $(c_{\text{begin}}, t_{\text{begin}})$ to $W$.
2. In each round $i \in [T]$: $A$ chooses $c_i, t_i \in \mathbb{F}$. Append $(c_i, t_i)$ to $R$; append $(c_i, t_i + 1)$ to $W$.
3. After T rounds: $A$ chooses $c_{\text{end}}, t_{\text{end}} \in \mathbb{F}$. Append $(c_{\text{end}}, t_{\text{end}})$ to $R$.

We say that $A$ wins i.f.f. $W \sim R$ at the end of the game.

**Lemma 3.** If $A$ wins Game 1, then the following must hold:

$$c_{\text{begin}} = c_1 = \cdots = c_T = c_{\text{end}}$$

Namely, each $c_i \in [T]$ and $c_{\text{end}}$ provided by $A$ are equal to $c_{\text{begin}}$.

Proof. By induction on $T$.

**Base case:** When $T = 0$, $W$ (resp. $R$) only has a single element $(c_{\text{begin}}, t_{\text{begin}})$ (resp. $(c_{\text{end}}, t_{\text{end}})$). If $W \sim R$, then the single element in $R$ must be $(c_{\text{begin}}, t_{\text{begin}})$. This implies $c_{\text{begin}} = c_{\text{end}}$, so the lemma holds.

**Induction:** By the inductive hypothesis, the lemma holds when $T = n$. Consider $T = n + 1$. Suppose $A$ wins the game when $T = n + 1$, and thus $R \sim W$. Consider $(c_{\text{begin}}, t_{\text{begin}})$ in $W$. The value $(c_{\text{begin}}, t_{\text{begin}})$ must be in $R$, and there are two possible placements of this value:
1. $(c_{\text{begin}}, t_{\text{begin}})$ is the last element of $R$:

$$(c_{\text{end}}, t_{\text{end}}) = (c_{\text{begin}}, t_{\text{begin}})$$

If we ignore $(c_{\text{end}}, t_{\text{end}})$ in $R$ and $(c_{\text{begin}}, t_{\text{begin}})$ in $W$, then the remaining $n + 1$ elements in $R$ and $W$ must still form a permutation:

$$\{(c_1, t_1), \ldots, (c_T, t_T)\} \sim \{(c_1, t_1 + 1), \ldots, (c_T, t_T + 1)\}$$

But this is impossible by Lemma 2.

2. $(c_{\text{begin}}, t_{\text{begin}})$ is not the last element of $R$. I.e., there exists some $j \in [n + 1]$ such that:

$$(c_j, t_j) = (c_{\text{begin}}, t_{\text{begin}})$$

In this case, the corresponding tuple appended to $W$ in round $j$ $(c_j, t_j + 1)$ can be understood as a tuple $(c'_{\text{begin}}, t'_{\text{begin}})$ in a new game with $T' = T - 1 = n$. By the inductive hypothesis, we know all $c_i \neq j$ and $c_{\text{end}}$ are equal to $c_j$, and $c_j$ is equal to $c_{\text{begin}}$. Thus, the lemma holds for $T = n + 1$.

Note, the inductive argument only holds when $T < |F|$. 

**Game 2 (ROM).** Let $\mathbb{F}$ be a prime field, and let $n, T \in \mathbb{N}$ s.t. $n, T < |\mathbb{F}|$. Consider lists $R$ and $W$, both initialized as empty. Consider a size-$n$ set of field elements $\{\ell_1, \ldots, \ell_n\}$ called the address set. For each $\ell_k \in [n]$, let $c_{k, \text{begin}}, t_{k, \text{begin}} \in \mathbb{F}$ be arbitrary values. The $T$-round game with $A$ proceeds as follows:

1. For each $k \in [n]$, append $(\ell_k, c_{k, \text{begin}}, t_{k, \text{begin}})$ to $W$.

2. In each round $i \in [T]$, $A$ chooses $c_i, t_i \in \mathbb{F}$ and $r_i \in [n]$. Append $(\ell_{r_i}, c_i, t_i)$ to $R$; append $(\ell_{r_i}, c_i, t_i + 1)$ to $W$.

3. After $T$ rounds: for each $\ell_k \in [n]$, $A$ chooses $c_{k, \text{end}}, t_{k, \text{end}} \in \mathbb{F}$. Append $(\ell_k, c_{k, \text{end}}, t_{k, \text{end}})$ to $R$.

We say that $A$ wins i.f.f. $W \sim R$ at the end of the game.

**Lemma 4.** If $A$ wins Game 2, then the following must hold:

$$\forall k \in [n], \forall (\ell_k, c, t) \in R, c = c_{k, \text{begin}}$$

Namely, once $A$ selects $r_i$ in each round, $c_i$ must be equal to $c_{r_i, \text{begin}}$. Also, $c_{k, \text{end}} = c_{k, \text{begin}}$ for each $k \in [n]$.

**Proof.** By partitioning $W$ and $R$ into address-specific permutations, then applying Lemma 3.

In more detail, recall that in each round $i \in [T]$, $A$ selects $r_i \in [n]$. We denote by $T_k$ the number of rounds where $A$ selects $r_i = k$. For each $k \in [n]$, we define address-specific reads and writes:

$$W_k \triangleq \{(c, t) \mid (\ell_k, c, t) \in W\}$$

$$R_k \triangleq \{(c, t) \mid (\ell_k, c, t) \in R\}$$

Note, $|W_k| = |R_k| = T_k + 1$. Since $W \sim R$, and since each tuple is explicitly tagged with its address $\ell_k$, it must be that $W_k \sim R_k$. For each $k$, we can now apply a $T_k$-round Game 1 to $W_k$ and $R_k$, where $(c_{\text{begin}}, t_{\text{begin}})$ is set to $(c_{k, \text{begin}}, t_{k, \text{begin}})$. Since $T_k \leq T < |\mathbb{F}|$, applying Lemma 3 $n$ times completes the proof. 

30
**Game 3** (Single Element RAM). Let $\mathbb{F}$ be a prime field, and let $T \in \mathbb{N}$ s.t. $T > 1$ and $|\mathbb{F}| \geq 2T$. Consider lists $R$ and $W$, both initialized as empty. Let $c_{\text{begin}} \in \mathbb{F}$ be an arbitrary value. Consider a non-empty size-$m$ set $S \triangleq \{s_1, \ldots, s_m\}$ called the timestamp set where:

$$1 \leq s_1 < s_2 < \cdots < s_m \leq T$$

The $m$-round game with $A$ proceeds as follows:

1. Append $(c_{\text{begin}}, 0)$ to $W$.
2. For each round $i \in [m]$: $A$ chooses $d_i, t_i, c_i \in \mathbb{F}$. Append $(d_i, t_i)$ to $R$; append $(c_i, \text{embed}(s_i))$ to $W$ where $\text{embed}(\cdot)$ denotes the trivial embedding from $[T]$ to $\mathbb{F}$ (henceforth, we omit calls to $\text{embed}$).
3. After $m$ rounds: $A$ chooses $d_{\text{end}}, t_{\text{end}} \in \mathbb{F}$. Append $(d_{\text{end}}, t_{\text{end}})$ to $R$.

We say that $A$ wins i.f.f. at the end of the game:

$$W \sim R \quad \text{and} \quad \forall i \in [m], (s_i - t_i) \in \{1, 2, \ldots, T\}$$

**Lemma 5.** If $A$ wins Game 3, it must be that $W = R$.

*Proof.* In short, we argue that at each round $A$ has exactly one value it can “read” (similar to Invariant 2).

In more detail, note:

$$W = \{(c_{\text{begin}}, 0), (c_1, s_1), \ldots, (c_m, s_m)\}$$

If $W \sim R$, then it must be that the following holds:

$$\{t_1, \ldots, t_m, t_{\text{end}}\} \sim 0, s_1, \ldots, s_m \quad (1)$$

Consider $t_1$. By Equation (1), we know $t_1 \in \{0, s_1, \ldots, s_m\}$, and because $A$ won the game, we know $(s_1 - t_1) \in \{1, \ldots, T\}$. Now, we argue it must be that $t_1 = 0$. Indeed, any other choice $t_1 \in \{s_1, \ldots, s_m\}$ will yield a difference $s_1 - t_1$ that outside the allowed range $\{1, \ldots, T\}$. This is because $s_1 - s_1$ is 0, and all other choices $s_m$ are even greater than $s_1$:

$$s_1 - s_m + |\mathbb{F}| \geq 1 - T + |\mathbb{F}| > T$$

Similarly, consider $t_2$. By Equation (1), we know $t_2 \in \{0, s_1, \ldots, s_m\}$, and because $A$ won the game, we know $(s_2 - t_2) \in \{1, \ldots, T\}$. It must be that $t_2 = s_1$. Indeed:

$$s_2 - s_m + |\mathbb{F}| \geq 1 - T + |\mathbb{F}| > T$$

Thus, $t_2$ must be in the set $\{0, s_1\}$. Moreover, $t_1 = 0$, so to satisfy the permutation requirement, $t_2$ can only be $s_1$.

We can inductively repeat the above analysis for each $t_i$. I.e., $t_i = s_{i-1}$ (assume $s_0 = 0$) for all $i \in [m]$ and $t_{\text{end}} = s_m$.

To sum up, $\{t_1, \ldots, t_m, t_{\text{end}}\} = \{0, s_1, \ldots, s_m\}$, where we interpret these as ordered sets. Note that each $s$ is distinct and non-zero. Since $W \sim R$, it must be that $W = R$. \qed
**Game 4 (RAM).** Let $\mathbb{F}$ be a prime field, and let $T \in \mathbb{N}$ s.t. $T > 1$ and $|\mathbb{F}| > 2T - 1$. Consider lists $R$ and $W$, both initialized as empty. Consider a set of field elements $\{\ell_1, \ldots, \ell_n\}$ called the address set. For each $\ell_k \in [n]$, let $c_k, \text{begin} \in \mathbb{F}$ be an arbitrary field element. The $T$-round game with $A$ proceeds as follows:

1. For each $k \in [n]$, append $(\ell_k, c_k, \text{begin}, 0)$ to $W$.

2. For each round $i \in [T]$: $A$ chooses $d_i, t_i, c_i \in \mathbb{F}$ and $r_i \in [n]$. Append $(\ell_{r_i}, d_i, t_i)$ to $R$; append $(\ell_{r_i}, c_i, i)$ to $W$.

3. After $T$ rounds: for each $\ell_k \in [n]$, $A$ chooses $d_k, \text{end}, t_k, \text{end} \in \mathbb{F}$. Append $(\ell_k, d_k, \text{end}, t_k, \text{end})$ to $R$.

We say that $A$ wins i.f.f. at the end of the game:

$$W \sim R \quad \text{and} \quad \forall i \in [T], (i - t_i) \in \{1, 2, \ldots, T\}$$

**Lemma 6.** If $A$ wins Game 4, then the following must hold: For each $k \in [n]$, define lists $W_k$ and $R_k$ as follows:

$$W_k \triangleq \{(c, t) \mid (\ell_k, c, t) \in W\}$$
$$R_k \triangleq \{(d, t) \mid (\ell_k, d, t) \in R\}$$

I.e., fetch (in order) the tuples in $W$ and $R$ with $\ell_k$ as the address. For each $k$, it holds that $W_k = R_k$.\(^6\)

**Proof.** Immediate, by partitioning $W$ and $R$ into address-specific permutations, then applying Lemma 5 $n$ times. \(\square\)

### C.2 Proof of Lemma 1

We instantiate our protocols as a VOLE-based ZK system. Lemma 1 shows that $F_{\text{perm}}^{\text{ZK}}$ can be UC-realized in the VOLE-hybrid model. This is achieved by prior work, e.g., [YSWW21, FKL+21]. For completeness, we state facts supported by prior work.

**Lemma 7.** There exists a 3-round protocol in the VOLE-hybrid model that UC-realizes $F_{\text{perm}}^{\text{ZK}}$ without permutation gates. For each use of $\text{input}$ and of $\sim$ on two length-$n$ vectors, $P$ and $V$ use 1 VOLE correlation. The protocol achieves soundness error $O(\frac{m}{|\mathbb{F}|})$, where $m$ is the number of multiplications.

**Proof.** See [YSWW21]. \(\square\)

**Lemma 8.** There exists a 3-round protocol in the VOLE-hybrid model that UC-realizes $F_{\text{perm}}^{\text{ZK}}$ with only permutation and input gates. For each use of $\text{input}$, $P$ and $V$ use 1 VOLE correlation; for each use of $\sim$ on two length-$n$ vectors, $P$ and $V$ each evaluate $O(n)$ field operations and consume $\frac{2n}{\epsilon - 1} + o(1)$ VOLE correlations, for any constant $\epsilon \geq 2$. The protocol achieves soundness error $O(\frac{n}{|\mathbb{F}|})$, where $n$ is the size of the largest vector passed to $\sim$.

**Proof.** See [FKL+21]. \(\square\)

\(^6\)This does not imply that $W$ and $R$ will be identical.
Our Lemma 1 simply combines Lemma 7 and Lemma 8. The combined protocol has 5 rounds:

1. $P$ commits the inputs and the outputs of multiplications.
2. $V$ sends a challenge for permutation proofs.
3. $P$ commits the intermediate values of the product circuits in the permutation proofs.
4. $V$ sends a challenge used to batch polynomial proofs.
5. $P$ sends the final proof.

### C.3 Proof of Theorem 1

**Theorem 1.** Protocol 1 UC-realizes $F_{ZK}^{ram}$ (Figure 4) in the $F_{ZK}^{perm}$-hybrid model (Figure 3) with no soundness error.

**Proof.** By construction of a simulator $S$. Since the simulator for malicious $V$ (i.e., the ZK property) is trivial, we focus on the simulator for malicious $P$ (i.e., the soundness property).

$S$ acts as the $F_{ZK}^{perm}$ functionality and collect $P$’s extended witness $w$ intended for circuit $C'$. $S$ checks that $C'(w) = 0$ (if not, it aborts). It then discard the parts of $w$ that are used for input gates internal to our RAM construction, and it forwards the remaining parts of $w$ to the ideal functionality (with permutations, executed in $F_{ZK}^{perm}$).

We argue that the distributions of two worlds seen by the environment $E$ are identical. The challenge is to show that $E$ cannot distinguish the behavior of real-world $V$ from that of ideal-world $V$. To show this, we must argue that real-world $V$ accepts the proof i.f.f. ideal-world $V$ accepts. I.e., we must argue our protocol is sound.

As a first step, note that if $S$ aborts, then in each world $V$ outputs false. Thus, we only need to argue that when $S$ does not abort, ideal-world $V$ outputs true. $V$ in the real world also outputs true because we condition on the fact that $S$ does not abort, i.e., simulated $F_{ZK}^{perm}$ outputs true. This is exactly the soundness loss.

In other words, we must show that $S$ holds a valid witness for the RAM circuit $C$. The crucial argument is: if malicious $P$ passes all checks (performed by $F_{ZK}^{perm}$ and simulated by $S$), $P$ must use a valid witness for $F_{ZK}^{ram}$. This is true because of the following:

1. Malicious $P$ passes checks internal to the ROM (more precisely, the set data structure inside the RAM). I.e., $P$ wins Game 2. From Lemma 4, we know that $P$’s witness must meet the semantics of ROM. Namely, the timing value $t$ introduced by $P$ on each RAM access must belong to $\{1, \ldots, T\}$.

2. Conditioned on Item 1, malicious $P$ passes checks internal to the RAM (including the set membership checks). I.e., $P$ wins Game 4. From Lemma 6, we know that $P$’s witness must also meet the semantics of RAM.

Thus, $V$ in the ideal world will always output true. To sum up, there is no soundness loss. \[\square\]
Appendix D  Additional Optimizations

We describe other optimizations available in our RAM.

**Amortizing sets.** A significant portion of our cost comes from the need to tear down the set \( \{1, \ldots, T\} \). Indeed, set teardown requires \( T \) input gates and \( T \) multiplication gates.

Note that the set \( \{1, \ldots, T\} \) may be useful outside the context of a single RAM. For instance, it may be desirable to instantiate more than one RAM, or the set might be useful elsewhere, e.g. for performing comparisons.

It is possible to leverage a single set in more than one place, reducing the proportion of work spent tearing down sets and saving up to amortized 1 input gate and 1 multiplication per RAM access. We do not incorporate this optimization into our evaluation or cost analysis.

**Periodic reset.** Our RAM allows \( T \) accesses for arbitrary \( T \). After \( T \) accesses, we tear down the data structure, and our formal procedure `teardown` returns the RAM content as a vector of wires. Thus, it is possible to (1) set up a RAM, (2) perform \( T \) accesses, (3) tear down the RAM, (4) use the resulting wires to set up a fresh RAM, and (5) repeat. There are two reasons it may be desirable to periodically reset RAM.

First, periodically resetting decreases \( T \), reducing the size of set needed. By combining this with our set amortization trick, we can use the same set of \( T \) elements in each “generation” of the RAM. If the total number of required RAM accesses is \( T' \gg n \), one can minimize the total number of required gates by choosing to reset RAM every \( T = O(n \cdot \sqrt{T'}/n) \) accesses. This choice of \( T \) minimizes the total gates needed to (1) tear down RAM components and (2) tear down the set.

Second, using permutation proofs (Section 2.2) incurs soundness error that scales linearly with the size of the permuted vectors. Hence, one can periodically reset RAM (and its underlying set) to decrease the size of required permutations, increasing soundness.

To simplify presentation, we did not include periodic resets in our cost accounting or implementation.
Appendix E  Additional Evaluation

Our ROM and set data structure performance. Figure 18 plots per-access runtime of our ROM in microseconds. Figure 19 plots per-access runtime of our set data structure in microseconds. Figure 22 tabulates detailed data.

Microbenchmarks of our set data structure. Figure 20 tabulates communication required by our set data structure. Figure 21 breaks down the per-access runtime of our set data structure when \( n = 2^{20} \).

Detailed data for all experiments. Figure 22 tabulates the detailed data for our protocols including set data structure, ROM and RAM. Figure 23 tabulates the detailed data for [FKL+21]'s protocols including ROM and RAM. Figure 24 tabulates the detailed data for [DdSGOTV22]'s RAM. Our figures, e.g. Figures 12, 13 and 15, are plotted from this data.

Appendix F  Commit-and-Prove ZK

For completeness, Figure 25 lists the standard commit-and-prove ZK functionality, as formalized by [CLOS02].

---

![Figure 19: Per-query runtime of our set data structure (µs).](image)

<table>
<thead>
<tr>
<th>Direction</th>
<th>( \log n )</th>
</tr>
</thead>
<tbody>
<tr>
<td>( P \rightarrow V )</td>
<td>12</td>
</tr>
<tr>
<td>V ( \rightarrow P )</td>
<td>0.4</td>
</tr>
<tr>
<td>Total</td>
<td>10.8</td>
</tr>
</tbody>
</table>

Figure 20: Per-access communication used by our set data structure (bytes).
<table>
<thead>
<tr>
<th>Bandwidth</th>
<th>25 Mbps</th>
<th>100 Mbps</th>
<th>500 Mbps</th>
<th>1 Gbps</th>
</tr>
</thead>
<tbody>
<tr>
<td>Setup</td>
<td>0.21</td>
<td>0.11</td>
<td>0.09</td>
<td>0.10</td>
</tr>
<tr>
<td>Access</td>
<td>2.72</td>
<td>0.75</td>
<td>0.31</td>
<td>0.28</td>
</tr>
<tr>
<td>Teardown</td>
<td>1.17</td>
<td>0.35</td>
<td>0.25</td>
<td>0.27</td>
</tr>
<tr>
<td>Total</td>
<td>4.10</td>
<td>1.21</td>
<td>0.65</td>
<td>0.65</td>
</tr>
</tbody>
</table>

Figure 21: Cost breakdown of our set data structure (µs).

<table>
<thead>
<tr>
<th>log n</th>
</tr>
</thead>
<tbody>
<tr>
<td>11</td>
</tr>
<tr>
<td>12</td>
</tr>
<tr>
<td>13</td>
</tr>
<tr>
<td>14</td>
</tr>
<tr>
<td>15</td>
</tr>
<tr>
<td>16</td>
</tr>
<tr>
<td>17</td>
</tr>
<tr>
<td>18</td>
</tr>
<tr>
<td>19</td>
</tr>
<tr>
<td>20</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>set</th>
<th>Comp.</th>
<th>25 Mbps</th>
<th>100 Mbps</th>
<th>500 Mbps</th>
<th>1 Gbps</th>
</tr>
</thead>
<tbody>
<tr>
<td>11</td>
<td>3.62</td>
<td>1.06</td>
<td>0.55</td>
<td>0.52</td>
<td>0.10</td>
</tr>
<tr>
<td>12</td>
<td>3.62</td>
<td>1.07</td>
<td>0.54</td>
<td>0.51</td>
<td>0.11</td>
</tr>
<tr>
<td>13</td>
<td>3.61</td>
<td>1.05</td>
<td>0.54</td>
<td>0.51</td>
<td>0.09</td>
</tr>
<tr>
<td>14</td>
<td>3.61</td>
<td>1.06</td>
<td>0.54</td>
<td>0.51</td>
<td>0.10</td>
</tr>
<tr>
<td>15</td>
<td>3.60</td>
<td>1.04</td>
<td>0.56</td>
<td>0.51</td>
<td>0.06</td>
</tr>
<tr>
<td>16</td>
<td>3.62</td>
<td>1.06</td>
<td>0.54</td>
<td>0.51</td>
<td>0.06</td>
</tr>
<tr>
<td>17</td>
<td>3.62</td>
<td>1.07</td>
<td>0.54</td>
<td>0.51</td>
<td>0.06</td>
</tr>
<tr>
<td>18</td>
<td>3.76</td>
<td>1.06</td>
<td>0.54</td>
<td>0.51</td>
<td>0.06</td>
</tr>
<tr>
<td>19</td>
<td>4.10</td>
<td>1.06</td>
<td>0.65</td>
<td>0.52</td>
<td>0.05</td>
</tr>
<tr>
<td>20</td>
<td>4.10</td>
<td>1.09</td>
<td>0.65</td>
<td>0.52</td>
<td>0.05</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>ROM</th>
<th>25 Mbps</th>
<th>100 Mbps</th>
<th>500 Mbps</th>
<th>1 Gbps</th>
</tr>
</thead>
<tbody>
<tr>
<td>Comp.</td>
<td>10.35</td>
<td>1.05</td>
<td>0.83</td>
<td>0.73</td>
</tr>
<tr>
<td>Comm.</td>
<td>10.35</td>
<td>1.05</td>
<td>0.83</td>
<td>0.73</td>
</tr>
<tr>
<td>11</td>
<td>3.62</td>
<td>0.83</td>
<td>0.82</td>
<td>0.71</td>
</tr>
<tr>
<td>12</td>
<td>3.62</td>
<td>0.83</td>
<td>0.82</td>
<td>0.71</td>
</tr>
<tr>
<td>13</td>
<td>3.61</td>
<td>0.82</td>
<td>0.82</td>
<td>0.71</td>
</tr>
<tr>
<td>14</td>
<td>3.61</td>
<td>0.82</td>
<td>0.82</td>
<td>0.71</td>
</tr>
<tr>
<td>15</td>
<td>3.60</td>
<td>0.81</td>
<td>0.80</td>
<td>0.71</td>
</tr>
<tr>
<td>16</td>
<td>3.62</td>
<td>0.81</td>
<td>0.80</td>
<td>0.71</td>
</tr>
<tr>
<td>17</td>
<td>3.62</td>
<td>1.84</td>
<td>0.80</td>
<td>0.71</td>
</tr>
<tr>
<td>18</td>
<td>3.62</td>
<td>1.84</td>
<td>0.80</td>
<td>0.71</td>
</tr>
<tr>
<td>19</td>
<td>3.62</td>
<td>1.84</td>
<td>0.80</td>
<td>0.71</td>
</tr>
<tr>
<td>20</td>
<td>3.62</td>
<td>1.84</td>
<td>0.80</td>
<td>0.71</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>RAM</th>
<th>25 Mbps</th>
<th>100 Mbps</th>
<th>500 Mbps</th>
<th>1 Gbps</th>
</tr>
</thead>
<tbody>
<tr>
<td>Comp.</td>
<td>15.55</td>
<td>1.87</td>
<td>0.83</td>
<td>0.71</td>
</tr>
<tr>
<td>Comm.</td>
<td>15.55</td>
<td>1.87</td>
<td>0.83</td>
<td>0.71</td>
</tr>
<tr>
<td>11</td>
<td>6.36</td>
<td>1.83</td>
<td>0.82</td>
<td>0.71</td>
</tr>
<tr>
<td>12</td>
<td>6.36</td>
<td>1.83</td>
<td>0.82</td>
<td>0.71</td>
</tr>
<tr>
<td>13</td>
<td>6.36</td>
<td>1.83</td>
<td>0.82</td>
<td>0.71</td>
</tr>
<tr>
<td>14</td>
<td>6.36</td>
<td>1.83</td>
<td>0.82</td>
<td>0.71</td>
</tr>
<tr>
<td>15</td>
<td>6.37</td>
<td>1.84</td>
<td>0.83</td>
<td>0.71</td>
</tr>
<tr>
<td>16</td>
<td>6.37</td>
<td>1.84</td>
<td>0.83</td>
<td>0.71</td>
</tr>
<tr>
<td>17</td>
<td>6.43</td>
<td>1.85</td>
<td>0.85</td>
<td>0.73</td>
</tr>
<tr>
<td>18</td>
<td>6.43</td>
<td>1.85</td>
<td>0.85</td>
<td>0.73</td>
</tr>
<tr>
<td>19</td>
<td>6.56</td>
<td>1.86</td>
<td>0.85</td>
<td>0.73</td>
</tr>
<tr>
<td>20</td>
<td>6.56</td>
<td>1.86</td>
<td>0.85</td>
<td>0.73</td>
</tr>
</tbody>
</table>

Figure 22: Detailed per-access runtime (µs) and per-access communication (bytes) of our set/ROM/RAM.

<table>
<thead>
<tr>
<th>log n</th>
</tr>
</thead>
<tbody>
<tr>
<td>11</td>
</tr>
<tr>
<td>12</td>
</tr>
<tr>
<td>13</td>
</tr>
<tr>
<td>14</td>
</tr>
<tr>
<td>15</td>
</tr>
<tr>
<td>16</td>
</tr>
<tr>
<td>17</td>
</tr>
<tr>
<td>18</td>
</tr>
<tr>
<td>19</td>
</tr>
<tr>
<td>20</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>ROM</th>
<th>Comp.</th>
<th>25 Mbps</th>
<th>100 Mbps</th>
<th>500 Mbps</th>
<th>1 Gbps</th>
</tr>
</thead>
<tbody>
<tr>
<td>11</td>
<td>23.06</td>
<td>0.71</td>
<td>1.63</td>
<td>1.35</td>
<td></td>
</tr>
<tr>
<td>12</td>
<td>23.51</td>
<td>0.71</td>
<td>1.63</td>
<td>1.35</td>
<td></td>
</tr>
<tr>
<td>13</td>
<td>24.14</td>
<td>0.71</td>
<td>1.63</td>
<td>1.35</td>
<td></td>
</tr>
<tr>
<td>14</td>
<td>25.14</td>
<td>0.71</td>
<td>1.63</td>
<td>1.35</td>
<td></td>
</tr>
<tr>
<td>15</td>
<td>25.65</td>
<td>0.71</td>
<td>1.63</td>
<td>1.35</td>
<td></td>
</tr>
<tr>
<td>16</td>
<td>26.64</td>
<td>0.71</td>
<td>1.63</td>
<td>1.35</td>
<td></td>
</tr>
<tr>
<td>17</td>
<td>27.16</td>
<td>0.71</td>
<td>1.63</td>
<td>1.35</td>
<td></td>
</tr>
<tr>
<td>18</td>
<td>28.04</td>
<td>0.71</td>
<td>1.63</td>
<td>1.35</td>
<td></td>
</tr>
<tr>
<td>19</td>
<td>29.26</td>
<td>0.71</td>
<td>1.63</td>
<td>1.35</td>
<td></td>
</tr>
<tr>
<td>20</td>
<td>31.48</td>
<td>0.71</td>
<td>1.63</td>
<td>1.35</td>
<td></td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>RAM</th>
<th>Comp.</th>
<th>25 Mbps</th>
<th>100 Mbps</th>
<th>500 Mbps</th>
<th>1 Gbps</th>
</tr>
</thead>
<tbody>
<tr>
<td>11</td>
<td>28.34</td>
<td>1.43</td>
<td>12.63</td>
<td>32.00</td>
<td></td>
</tr>
<tr>
<td>12</td>
<td>28.47</td>
<td>1.43</td>
<td>12.63</td>
<td>32.00</td>
<td></td>
</tr>
<tr>
<td>13</td>
<td>28.86</td>
<td>1.43</td>
<td>12.63</td>
<td>32.00</td>
<td></td>
</tr>
<tr>
<td>14</td>
<td>29.23</td>
<td>1.43</td>
<td>12.63</td>
<td>32.00</td>
<td></td>
</tr>
<tr>
<td>15</td>
<td>29.68</td>
<td>1.43</td>
<td>12.63</td>
<td>32.00</td>
<td></td>
</tr>
<tr>
<td>16</td>
<td>30.18</td>
<td>1.43</td>
<td>12.63</td>
<td>32.00</td>
<td></td>
</tr>
<tr>
<td>17</td>
<td>30.79</td>
<td>1.43</td>
<td>12.63</td>
<td>32.00</td>
<td></td>
</tr>
<tr>
<td>18</td>
<td>31.39</td>
<td>1.43</td>
<td>12.63</td>
<td>32.00</td>
<td></td>
</tr>
<tr>
<td>19</td>
<td>32.79</td>
<td>1.43</td>
<td>12.63</td>
<td>32.00</td>
<td></td>
</tr>
<tr>
<td>20</td>
<td>34.78</td>
<td>1.43</td>
<td>12.63</td>
<td>32.00</td>
<td></td>
</tr>
</tbody>
</table>

Figure 23: Detailed per-access runtime (µs) and per-access communication (bytes) of [FKL+21]'s ROM/RAM.
Figure 24: Detailed per-access runtime (µs) and per-access communication (bytes) of [DdSGOTV22]'s RAM.

<table>
<thead>
<tr>
<th>RAM</th>
<th>25 Mbps</th>
<th>100 Mbps</th>
<th>500 Mbps</th>
<th>1 Gbps</th>
</tr>
</thead>
<tbody>
<tr>
<td>Comp.</td>
<td>25.92</td>
<td>11.86</td>
<td>4.23</td>
<td>3.80</td>
</tr>
<tr>
<td></td>
<td>25.94</td>
<td>11.89</td>
<td>4.24</td>
<td>3.80</td>
</tr>
<tr>
<td></td>
<td>25.96</td>
<td>11.81</td>
<td>4.25</td>
<td>3.80</td>
</tr>
<tr>
<td></td>
<td>25.98</td>
<td>11.82</td>
<td>4.26</td>
<td>3.80</td>
</tr>
<tr>
<td></td>
<td>26.00</td>
<td>11.83</td>
<td>4.27</td>
<td>3.80</td>
</tr>
</tbody>
</table>

| Comm.   | P → V   | 126.64   | 126.67   | 126.73  |
|         | V → P   | 3.06     | 3.06     | 3.06    |

<table>
<thead>
<tr>
<th>log n</th>
<th>11</th>
<th>12</th>
<th>13</th>
<th>14</th>
<th>15</th>
<th>16</th>
<th>17</th>
<th>18</th>
<th>19</th>
<th>20</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>42.78</td>
<td>42.80</td>
<td>42.74</td>
<td>42.74</td>
<td>42.78</td>
<td>42.92</td>
<td>43.26</td>
<td>43.90</td>
<td>45.28</td>
<td>47.97</td>
</tr>
<tr>
<td></td>
<td>11.86</td>
<td>11.86</td>
<td>11.89</td>
<td>11.81</td>
<td>11.80</td>
<td>12.01</td>
<td>12.13</td>
<td>12.53</td>
<td>13.30</td>
<td></td>
</tr>
<tr>
<td></td>
<td>4.23</td>
<td>4.29</td>
<td>4.15</td>
<td>4.18</td>
<td>4.24</td>
<td>4.20</td>
<td>4.25</td>
<td>4.31</td>
<td>4.43</td>
<td>4.73</td>
</tr>
<tr>
<td></td>
<td>3.80</td>
<td>3.80</td>
<td>3.72</td>
<td>3.72</td>
<td>3.75</td>
<td>3.76</td>
<td>3.78</td>
<td>3.88</td>
<td>4.02</td>
<td>4.25</td>
</tr>
</tbody>
</table>

Figure 25: The commit-and-prove ZK functionality.

**Functionality $F_{ZK}^{cp}$**

$F_{ZK}^{cp}$ considers a prover $P$, a verifier $V$, and an adversary $S$, and is parameterized by a field $F$:

**Commits.** Upon receiving $(\text{commit}, cid, x)$ from $P$ where (a) there is no recorded tuple $(cid, \cdot)$, and (b) $x \in F$: Record the tuple $(cid, x)$ and send $(\text{commit}, cid)$ to $S$ and $V$.

**Proof.** Upon receiving $(\text{prove}, C, cid_1, \ldots, cid_n)$ from $P$ where (1) $C$ is a circuit $C : F^n \rightarrow F$, and (b) each $cid_{i\in[n]}$ was recorded:

- Fetch recorded $(cid_1, x_1), \ldots, (cid_n, x_n)$, compute $y := C(x_1, \ldots, x_n)$. If $y = 0$, send $(\text{prove}, C, cid, \text{true})$ to $S$ and $V$; else send $(\text{prove}, C, cid, \text{false})$ to $S$ and $V$. 