# A fast heuristic for mapping Boolean circuits to functional bootstrapping

Sergiu Carpov

Inpher

Abstract. Functional bootstrapping in FHE schemes such as FHEW and TFHE allows the evaluation of a function on an encrypted message, in addition to noise reduction. Implementing programs that directly use functional bootstrapping is challenging and error-prone. In this paper, we propose a heuristic that automatically maps Boolean circuits to functional bootstrapping instructions. Unlike other approaches, our method does not limit the encrypted data plaintext space to a power-of-two size, allowing the instantiation of functional bootstrapping with smaller parameters. Furthermore, the negacyclic property of functional bootstrapping is exploited to extend the plaintext space. Despite the inherently greedy nature of the heuristic, experimental results show that the mapped circuits exhibit a significant reduction in evaluation time. Furthermore, our heuristic was able to achieve a 45% improvement in evaluation time when applied to manually implemented Trivium and Kreyvium circuits.

**Keywords:** Functional bootstrapping · Boolean circuit mapping · Fully Homomorphic Encryption.

# 1 Introduction

Fully homomorphic encryption (FHE) is an encryption scheme that enables the direct execution of arbitrary computations on encrypted data. The first FHE scheme was introduced by Gentry in his seminal work [17]. The construction relies on a technique called *bootstrapping*, which is used to reduce noise in FHE ciphertexts. This construction theoretically enables the execution of any computation directly over encrypted data but remains slow in practice. Many works [16,5,15,10,11] have built upon Gentry's initial proposal and have contributed to further improvements in the efficiency of FHE.

FHE schemes are typically classified into two main categories. The first category of FHE schemes is based on Gentry's initial proposal. While the bootstrapping procedure is relatively time-consuming, it enables the efficient packing of data through the use of batching techniques. Typically ciphertexts are bootstrapped as rarely as possible following the evaluation of numerous homomorphic operations. The second type of FHE schemes is based on the **GSW** somewhat homomorphic scheme, which was proposed in 2013 by Gentry [18] and supports branching programs with polynomial noise overhead. These schemes are referred to as *fast bootstrapping* schemes. One limitation of these schemes is that they

can only bootstrap one message at a time, but the bootstrapping procedure is relatively fast. One of the key benefits of these schemes is that they can be used to compute an arbitrary function while simultaneously reducing noise. We refer to it as *functional bootstrapping* (FBS). The FHEW scheme [15] introduced a FBS procedure which evaluates a **nand** gate in addition to noise reduction, and suggested an extension for other/larger gates. In a subsequent work [4] the FHEW scheme was adapted to accommodate arbitrary multi-input Boolean gates. The authors of [10,11] further enhanced these designs and introduced the TFHE library [12]. TFHE's bootstrapping implementation can execute any two-input Boolean gate in approximately 10 milliseconds. In [9], the authors propose a bootstrapping method for evaluating several functions on the same inputs at once.

In order to evaluate functions with several inputs, it is necessary to linearly combine them into a single value beforehand. This method is referred to as multiinput FBS throughout the remainder of this document. Linear combinations are fast and are implemented through the use of scalar ciphertext multiplications and ciphertext-ciphertext additions. The binary composition function  $\sum_i x_i \cdot 2^i$  is used for evaluating any Boolean functions with *n* inputs. However, this approach has a significant drawback: the required plaintext space size is exponential in the number of function inputs. To address this limitation, we can use linear combinations with smaller plaintext space sizes for specific Boolean functions. One such example is *symmetric* Boolean functions, where the output depends only on the number of activated inputs and not on their position.

Motivation. Boolean circuits are evaluated in a gate-by-gate manner using fast bootstrapping schemes. Logic synthesis tools can be used to map circuits to a library of Boolean gates that are supported by a particular FHE scheme. As an example, in the case of TFHE, the library contains the complete set of 2-input gates. Another option is to map the Boolean circuit to lookup tables (LUT) and evaluate them homomorphically as generic *n*-input functions. Both of these solutions are straightforward to implement because Boolean circuit mapping is a well-studied problem in the field of logic synthesis, and there are plenty of performant tools available [3,23].

It is not the most efficient approach to limit the FHE evaluation to generic n-input gates. As previously discussed, symmetric Boolean gates require smaller FHE parameters, resulting in faster processing times. A fast bootstrapping FHE scheme parameterized for generic n-input gates can be used to evaluate any symmetric Boolean function with up to  $2^n - 1$  inputs. Additionally, it should be noted that FBS with power-of-two plaintext space is not always necessary.

As an example, the full-adder is a logic circuit which computes the sum of three input bits and outputs it on two bits. The minimal number of 2-input gates (or FBS with plaintext  $\mathbb{Z}_4$ ) required for this circuit is 5. However, when mapping this circuit to 3-input LUTs (or FBS with plaintext  $\mathbb{Z}_8$ ) only 2 are needed, or as low as 1 LUT if multi-output technique from [9] is used. Moreover, this circuit can be implemented with 2 FBS with a plaintext space of  $\mathbb{Z}_4$  (equivalent to generic 2-input gates) because full-adder outputs are symmetric functions.

Our contribution. We propose a new heuristic method which automatically maps Boolean circuits to functional bootstrapping. The algorithm accepts as input a circuit comprising 2-input gates and a maximal FBS plaintext space. The nodes of the circuit are merged together into larger nodes as long as they can be evaluated using the given FBS procedure parameters. Nodes are visited only once in the order in which they appear. The algorithm employs a greedy strategy which tries to merge nodes for as long as possible, with the aim of minimising the total number of bootstrappings. The size of the FBS plaintext is not limited to power-of-two values. Furthermore, we consider the property of TFHE FBS implementation, which allows the plaintext size for *negacyclic* functions to be increased by up to  $2\times$  without any overhead.

We have implemented a proof of concept for this heuristic, which is publicly available. The heuristic has been tested on a number of circuit benchmark suites, including the EPFL combinational benchmarks, ISCAS 85 and ISCAS 89. It has also been applied to Boolean circuits implemented by hand, namely the Trivium/Kreyvium cipher. The estimated evaluation time of the mapped circuits has been compared to the performance of non-mapped versions, specifically the original circuit evaluated with generic 2-input gates. Our heuristic consistently identifies circuits that are more efficient than their non-mapped counterparts. In comparison to manual Kreyvium FBS implementations from [2], our heuristic identified a Kreyvium implementation that utilises 20% fewer FBSs and has a 45% lower evaluation cost.

Existing work. The AutoHoG method from [20] presents an automated approach for mapping Boolean circuits to FBS. A procedure for optimising multi-input FBS linear combination coefficients is proposed. The objective is to maximise the number of inputs in each FBS, which should subsequently reduce execution time. The authors utilise TFHE FBS with  $Z_{32}$  in the benchmarks and do not consider other plaintext spaces. Another distinction from our work is that Auto-HoG is more resource-intensive as it attempts to optimise the linear combination coefficients for multi-input sub-circuits. In comparison, this work restricts the search to linear combinations with two coefficients.

Helm [19] is a framework for circuit synthesis, mapping and execution targeting TFHE gate or functional bootstrapping. The authors of [22] introduce a circuit mapping to LUTs and consider the special case of full-adders being a symmetric Boolean function. Furthermore, a post-synthesis step is employed which groups several LUTs into one multi-output LUT, leveraging the results from [9]. These works follow the standard approach to logic synthesis tools and do not take into account the specific characteristics of FBS.

Another line of research introduces gate libraries with either multi-input gates [21] or uses alternative plaintext space sizes as [13]. In their work, the authors of [21] show how to evaluate 3-input gates using an extended plaintext space FBS. The Chocobo paper [13] generalises binary logic gates to base-B gates which are computed as an FBS. A variety of approaches to computing two-input B-gates are presented. The chaining method corresponds to the multi-input FBS

with plaintext space  $\mathbb{Z}_{B^2}$ . However the authors do not consider specific *B*-gates which require a much smaller plaintext space.

The authors of [2,24] present hand-optimised algorithms employing FBS. They demonstrate how to implement Trivium, Kreyvium and AES, showcasing various optimisation techniques (negacyclic functions, larger than 2 plaintext spaces, etc.). However, they do not consider non-power-of-two plaintext spaces.

*Paper organization.* We begin with a comprehensive overview of functional bootstrapping in Section 2, followed by the proposed mapping heuristic in Section 3 and with experimental results in Section 4.

# 2 Multi-input functional bootstrapping

**Notations** A vector of size n is denoted by  $\boldsymbol{v}, \boldsymbol{v} = \{v_0, v_1, \dots, v_{n-1}\}$ , and the *i*-th vector element is  $v_i$ .

### 2.1 Functional bootstrapping

TFHE [10,11] is a fully homomorphic encryption scheme with a fast bootstrapping procedure. The paper describes, the use of functional bootstrapping to evaluate 2-input logic gate circuits, which is denoted *gate bootstrapping*. The bootstrapping procedure is implemented via a homomorphic accumulator which evaluates the linear part of the decryption function, followed by the non-linear part. For this line of schemes, the structure of the bootstrapping can be divided in 4 steps:

- 1. The coefficients of an input LWE ciphertext  $\boldsymbol{c} = (\boldsymbol{a}, b)$  are mapped to  $\mathbb{Z}_{2N}$ . A cyclic multiplicative group  $\mathcal{G}$ , where  $\mathbb{Z}_{2N} \simeq \mathcal{G}$ , is used for an equivalent representation of  $\mathbb{Z}_{2N}$  elements.  $\mathcal{G}$  contains all the powers  $X^k \mod X^N + 1$ , where  $X^N + 1$  is the quotient polynomial defining the input RLWE scheme.
- 2. The message phase encrypted in the input ciphertext c is transformed to a RLWE encryption of  $X^{\varphi}$ . The encryption  $X^{\varphi}$  is obtained by computing the linear transformation  $b - \mathbf{a} \cdot \mathbf{s} (\approx \varphi)$  using GSW encryptions of  $X^{s_i}$  (i.e. bootstrapping key). We obtain the so-called accumulator ACC which contains an encryption of  $X^{\varphi} \in \mathcal{G}$ . This is the *linear step* of the LWE decryption algorithm.
- 3. The accumulator ACC is multiplied with a test polynomial (or test vector)  $\mathsf{TV}_F$ . The test polynomial encodes the output values of a function G for each possible input message phase  $\varphi \in \mathbb{Z}_{2N}$ , where G is a function from  $\mathbb{Z}_{2N}$  to  $\mathbb{Z}_p$ . Function G is a composition of the "payload" function  $F : \mathbb{Z}_p \to \mathbb{Z}_p$  and a rounding function  $R_p : \mathbb{Z}_{2N} \to \mathbb{Z}_p$ . The rounding function is needed because phase  $\varphi$  is a noised version of the actual message  $m = R_p(\varphi)$  encrypted in  $\boldsymbol{c} = (\boldsymbol{a}, \boldsymbol{b})$  The rounding function corresponds to the final *non-linear step* of ciphertext  $\boldsymbol{c}$  decryption.
- 4. Finally, an LWE encryption of F(m) (or  $G(\varphi)$ ) is extracted from RLWE encryption  $\mathsf{TV}_F \cdot X^{\varphi}$ .

The following sections consider FBS as a method for evaluating generic functions  $F : \mathbb{Z}_p \to \mathbb{Z}_p$ . The input to this method is an LWE encryption of  $m \in \mathbb{Z}_p$ , and the output is also an LWE encryption of the function F applied on m. In the context of cyclotomic rings with modulus  $X^N + 1$ , the functional bootstrapping can be extended to negacyclic functions  $F : \mathbb{Z}_{2p} \to \mathbb{Z}_p$ , which verify the equality F(x) = -F(x+p) for all  $x \in \mathbb{Z}_p$ .

#### 2.2 Multi-input functional bootstrapping

The FBS procedure is designed to handle a single encrypted message as input. To extend its capabilities to multiple input ciphertexts, the ciphertexts are linearly combined into a single ciphertext, after which the aforementioned bootstrapping procedure is performed. This bootstrapping procedure is referred to as *multi-input functional bootstrapping* due to its ability to evaluate generic multi-input functions.

The first step of multi-input FBS is a linear combination of inputs (LWE ciphertexts) with integer coefficients. The second step is a FBS procedure with a specially crafted test polynomial. In the context of multi-input Boolean function evaluation, the linear combination maps input Boolean values to an integer value, which is subsequently mapped back to a Boolean value by the bootstrapping procedure.

Let  $\mathbb{Z}_p$  be the LWE message space supported by FBS. Hereafter we ignore the fact that in the case of TFHE, messages are values on the torus and instead consider them scaled to  $\mathbb{Z}_p$ . Let f be an *n*-input Boolean function to be evaluated over Boolean values encrypted as LWE ciphertexts. Boolean values are encoded in LWE ciphertexts as either 0s or 1s. Let us define the function  $\phi$  as a linear combination function. We can describe FBS as the composition of a linear function  $\phi$  followed by a non-linear mapping  $F : \mathbb{Z}_p \to \mathbb{Z}_2$ , such that:

$$f = F \circ \phi.$$

The non-linear function F is embedded in the test vector, which maps each value of the image of the function  $\phi$  to a Boolean value. The following sections will provide a more detailed overview of these two steps.

**Linear combination** The function  $\phi_{\boldsymbol{c}}(\boldsymbol{x})$  represents a linear combination, expressed as  $\sum_{i=1}^{n} c_i \cdot x_i$ , where the coefficients  $\boldsymbol{c}$  are integers. A linear combination  $\phi_{\boldsymbol{c}}$  can be used in a multi-input FBS to evaluate a logic function, f, if it is capable to distinguish the output values of f. In more formal terms, for any  $\boldsymbol{x}$  and  $\boldsymbol{x}'$  such that  $f(\boldsymbol{x}) \neq f(\boldsymbol{x}')$  the linear combination must satisfy  $\phi_{\boldsymbol{c}}(\boldsymbol{x}) \neq \phi_{\boldsymbol{c}}(\boldsymbol{x}')$ .

To illustrate, the binary composition function, represented by  $\sum_i x_i \cdot 2^i$ , is a bijective linear combination that can be used to evaluate any Boolean function f. The mapping function is given by F(z) = f(x) where  $z = \sum_i x_i \cdot 2^i$ . However, this approach has the disadvantage that the required FBS precision is exponential in the number of inputs, n. In this case, the FBS plaintext space is  $\mathbb{Z}_{2^n}$ .

Not all Boolean functions require the linear combination to be bijective. For example, the *n*-input majority function  $\operatorname{MAJ}_n(\boldsymbol{x})$  (which outputs true if the majority of inputs are active) can be computed with the linear combination  $\sum_i x_i$  and the mapping function  $F(x) = x \ge n/2$ . This linear combination is surjective and can only be used to evaluate a subclass of Boolean functions, in particular symmetric functions. In comparison to the generic binary composition function, the multi-input FBS which employs the function  $\sum_i x_i$ , requires a significantly smaller precision, which is linear in the number of inputs n.

Let us denote by imsize (c) the *image size* of the linear combination  $\phi_c$ . This is defined as follows:

imsize 
$$(\boldsymbol{c}) = \max_{\boldsymbol{x}} \phi_{\boldsymbol{c}}(\boldsymbol{x}) - \min_{\boldsymbol{x}} \phi_{\boldsymbol{c}}(\boldsymbol{x}) + 1$$

For the sake of simplicity, we assume that  $\phi_{\mathbf{c}}(\mathbf{x}) \geq 0$ . It is possible to transform any linear combination into an equivalent one that verifies the aforementioned relation by subtracting  $\min_{\mathbf{x}} \phi_{\mathbf{c}}(\mathbf{x})$ .

Let us suppose that imsize  $(c) \leq p$ . In this case function  $f = F \circ \phi_c$  can be evaluated by a multi-input FBS with message space  $\mathbb{Z}_p$ . The noise of the input LWE ciphertexts is inversely proportional to the Euclidean norm  $||c||_2$  and it should be chosen in such a way that the error amplitude of the linear combination over the LWE ciphertexts is smaller than 1/2. This implies that the output noise of the linear combination should remain within one message space segment with overwhelming probability.

In conclusion, the precision of FBS depends on two factors: the plaintext space size, p, and the maximal Euclidean norm, l, of supported linear combinations. A FBS parameterised with p and l can be used to evaluate any other Boolean function (regardless of input count) whose linear combination,  $\phi_c$ , verifies imsize  $(c) \leq p$  and  $||c||_2 \leq l$ .

Blind rotation The output of the linear combination evaluation over the encrypted values  $\boldsymbol{x}$  is a LWE encryption of  $\phi_{\boldsymbol{c}}(\boldsymbol{x})$ . Each value of linear combination,  $\phi_{\boldsymbol{c}}$ , image is mapped to the corresponding value of the Boolean function f by the FBS procedure using a specific test vector TV. TV maps the integer value  $\phi_{\boldsymbol{c}}(\boldsymbol{x})$  to the Boolean value  $f(\boldsymbol{x})$  for every  $\boldsymbol{x}$ . The TV is defined as follows:

$$\mathsf{TV} = \sum_{0 \le k < K} F(k) \sum_{\left\lfloor \frac{k \cdot N}{p} \right\rceil \le i < \left\lfloor \frac{(k+1) \cdot N}{p} \right\rceil} X^i \mod X^N + 1.$$

In addition to the function F, this test vector encodes the rounding to  $Z_p$  function of the LWE decryption.

Figure 1 illustrates the message space partition for p = 5 and the function F encoded in the TV. For illustration purposes, a small RLWE ring size (N = 32) was selected. It should be noted that the encoding of the function and message space do not match exactly, as the test vector is discretised to  $2 \cdot N$  values and message space elements are not. It is possible to extend the message space to  $\mathbb{Z}_{2p}$  without incurring any additional cost for *negacyclic* functions F. Refer to the lower half of Figure 1 function encoding.



Fig. 1. Example of FBS function encoding (colored segments) and message space (dashed lines separators).

Negacyclic function evaluation A FBS parameterised for message space  $\mathbb{Z}_p$  can be employed for negacyclic functions over  $\mathbb{Z}_{2p}$ . A function F is negacyclic if F(x) = -F(x+p) for any  $x \in \mathbb{Z}_p$ .

Let us consider a Boolean function,  $f = F \circ \phi_c$ , where the image size of  $\phi_c$  is larger than FBS parameter p, i.e. imsize (c) > p. In this context, three distinct types of negacyclic Boolean functions exist:

- 1.  $F(x) = \neg F(x+p)$  first and last function values are negated,
- 2. F(x) = F(x+p) = 1 first and last function values are ones,
- 3. F(x) = F(x+p) = 0 first and last function values are zeros.

These functions are evaluated as a FBS of function  $F'(x) = F(x) - \mu$  for  $x \in \mathbb{Z}_p$  followed by an addition of a constant  $\mu$ . Here  $\mu$  equals to 1/2, 1 and 0 for each negacyclic function type respectively. It is straightforward to see that function F' is negacyclic. Furthermore, it can be shown that after the constant  $\mu$  has been added, the original function F is restored.

## 3 Mapping Boolean circuits to functional bootstrapping

A Boolean circuit is a directed acyclic graph, denoted by G = (V, E), where V represents the set of nodes and E is the set of directed edges. A vertex  $v \in V$  can be either a circuit input, a circuit output, or a logic gate. An edge,  $(w, v) \in E$ , is a directed connection from a source node w to a destination node v. The function pred(v) returns the predecessors of a given node v, and is defined as  $pred(v) = \{u \mid (u, v) \in E\}$ . A gate node is associated with a Boolean

7

function,  $f_v(U)$ , where U = pred(v) represents the set of predecessors of v. A cone, designated as  $C_v$ , is defined as a sub-set of the node v ancestors, including the node itself, such that for any  $w \in C_v$  every path from w to v must lie entirely within  $C_v$ . The support of a cone, denoted by  $sup(C_v)$ , is a set of nodes that feed into the nodes in the cone but do not belong to it. Formally, this is expressed as  $sup(C_v) = \{u \mid (u, w) \in E, w \in C_v\}$ . The Boolean function  $f_{C_v}$  with inputs  $sup(C_v)$  is defined as the logic function of cone  $C_v$ .

The objective of this work is to partition a Boolean circuit into a set of sub-circuits such that each sub-circuit can be executed by a single functional bootstrapping. We call this problem *Boolean circuit mapping to functional boot*strapping. Let p be the number of plaintext space divisions and l the linear combination Euclidean norm for which FBS has been parameterised. A solution to this problem is a set B of circuit nodes to bootstrap where for each node  $v \in B$  we have a cone  $C_v$  and a vector of integer coefficients  $\mathbf{c}_v$  (a coefficient per node in  $sup(C_v)$ ) such that:

- -B contains all circuit outputs,
- any circuit node belongs to at least one cone  $\bigcup_{v \in B} C_v = V$ ,
- linear mapping  $\phi_{\boldsymbol{c}_{v}}$  is valid for cone logic function  $f_{C_{v}}$  for any  $\boldsymbol{x}, \boldsymbol{x}' \in \mathbb{Z}_{2}^{|\boldsymbol{c}_{v}|}$ such that  $f_{C_{v}}(\boldsymbol{x}) \neq f_{C_{v}}(\boldsymbol{x}')$  we have  $\phi_{\boldsymbol{c}_{v}}(\boldsymbol{x}) \neq \phi_{\boldsymbol{c}_{v}}(\boldsymbol{x}')$ ,
- FBS parameters are valid, i.e. imsize  $(\boldsymbol{c}_v) \leq p$  and  $\|\boldsymbol{c}_v\|_2 \leq l$ .

Given a Boolean circuit the optimization problem is to find a mapping which minimizes circuit evaluation time. The evaluation time of a circuit depends on the input FBS parameters and on the number of bootstrappings in the mapped circuit.

#### 3.1 Heuristic mapping

The following section introduces a heuristic that maps Boolean circuits to FBS, where the parameters of the latter are fixed. The heuristic is outlined in detail in Algorithm 1. The algorithm takes as inputs a Boolean circuit G, a function FIND\_PARAMS which returns a valid linear combination for a node v and functional bootstrapping parameters p and l. The functional bootstrapping parameters support at least any 2-input Boolean gate. We introduce several implementations for function FIND\_PARAMS, with further details provided subsequently. The algorithm output is a solution to the Boolean circuit mapping to FBS problem, as previously defined. The heuristic traverses circuit nodes in topological order, incrementally attempting to merge existing cones or bootstrap nodes in cases where merges do not satisfy the functional bootstrapping parameters.

Each input node of the circuit belongs to an empty cone (line 3). The logic function of an empty cone is the identity function,  $f_{\{\}}(x) = x$ . The image size of the corresponding coefficient vector is imsize ([1]) = 2. The circuit output nodes are added to the set B of nodes to bootstrap. For each gate node v, function FIND\_PARAMS returns a valid linear combination for the merged cone  $\{v\} \cup C_u \cup C_w$  where u, w are the predecessors of v (line 8). In case a linear combination, which verifies FBS parameters p and l, is found (function IS\_VALID\_SIZE) the linear combination coefficients  $c_v$  and cone  $C_v$  for node v are added to solution. Otherwise, the algorithm bootstraps predecessor node with largest linear combination image size (line 10) and resets its cone  $C_u$  to single node (line 11). The process is repeated, i.e. the second predecessor w is bootstrapped, in the event that no valid linear combination is identified (line 12). After the third FIND\_PARAMS call (line 16), both the predecessors of node v are bootstrapped, and the new linear combination is certainly valid.

Algorithm 1 Generic mapping algorithm

**Input:** Boolean circuit G = (V, E) with 2-input gates Input: FIND\_PARAMS $(f_{C_u}, c_u, f_{C_w}, c_w, f_v)$  - a function that returns a valid linear combination for cone  $v \cup C_u \cup C_w$ . **Input:** p, l - FBS message space size and maximal norm of linear combination **Output:** *B* - A set of gates to bootstrap **Output:**  $C_v$  - A cone for  $v \in B$ **Output:**  $c_v$  - A vector of coefficients for  $v \in B$ 1: for all node  $v \in V$  in topological order do 2: if v is input then 3:  $C_v, \boldsymbol{c}_v \leftarrow \{\}, [1]$ else if v is output then 4: 5: $B \leftarrow B \cup \{v\}$ 6: else 7:  $u, w \leftarrow \operatorname{pred}(v)$  such that imsize  $(c_u) \geq \operatorname{imsize}(c_w)$ 8:  $c_v \leftarrow \text{FIND} \quad \text{PARAMS}(f_{C_u}, c_u, f_{C_w}, c_w, f_v)$ 9: if not IS VALID SIZE $(c_v)$  then  $B \leftarrow B \cup \{u\}$ 10:11:  $C_u, \boldsymbol{c}_u \leftarrow \{\}, [1]$ 12: $\boldsymbol{c}_v \leftarrow \text{FIND} \quad \text{PARAMS}(f_{C_u}, \boldsymbol{c}_u, f_{C_w}, \boldsymbol{c}_w, f_v)$ if not is valid Size $(c_v)$  then 13: $B \leftarrow B \cup \{w\}$ 14:15: $C_w, c_w \leftarrow \{\}, [1]$  $\boldsymbol{c}_v \leftarrow \text{FIND} \quad \text{PARAMS}(f_{C_u}, \boldsymbol{c}_u, f_{C_w}, \boldsymbol{c}_w, f_v)$ 16:17:end if 18:end if  $C_v \leftarrow \{v\} \cup C_u \cup C_w$ 19: end if 20:21: end for 22:function is VALID SIZE(c) return imsize  $(c) \leq p$  and  $||c_v||_2 \leq l$ 23:24: end function

We introduce two algorithms for the cone composition function. The objective of these functions is to identify a linear combination that represents the logic function of the merged cone, represented by  $\{v\} \cup C_u \cup C_w$ . Let  $f(\boldsymbol{x} || \boldsymbol{y}) =$  $f_v(f_{C_u}(\boldsymbol{x}), f_{C_w}(\boldsymbol{y}))$  denote the Boolean function of the merged cone. The composition algorithms return a vector of coefficients,  $\boldsymbol{c}$ , such that  $\phi_c$  is a valid

linear combination for the function f. It should be noted that these functions are agnostic about the FBS parameters and can return a vector c whose image size or Euclidean norm is larger than p or l, respectively.

The first function is illustrated in Algorithm 2 and it uses a naive approach. A scaled version of coefficient vector  $c_u$  is concatenated with vector  $c_w$  (see line 3). The coefficient vector  $c_u$  is scaled by the image size imsize  $(c_w)$  of the second vector. The output linear combination function is:

imsize 
$$(\boldsymbol{c}_w) \cdot \phi_{\boldsymbol{c}_w}(\boldsymbol{x}) + \phi_{\boldsymbol{c}_w}(\boldsymbol{y})$$

for  $\boldsymbol{x} \in \mathbb{Z}_2^{|\boldsymbol{c}_u|}$  and  $\boldsymbol{y} \in \mathbb{Z}_2^{|\boldsymbol{c}_w|}$ .

| Al | gorithm 2 Naive cone composition                                                                                        |
|----|-------------------------------------------------------------------------------------------------------------------------|
| 1: | function FIND_PARAMS_NAIVE $(f_{C_u}, \boldsymbol{c}_u, f_{C_w}, \boldsymbol{c}_w, \cdot)$                              |
| 2: | $a, b \leftarrow \text{imsize}(\boldsymbol{c}_w), 1$                                                                    |
| 3: | $\boldsymbol{c}_v \leftarrow \begin{bmatrix} a \cdot \boldsymbol{c}_u \parallel b \cdot \boldsymbol{c}_w \end{bmatrix}$ |
| 4: | $\mathbf{return}  \boldsymbol{c}_v$                                                                                     |
| 5: | end function                                                                                                            |

The second cone composition function, outlined in Algorithm 3, also concatenates scaled versions of the vectors  $c_u$  and  $c_w$  (line 5). In contrast to previous composition function, this function exhaustively searches for scaling coefficients a and b and returns the coefficient vector with the smallest image size. The output linear combination function is:

$$a \cdot \phi_{\boldsymbol{c}_{w}}(\boldsymbol{x}) + b \cdot \phi_{\boldsymbol{c}_{w}}(\boldsymbol{y})$$

for  $\boldsymbol{x} \in \mathbb{Z}_2^{|\boldsymbol{c}_u|}$  and  $\boldsymbol{y} \in \mathbb{Z}_2^{|\boldsymbol{c}_w|}$ . The possible ranges for these coefficients are  $|a| \leq$ imsize  $(\boldsymbol{c}_w)$  and  $|b| \leq$ imsize  $(\boldsymbol{c}_u)$ . Since the linear combination functions  $\phi_{\boldsymbol{c}}$  and  $\phi_{-\boldsymbol{c}}$  are equivalent, only positive values for the coefficient a are considered. This effectively reduces the search space by 2.

Cone composition example. Let us consider a node, v, with a logic function  $f_v(x,y) = x$  and y. Additionally, we assume that the gate predecessor cones are empty  $(C_x = C_y = \{\})$ . The functions of the predecessors are identities, we have  $f_{C_x}(x) = x$  and  $f_{C_y}(y) = y$ . Furthermore, the coefficients vectors are equal,  $c_x = c_y = [1]$ . The naive composition function returns the coefficients  $c^{naive} = [2, 1]$ , where 2 is the image size of  $c_x$ , and the search composition function returns  $c^{search} = [1, 1]$ . The image size of  $c^{naive}$  is 4, whereas the image size of  $c_y$  are given following table:

Algorithm 3 Search cone composition

1: function FIND\_PARAMS\_SEARCH $(f_{C_u}, c_u, f_{C_w}, c_w, f_v)$  $c_v^{\min} \leftarrow \emptyset$ 2: 3: for all  $a = 1, \ldots$ , imsize  $(c_w)$  do for all  $b = -imsize(c_u), \ldots, -1, 1, \ldots, imsize(c_u)$  do 4: 5: $\boldsymbol{c}_v \leftarrow \begin{bmatrix} \boldsymbol{a} \cdot \boldsymbol{c}_u \parallel \boldsymbol{b} \cdot \boldsymbol{c}_w \end{bmatrix}$ Let  $f(\mathbf{x} || \mathbf{y}) \stackrel{"}{=} f_v(f_{C_u}(\mathbf{x}), f_{C_w}(\mathbf{y}))$   $\triangleright f$  is the logic function of cone  $C_u \cup C_w \cup \{v\}$ 6: if is VALID $(f, c_v)$  then 7:if imsize  $(c_v) \leq \text{imsize} (c_v^{\min})$  and  $\|c_v\|_2 < \|c_v^{\min}\|_2$  then 8:  $oldsymbol{c}_v^{\min} \leftarrow oldsymbol{c}_v$ 9: 10: end if end if 11: end for 12:13:end for return  $c_v^{\min}$ 14: 15: end function 16: function is valid(f, c)  $V_{0} \leftarrow \left\{ \phi_{\boldsymbol{c}} \left( \boldsymbol{x} \right) \mid \boldsymbol{x} \in \mathbb{Z}_{2}^{|\boldsymbol{c}|}, f\left( \boldsymbol{x} \right) = 0 \right\}$  $V_{1} \leftarrow \left\{ \phi_{\boldsymbol{c}} \left( \boldsymbol{x} \right) \mid \boldsymbol{x} \in \mathbb{Z}_{2}^{|\boldsymbol{c}|}, f\left( \boldsymbol{x} \right) = 1 \right\}$  $\operatorname{return} V_{0} \cap V_{1} \equiv \emptyset$ 17:18: 19:20: end function

| $\boldsymbol{x}$ | $f_v(\boldsymbol{x})$ | $\phi_{\boldsymbol{c}^{naive}}(\boldsymbol{x})$ | $\phi_{\pmb{c}^{search}}(\pmb{x})$ |
|------------------|-----------------------|-------------------------------------------------|------------------------------------|
| [0, 0]           | 0                     | 0                                               | 0                                  |
| [0, 1]           | 0                     | 1                                               | 1                                  |
| [1, 0]           | 0                     | 2                                               | 1                                  |
| [1,1]            | 1                     | 3                                               | 2                                  |

It can be observed that the functions  $\phi_{\mathbf{c}^{naive}}$  and  $\phi_{\mathbf{c}^{search}}$  are valid linear combinations for the node v logic function as the corresponding linear combination values for  $f_v(x) \equiv 0$  and  $f_v(x) \equiv 1$  are different.

Implementation details. The algorithm listings have been simplified by omitting several implementation details. Gates with a single-input f(v), same-input gates f(v, v), and constant f(v) = cst gates are ignored because the same linear combination  $c_v$  of gate input is valid for gate output also. Furthermore, linear combinations with image sizes larger than  $2^{16}$  are pruned by default.

# 4 Implementation and performance

A proof of concept of the proposed circuit mapping heuristic has been implemented in python<sup>1</sup>. The algorithm is parameterised by the two cone composition

<sup>&</sup>lt;sup>1</sup> https://github.com/ssmiler/tfhe\_fbs\_map

methods that were previously presented. The mapping heuristics are referred to as either as the *naive heuristic* or as the *search heuristic*. A minor discrepancy between Algorithm 1 and the implemented version is that the latter does not consider the Euclidean norm l when evaluating the validity of a linear combination (function IS\_VALID\_SIZE). This was done because the Euclidean norm exerts a lesser influence on FBS parameters when compared to the number of plaintext divisions p.

In order to assess the efficacy of the proposed heuristic, we have employed Boolean circuits from a range of benchmark suites, as well as circuits generated manually. Both heuristics have been executed on each circuit benchmark for varying values of the FBS parameter p. For each execution, the elapsed time has been recorded, along with the characteristics of the mapped circuit, including the number of bootstrappings, linear combinations, the maximal Euclidean norm of the linear combinations, and so forth.

The experimental results demonstrate that the number of output bootstrappings does not exhibit a monotonically decreasing trend with an increase in the value of p. To illustrate, the blue line in Figure 2 depicts the output bootstrap count as a function of the FBS size, p, for the search heuristic applied to the AES 128 circuit. A similar phenomenon is observed for the naive heuristic. We think this phenomenon occurs due to the greedy nature of the heuristics, which aim to maximise the image size of linear combinations and consequently FBS are added too late by the heuristic. In the conducted tests, a maximal value P was assigned for the FBS size. Subsequently, the heuristic was executed for each value of  $2 \le p \le P$ . The mapped circuit with the lowest metric (either evaluation cost or bootstrapping count) value is kept as output.

The evaluation cost of a circuit is estimated as the number of circuit bootstrappings multiplied by the cost of a single bootstrapping. The concrete<sup>2</sup> compiler is used to estimate the execution cost of a single multi-input FBS with given parameters. In order to facilitate the utilisation of non power-of-two values for the FBS size and Euclidean norm, the compiler code has been patched. As an example, the red line in Figure 2 illustrates the evaluation cost of the AES circuit as a function of FBS size. It is important to note that while the number of bootstrappings may continue to decrease with FBS size, the evaluation cost of the circuit will consistently increase for p > 6. Furthermore, starting from p = 8, the evaluation cost will exceed that of the non-optimised circuit (i.e. the mapped circuit for p = 2). This is due to the fact that the reduction in bootstrapping count is no longer sufficient to compensate for the increase of a FBS cost.

### 4.1 EPFL combinational benchmark suite

The EPFL combinational benchmark suite [1] comprises 10 arithmetic, 10 random/control circuits and 3 multi-million gate designs. In our experiments, we have used the first two types of benchmarks, a total of 20 Boolean circuits. The heuristics we have introduced are used to map each benchmark circuit to FBS.

<sup>&</sup>lt;sup>2</sup> https://github.com/zama-ai/concrete



Fig. 2. Search heuristic applied to AES circuit. Mapped circuit number of bootstrappings and estimated evaluation cost as a function of FBS size.

The mapping heuristic is executed with FBS sizes varying from 2 to 15. The mapped circuit with the smallest cost and smallest the FBS size in case of a tie is kept as the output. Furthermore, we estimate the cost of executing a *reference mapping* where each Boolean circuit 2-input gate is executed as a TFHE gate bootstrapping (i.e. FBS with size 2).

Refer to Table 1 for a comparison of the evaluation speedups of the circuits mapped with the proposed heuristics compared to the reference mapping (column "cost"). The two column groups (denoted as "naive" and "search") represent the two cone composition functions. Furthermore, two additional columns are given for the mapping with the lowest execution cost:

- "#boots." difference in number of bootstrappings.
- "FBS size" FBS size and linear combination image size (in brackets) for which this mapping was obtained. We remind that the linear combination image size can be larger than the FBS size in the case of negacyclic functions.

The last table row represents the average of the respective columns.

The mapping heuristic with search cone composition consistently gives better execution cost and a lower bootstrapping number when compared to the naive cone composition. On average, the search heuristic results in 27% reduction in execution cost and a 45% reduction in the number of bootstrappings. The functional bootstrapping size, which yields the lowest execution cost, is always smaller or equal to 8. This aligns with the earlier observation made for the AES circuit.

| honoh      | naive                 |         |          | search |          |          |
|------------|-----------------------|---------|----------|--------|----------|----------|
| bench      | $\operatorname{cost}$ | #boots. | FBS size | cost   | # boots. | FBS size |
| adder      |                       |         | 2(4)     | -15%   | -25%     | 3(5)     |
| bar        |                       |         | 2(4)     | -24%   | -50%     | 7(10)    |
| div        |                       |         | 2(4)     | -23%   | -50%     | 7(13)    |
| hyp        |                       |         | 2(4)     | -21%   | -49%     | 7(13)    |
| log2       |                       |         | 2(4)     | -25%   | -34%     | 3(5)     |
| max        | -1%                   | -38%    | 8 (8)    | -29%   | -38%     | 3(5)     |
| multiplier |                       |         | 2(4)     | -21%   | -30%     | 3(5)     |
| sin        |                       |         | 2(4)     | -24%   | -51%     | 7 (13)   |
| sqrt       |                       |         | 2(4)     | -20%   | -49%     | 7 (13)   |
| square     |                       |         | 2(4)     | -19%   | -29%     | 3(5)     |
| arbiter    | -17%                  | -45%    | 7 (8)    | -48%   | -64%     | 5 (8)    |
| cavlc      |                       |         | 2(4)     | -35%   | -60%     | 8 (13)   |
| ctrl       | -6%                   | -38%    | 7(8)     | -31%   | -57%     | 8 (13)   |
| dec        |                       |         | 2(4)     |        |          | 2(3)     |
| i2c        |                       |         | 2(4)     | -32%   | -56%     | 7(13)    |
| int2float  | -7%                   | -38%    | 7(8)     | -46%   | -67%     | 8 (13)   |
| mem ctrl   | -3%                   | -36%    | 7(8)     | -30%   | -38%     | 3(5)     |
| priority   |                       |         | 2(4)     | -32%   | -58%     | 8 (13)   |
| router     |                       |         | 2(4)     | -40%   | -61%     | 7(12)    |
| voter      |                       |         | 2(4)     | -21%   | -31%     | 3(5)     |
| avg.       | -2%                   | -10%    |          | -27%   | -45%     |          |

avg.  $\left| -2\% -10\% \right| -27\% -45\%$ **Table 1.** EPFL benchmark. Evaluation cost improvements for FBS mapped circuits (column "cost"), decrease in bootstrappings count (column "#boots.") and corresponding FBS sizes. Cells where no gain has been obtained are not included in the presentation.



Fig. 3. This figure represents the average time taken to execute mapping heuristics, divided by the number of circuit nodes.

Figure 3 illustrates the average execution time of the heuristics, divided by the input circuit gate count, for each FBS size. As anticipated, the search heuristic is slower than the naive one and scales non-linearly with FBS size due to the exhaustive search in Algorithm 3.

### 4.2 Trivium and Kreyvium stream ciphers

The paper [2] presents hand-optimised implementations for Trivium/Kreyvium stream ciphers [14,8] using TFHE FBS. The authors propose a number of potential approaches to implementing a single iteration of stream ciphers, beginning with gate bootstrapping and concluding with functional bootstrapping. The most effective solution they have found utilises a 2-bit message space (or 3-bit in case of negacyclic functions).

The proposed heuristics process circuit gates in the order in which they appear in the input circuit file. This is just one of the many possible topological orders and it has an impact on the quality of the mapped circuit. Two versions of each stream cipher have been implemented, resulting in a total of four circuits. The code used to generate these circuits is given in Listing 1. The difference between the two versions is in how the last 3 instructions are expressed. The second version groups together the xor gates when variables out\_t1, out\_t2 and out\_t3 are computed.

The search heuristic has been applied to the four circuits with FBS sizes varying from 2 to 10. Figure 4 and Figure 5 plot the bootstrapping count and the

```
// version 1
t1 = s66 ^ s93
t2 = s162 ^ s177
t3 = s243 ^ s288 ^ k127
out = t1 ^ t2 ^ t3
out_t1 = t1 ^ (s91 & s92) ^ s171 ^ iv127
out_t2 = t2 ^ (s175 & s176) ^ s264
out_t3 = t3 ^ (s286 & s287) ^ s69
// version 2
t1 = s66 ^ s93
t2 = s162 ^ s177
t3 = s243 ^ s288 ^ k127
out = t1 ^ t2 ^ t3
out_t1 = (t1 ^ s171 ^ iv127) ^ (s91 & s92)
out_t2 = (t2 ^ s264) ^ (s175 & s176)
out_t3 = (t3 ^ s69) ^ (s286 & s287)
```

Listing 1: The two versions of an iteration of Trivium, underlined are Kreyvium differences. The circuits have 15 inputs, respectively 17 for Kreyvium, and 4 outputs (out, out\_t1, out\_t2 and out\_t3).

17

estimated evaluation cost for the mapped circuits. The second version demonstrates a faster convergence rate and a lower number of required bootstrappings, with the exception of p = 7. The minimal number of bootstrappings for these circuits is four, which is the number of outputs. This is reached at p = 7 for Trivium and at p = 9 for Kreyvium.



Fig. 4. Trivium stream cipher. Bootstrapping count and evaluation cost for the 2 circuit versions. The result from [2] is shown with red dots.

Our heuristic maps the Trivium circuit to the same number of bootstrappings (8) and the Kreyvium circuit to 20% less bootstrappings (8 instead of 10) using the same message space  $\mathbb{Z}_4$  as in [2]. Furthermore, a solution with the same number of bootstrappings is found for p = 3. The evaluation cost is reduced in this case due to the smaller TFHE FBS parameters.

The mapped Trivium circuit with the lowest evaluation cost is obtained with p = 7 (4 bootstrappings) and with p = 6 (5 bootstrappings) for the Kreyvium circuit. The evaluation cost is 44 - 45% less than that of the solutions presented in [2]. One significant advantage of the heuristic proposed in this paper over [2] is that circuits are automatically mapped.

Listing Listing 2 illustrates version 2 of ciphers Listing 1 which have been mapped to FBS with size p = 3. The mapped circuit has 8 bootstrappings and is not dependent on the implemented stream cipher. The first and the last 4 bootstrappings are independent of each other. In the scenario of parallel execution, the mapped circuit has a latency equivalent of 2 bootstrappings. When compared to [2], the Kreyvium latency is reduced by 50% (from 3 to 2) by our heuristic.



Fig. 5. Kreyvium stream cipher. Bootstrapping count and evaluation cost for the 2 circuit versions. The result from [2] is shown with red dots.

```
m1 = 2 - s66 + s93 - s162 + s177
m2 = Bootstrap(m1, [0, 1, 0, 1, 0])
m3 = 1 - s66 + s93 + s171 + iv127
m4 = Bootstrap(m3, [1, 0, 1, 0, 1])
m5 = 1 - s162 + s177 + s264
m6 = Bootstrap(m5, [1, 0, 1, 0])
m7 = 1 - s243 + s288 + k127 + s69
m8 = Bootstrap(m7, [1, 0, 1, 0, 1])
m9 = 1 + m2 - s243 + s288 + k127
out = Bootstrap(m9, [1, 0, 1, 0, 1])
m10 = 3 * m4 + s91 + s92
out_t1 = Bootstrap(m10, [0, 0, 1, 1, 1, 0])
m11 = 3 * m6 + s175 + s176
out_t2 = Bootstrap(m11, [0, 0, 1, 1, 1, 0])
m12 = 3 * m8 + s286 + s287
out_t3 = Bootstrap(m12, [0, 0, 1, 1, 1, 0])
```

Listing 2: Version 2 of Listing 1 which was mapped to FBS with size p = 3.

A fast heuristic for mapping Boolean circuits to functional bootstrapping

### 4.3 Comparison to AutoHoG

In paper [20] the authors proposed a circuit mapping procedure, designated AutoHoG, for TFHE functional bootstrapping. AutoHoG takes a Boolean circuit as input and generates a circuit with "compound" gates, which is similar to our FBS mapping. In addition to single-output gates, the AutoHoG authors employ the multi-output evaluation method from [9] to factor out bootstrappings with the same inputs.

In their experiments, the authors use the ISCAS'85 circuit benchmark [7] and the ISCAS'89 sequential circuit benchmark [6] in their experiments. We use the same techniques to transform ISCAS'89 sequential circuits with flip-flops into combinational circuits. The sequential circuits are unrolled for 10 clock cycles using the ABC logic synthesis tool [3] (command frames -F 10 -i). Both ISCAS'85 and ISCAS'89 (after unrolling) are mapped to 2-input gates using a complete gate library that has been generated manually.

In AutoHoG, a single parameterisation of TFHE is employed, enabling the evaluation of a multi-output FBS with p = 32. The authors compare the execution times of mapped circuits with those of input circuits using the same TFHE parameters. However, the presented speedup results may be overly optimistic, given that the input circuit can be executed with significantly smaller TFHE parameters.

To ensure a fair comparison with AutoHoG, we execute our mapping heuristic with FBS size varying from 2 to 32 and keep the circuit with smallest number of bootstrappings as output. The speedup is approximated as the ratio between the number of input circuit gates and the number of FBSs in the mapped circuit. This is equivalent to the method used by AutoHoG to compute speedup. In this approximation, the overhead due to linear combinations in multi-input FBS evaluation is ignored. The linear combination has a much smaller impact on execution time than the FBS part.

Table 2 and Table 3 depict the speedups of the search heuristic (columns "naive" and "search") and the speedup from AutoHoG paper. As previously, we provide the FBS size (column "FBS size") for which the circuit with the fewest number of bootstrappings is obtained. Observe that the FBS size is not always maximum value 32. This indicates that fixing the FBS size in advance is not always advantageous. The speedup of AutoHoG results has been computed by dividing the available numeric values in [20] (Fig. 7 and TABLE IV). AutoHoG demonstrates enhanced speedups across the majority of benchmarks, as anticipated given that our approach does not employ the multi-output FBS technique.

### References

- Amarú, L., Gaillardon, P.E., De Micheli, G.: The epfl combinational benchmark suite. In: Proceedings of the 24th International Workshop on Logic & Synthesis (IWLS) (2015)
- Balenbois, T., Orfila, J.B., Smart, N.: Trivial transciphering with trivium and tfhe. In: Proceedings of the 11th Workshop on Encrypted Computing & Applied Homomorphic Cryptography. pp. 69–78 (2023)

| bench | sea<br>speedup | AutoHoG<br>speedup |               |
|-------|----------------|--------------------|---------------|
| c17   | 3.00	imes      | 19(29)             | $2.50 \times$ |
| c432  | 3.05	imes      | 31(49)             | $2.16 \times$ |
| c499  | $3.71 \times$  | 22(27)             | -             |
| c880  | $2.85 \times$  | 16(24)             | _             |
| c1355 | $3.71 \times$  | 22(27)             | 6.03 	imes    |
| c1908 | $2.40 \times$  | 30(36)             | _             |
| c2670 | $3.26 \times$  | 29(45)             | _             |
| c3540 | $2.54 \times$  | 17(32)             | 3.90 	imes    |
| c5315 | $3.05 \times$  | 25(39)             | -             |
| c6288 | $2.40 \times$  | 27(31)             | -             |
| c7552 | $2.42 \times$  | 21(42)             | 5.68	imes     |
| avg.  | $2.94 \times$  |                    | 4.05	imes     |

**Table 2.** ISCAS'85 speedup and comparison with AutoHoG. The best benchmark-wise speedup is presented in bold, with the exception of benchmarks for which an AutoHoG speedup is not available.

- 3. Berkelev Logic Synthesis and Verification Group: ABC: А Svs-Synthesis Verification, tem for Sequential and  $\operatorname{commit}$ 2d70deb. http://www.eecs.berkeley.edu/~alanmi/abc/, http://www.eecs.berkeley.edu
- Biasse, J.F., Ruiz, L.: Fhew with efficient multibit bootstrapping. In: Progress in Cryptology–LATINCRYPT 2015: 4th International Conference on Cryptology and Information Security in Latin America, Guadalajara, Mexico, August 23-26, 2015, Proceedings 4. pp. 119–135. Springer (2015)
- Brakerski, Z., Gentry, C., Vaikuntanathan, V.: (leveled) fully homomorphic encryption without bootstrapping. ACM Transactions on Computation Theory (TOCT) 6(3), 1–36 (2014)
- Brglez, F., Bryan, D., Kozminski, K.: Combinational profiles of sequential benchmark circuits. In: 1989 IEEE International Symposium on Circuits and Systems (ISCAS). pp. 1929–1934. IEEE (1989)
- 7. Brglez, F., Fujiwara, H.: A neutral netlist of 10 combinational benchmark circuits and a target translator. In: Fortran. ISCAS'85 (1985)
- Canteaut, A., Carpov, S., Fontaine, C., Lepoint, T., Naya-Plasencia, M., Paillier, P., Sirdey, R.: Stream ciphers: A practical solution for efficient homomorphicciphertext compression. Journal of Cryptology **31**(3), 885–916 (2018)
- Carpov, S., Izabachène, M., Mollimard, V.: New techniques for multi-value input homomorphic evaluation and applications. In: Topics in Cryptology–CT-RSA 2019: The Cryptographers' Track at the RSA Conference 2019, San Francisco, CA, USA, March 4–8, 2019, Proceedings. pp. 106–126. Springer (2019)
- Chillotti, I., Gama, N., Georgieva, M., Izabachene, M.: Faster fully homomorphic encryption: Bootstrapping in less than 0.1 seconds. In: Advances in Cryptology– ASIACRYPT 2016: 22nd International Conference on the Theory and Application of Cryptology and Information Security, Hanoi, Vietnam, December 4-8, 2016, Proceedings, Part I 22. pp. 3–33. Springer (2016)
- Chillotti, I., Gama, N., Georgieva, M., Izabachène, M.: Tfhe: fast fully homomorphic encryption over the torus. Journal of Cryptology 33(1), 34–91 (2020)

| honeh  | sea              | AutoHoG  |                  |
|--------|------------------|----------|------------------|
| Dench  | speedup          | FBS size | speedup          |
| s27    | 3.39×            | 31 (39)  | $1.27 \times$    |
| s208   | $2.57 \times$    | 25(46)   | —                |
| s298   | $2.84 \times$    | 30(49)   | <b>3.43</b> imes |
| s344   | $2.75 \times$    | 31(50)   | 3.05	imes        |
| s349   | $2.73 \times$    | 31(53)   | <b>2.79</b> imes |
| s382   | $3.14 \times$    | 20(35)   | 4.46 	imes       |
| s386   | $2.88 \times$    | 23(41)   | <b>5.85</b> imes |
| s400   | $3.39 \times$    | 31 (55)  | 4.73	imes        |
| s420   | $2.67 \times$    | 26(46)   | <b>2.94</b> imes |
| s444   | $2.75 \times$    | 31(46)   | 4.73	imes        |
| s510   | $2.75 \times$    | 28(45)   | <b>3.43</b> imes |
| s526   | $2.78 \times$    | 32(55)   | 4.19	imes        |
| s641   | <b>2.61</b> imes | 20(29)   | $2.14 \times$    |
| s713   | <b>2.94</b> imes | 20(27)   | $2.45 \times$    |
| s820   | $4.24 \times$    | 27(53)   | 4.75	imes        |
| s832   | $4.22\times$     | 27(53)   | 4.79	imes        |
| s838   | $2.65 \times$    | 28(49)   | 3.01	imes        |
| s953   | $2.65 \times$    | 30(55)   | <b>3.51</b> imes |
| s1196  | $3.35 \times$    | 31(49)   | 4.15	imes        |
| s1238  | $3.39 \times$    | 32(52)   | 3.66 	imes       |
| s1423  | $2.31 \times$    | 18(32)   | 3.01	imes        |
| s1488  | $3.80 \times$    | 32(58)   | 7.45	imes        |
| s1494  | $3.39 \times$    | 31 (55)  | —                |
| s5378  | $2.60 \times$    | 30(56)   | <b>7.35</b> imes |
| s9234  | $2.38 \times$    | 24(44)   | 3.60	imes        |
| s13207 | <b>2.56</b> imes | 32(61)   | $2.33 \times$    |
| s15850 | <b>2.40</b> imes | 26(49)   | $2.22 \times$    |
| s35932 | $2.69 \times$    | 25(44)   | 3.19	imes        |
| s38417 | $2.75 \times$    | 29(55)   | —                |
| s38584 | 2.62 	imes       | 32(58)   | $2.51 \times$    |
| avg.   | $2.94 \times$    |          | <b>3.74</b> imes |

Table 3. ISCAS'89 speedup and comparison with AutoHoG. The best benchmark-wise speedup is presented in bold, with the exception of benchmarks for which an AutoHoG speedup is not available.

- 22 Sergiu Carpov
- Chillotti, I., Gama, N., Georgieva, M., Izabachène, M.: TFHE: Fast fully homomorphic encryption library (August 2016), https://tfhe.github.io/tfhe/
- Clet, P.E., Boudguiga, A., Sirdey, R.: Chocobo: Creating homomorphic circuit operating with functional bootstrapping in basis b. Cryptology ePrint Archive (2024)
- De Canniere, C.: Trivium: A stream cipher construction inspired by block cipher design principles. In: International Conference on Information Security. pp. 171– 186. Springer (2006)
- Ducas, L., Micciancio, D.: Fhew: bootstrapping homomorphic encryption in less than a second. In: Annual international conference on the theory and applications of cryptographic techniques. pp. 617–640. Springer (2015)
- 16. Fan, J., Vercauteren, F.: Somewhat practical fully homomorphic encryption. Cryptology ePrint Archive (2012)
- Gentry, C.: Fully homomorphic encryption using ideal lattices. In: Proceedings of the forty-first annual ACM symposium on Theory of computing. pp. 169–178 (2009)
- Gentry, C., Sahai, A., Waters, B.: Homomorphic encryption from learning with errors: Conceptually-simpler, asymptotically-faster, attribute-based. In: Advances in Cryptology–CRYPTO 2013: 33rd Annual Cryptology Conference, Santa Barbara, CA, USA, August 18-22, 2013. Proceedings, Part I. pp. 75–92. Springer (2013)
- 19. Gouert, C., Mouris, D., Tsoutsos, N.G.: Helm: Navigating homomorphic encryption through gates and lookup tables. Cryptology ePrint Archive (2023)
- Guan, Z., Mao, R., Zhang, Q., Zhang, Z., Zhao, Z., Bian, S.: Autohog: Automating homomorphic gate design for large-scale logic circuit evaluation. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2024)
- Matsuoka, K., Hoshizuki, Y., Sato, T., Bian, S.: Towards better standard cell library: Optimizing compound logic gates for tfhe. In: Proceedings of the 9th on Workshop on Encrypted Computing & Applied Homomorphic Cryptography. pp. 63–68 (2021)
- 22. Mono, J., Kluczniak, K., Güneysu, T.: Improved circuit synthesis with amortized bootstrapping for fhew-like schemes. Cryptology ePrint Archive (2023)
- Soeken, M., Riener, H., Haaswijk, W., Testa, E., Schmitt, B., Meuli, G., Mozafari, F., Lee, S.Y., Tempia Calvino, A., Marakkalage, Dewmini Sudara De Micheli, G.: The EPFL logic synthesis libraries (Jun 2022), arXiv:1805.05121v3
- Trama, D., Clet, P.E., Boudguiga, A., Sirdey, R.: A homomorphic aes evaluation in less than 30 seconds by means of the. In: Proceedings of the 11th Workshop on Encrypted Computing & Applied Homomorphic Cryptography. pp. 79–90 (2023)