# Lookups on zkVMs

Lookup arguments are a standard way to prove that values used in a computation belong to a reference table. This post grew out of integrating concrete zkVMs into soundcalc, a tool created for the [Ethproofs](https://ethproofs.org/) initiative to [estimate the soundness of zkVMs](https://blog.ethereum.org/2025/12/18/zkevm-security-foundations) designed to prove the execution of Ethereum blocks. It gives a high-level picture of what lookups prove, where they appear in zkVM designs, how the common algebraic formulations work, how to measure their soundness, and which variants appear in the systems we examined.

## What is a lookup?

A lookup argument proves that every value in one table also appears in another table. A useful way to think about it is: there is a table of valid values, and the prover wants to convince the verifier that the values used by the computation all appear in that table. The verifier does not want to read the whole table or check every value one by one, so the lookup argument compresses that membership check into a succinct proof.

For example:

* If the table contains all values from $0$ to $2^{16}-1$, then a lookup proves that a value is 16-bit.
* If the table contains valid input-output tuples for an operation, then a lookup proves that the operation was evaluated correctly.
* If the table contains valid memory events, then a lookup can help prove that a read used the value written at the corresponding address and timestamp.

An important point is that lookup arguments usually ignore order and allow repeated queries. What matters is that the prover cannot introduce a value that was not present in the reference table.

## Where lookups are used in zkVMs

Lookups are useful in zkVMs because many VM checks are not naturally expressed as low-degree algebraic constraints, which are the language used by proof systems.
Instead of expressing every operation directly as constraints, the prover can reduce them to table relations: a value belongs to a range, an instruction tuple is valid, or two multisets of events are consistent.

### Range checks

Range checks are the simplest example. The table $\mathbf T$ contains all values in some range, such as $[0, 2^{16})$, and the prover shows that every value in another table $\mathbf L$ belongs to that range. This is useful because VM words are usually decomposed into smaller limbs. A proof system over a large field may need to show that each limb is actually a byte, a 16-bit value, or a 32-bit value. Lookups make this cheap and uniform.

### Input-output relations

Many instructions can be represented by a table of valid input-output tuples. Suppose $\mathbf T$ has $4$ columns, and each row contains inputs together with the correct output of some function $F$:

$$\mathbf{T}_j=(a,b,c,F(a,b,c)).$$

Then, to prove that $F(a,b,c)=d$, the prover can show that the tuple $(a,b,c,d)$ appears in $\mathbf T$. This lets the zkVM treat the operation as a precomputed relation. Bitwise operations, comparisons, shifts, and other CPU-like instructions can be checked by looking up the corresponding row instead of rebuilding the full instruction logic inside the main trace.

### Buses and chips

Many zkVMs are organized into **chips**. A chip is a trace dedicated to some specific computation, such as an ALU operation, a hash function, or a memory operation. These chips need to communicate with each other and with the main execution trace. A common way to model this communication is through **buses**. One part of the system sends values, and another part receives them. The bus is balanced when the multiset of sends equals the multiset of receives. The same algebraic tools used for lookups can be adapted to prove that no message was created, lost, or duplicated.
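As a toy illustration (the message tuples below are made up), the balance condition is plain multiset equality between sends and receives: order is irrelevant, but multiplicity matters.

```python
from collections import Counter

# Hypothetical bus messages, e.g. (address, timestamp, value) tuples.
sends    = [(0x10, 1, 7), (0x20, 2, 9), (0x10, 3, 7)]
receives = [(0x10, 3, 7), (0x10, 1, 7), (0x20, 2, 9)]

def bus_balanced(sends, receives):
    # Balanced: no message created, lost, or duplicated.
    # Order does not matter; multiplicity does.
    return Counter(sends) == Counter(receives)

print(bus_balanced(sends, receives))                    # True
print(bus_balanced(sends, receives + [(0x30, 4, 1)]))   # False: extra receive
```

In a real system this equality is never checked by sorting or hashing message lists; it is encoded algebraically, as the partial-fraction formulation below shows.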
### Memory consistency

Some zkVMs execute the block, commit to the entire memory trace in sorted order, and prove that this sorted trace is a permutation of the execution trace. The permutation argument guarantees that the sorted view and the execution view contain the same memory events.

Another approach handles memory dynamically using a bus-style argument. Reads and writes contribute with opposite signs, and every memory operation is represented as a read followed by a write. For an actual read, the written value is the same as the value that was read, so the memory state is preserved. For an actual write, the operation removes the old value and inserts the new one. The initial memory state is included with positive terms, the final state with negative terms, and each read/write operation is represented as a message. If the bus is balanced, then the claimed final state is consistent with the executed memory operations.

## Lookups in the wild

In concrete zkVMs, lookup arguments show up under different names depending on what relation is being checked. In [soundcalc](https://github.com/ethereum/soundcalc), examples include:

* **Input-output tables:** Poseidon2 and Keccak tables, where the lookup checks that the claimed output matches the inputs.
* **Range checks:** 16-bit and 19-bit range-check tables, byte tables, and other fixed-range tables.
* **Instruction tables:** ALU, decoder, program, and syscall tables, where the lookup checks valid VM behavior.
* **Memory arguments:** memory lookups that check consistency between reads, writes, and the claimed memory state.
* **Bus arguments:** bus or generic interaction lookups, where the goal is to prove that sent and received messages balance.
* **Permutation and connection arguments:** permutation or connection checks, such as the `Permutation_gsum` and `Connection_gprod` entries used in ZisK-style reports.
* **Aggregated generic lookups:** Some systems such as SP1 and OpenVM may expose a single large `lookup` component that aggregates many lookup relations internally.

## Algebraic formulations

A lookup argument proves the following relation: given two tables $\mathbf{T}\in\mathbb{F}^T$ and $\mathbf{L}\in\mathbb{F}^L$, for all $i\in[L]$ there exists $j\in[T]$ such that $\mathbf{L}_i=\mathbf{T}_j$. That is, all the elements in $\mathbf{L}$ are also elements in $\mathbf{T}$. They can be repeated and do not need to respect the ordering, but there is no element in $\mathbf{L}$ that does not appear in $\mathbf{T}$. There are different algebraic expressions that capture this relation.

### Selector matrix

$$\exists\, \mathbf{M}\in\mathbb{F}^{L\times T} \text{ s.t. } \mathbf{M}\mathbf{T}=\mathbf{L} \ \land\ \forall i\in[L]\ \exists!\, j\in[T] \text{ s.t. } \mathbf{M}_{ij}\neq 0 \land \mathbf{M}_{ij}=1 \qquad (1)$$

Here, $\mathbf M$ is a selector matrix. Each row has exactly one non-zero entry, and that entry is $1$. This makes the lookup relation very explicit: every element of $\mathbf L$ is obtained by selecting an element of $\mathbf T$.

### Product formulation

We can also use the fact that two polynomials are equal if and only if they have the same roots with the same multiplicities (and the same leading coefficient):

$$\exists\, \vec m\in(\mathbb{N}\cup\{0\})^{T} \text{ s.t. } \prod_{i\in [L]}(X-\mathbf{L}_i)=\prod_{j\in [T]}(X-\mathbf{T}_j)^{m_j} \qquad (2)$$

In this case, the vector $\vec m$ records how many times each element of $\mathbf{T}$ is used to form $\mathbf{L}$.

### Partial fractions

Finally, we can use partial fractions: two partial-fraction decompositions are the same if and only if they have the same poles with the same multiplicities. Since a rational function of the form $\frac{1}{X-a}$ has a pole at $a$, sums of these terms encode multisets through their poles:

$$\exists\, \vec m\in(\mathbb{N}\cup\{0\})^{T} \text{ s.t. }
\sum_{i\in [L]}\frac{1}{X-\mathbf{L}_i}=\sum_{j\in [T]}\frac{m_j}{X-\mathbf{T}_j} \qquad (3)$$

This is the same multiset idea, but expressed additively. That additive structure is one reason the relation is attractive in interactive proofs: sums are often easier to handle than products.

#### Permutation arguments

Permutation arguments are a special case of lookup arguments where $\mathbf L$ contains all the elements of $\mathbf T$, exactly once. In the selector formulation, if $T=L$, then $\mathbf M$ is a square matrix with one non-zero entry per row and column, and that entry is $1$. Thus, $\mathbf M$ is a permutation matrix. In the product and partial-fraction formulations, setting $\vec m=\vec 1$ proves that $\mathbf T$ and $\mathbf L$ are permutations of each other.

#### Bus balancing

Buses can also be expressed with the partial-fraction formulation. Define a table $\mathbf{\hat T}\in\mathbb F^{T+L}$ containing both sends and receives, and let the signed multiplicity $m_j$ be positive for one side of the bus and negative for the other. The bus is balanced if:

$$\sum_{j\in [L+T]}\frac{m_j}{X-\mathbf{\hat{T}}_j}=0 \qquad (4)$$

This says that the positive and negative occurrences cancel out as multisets.

### From relations to arguments

A lookup argument usually consists of the following phases:

* **Commitment phase:** commitments to the tables $\mathbf{T}$ and $\mathbf{L}$. These commitments can be produced by an external party or by the parties involved in the protocol. In some settings, the commitment to $\mathbf{T}$ must be trusted by the verifier.
* **Proving phase:**
  - **Satisfiability:** the satisfiability of one of the algebraic relations stated above is proven in a succinct manner.
  - **Consistency:** proves that the algebraic expressions are consistent with the commitments to $\mathbf{T}$ and $\mathbf{L}$. That is, that the algebraic expressions were indeed built using elements from the committed tables.
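To make the two multiset formulations concrete, here is a small Python sketch (the table, the looked-up values, and the prime are toy choices for illustration) that checks equations $(2)$ and $(3)$ at a random point of a prime field:

```python
import random
from collections import Counter

P = 2**31 - 1  # a Mersenne prime, standing in for the field F

T = [0, 1, 2, 3, 4, 5, 6, 7]   # reference table (a tiny "range")
L = [3, 3, 0, 7, 5]            # looked-up values; repeats are allowed

# Multiplicities m_j from (2) and (3): how often T_j occurs in L.
counts = Counter(L)
m = [counts[t] for t in T]

x = random.randrange(P)  # random evaluation point standing in for X

# Product formulation (2): prod (x - L_i) == prod (x - T_j)^{m_j}
lhs_prod = 1
for v in L:
    lhs_prod = lhs_prod * (x - v) % P
rhs_prod = 1
for t, mj in zip(T, m):
    rhs_prod = rhs_prod * pow(x - t, mj, P) % P
assert lhs_prod == rhs_prod

# Partial-fraction formulation (3): sum 1/(x - L_i) == sum m_j/(x - T_j)
# Field inverses via Fermat's little theorem: a^(P-2) mod P.
lhs_sum = sum(pow(x - v, P - 2, P) for v in L) % P
rhs_sum = sum(mj * pow(x - t, P - 2, P) for t, mj in zip(T, m)) % P
assert lhs_sum == rhs_sum

print("equations (2) and (3) hold at a random point")
```

A real argument never evaluates these sums in the clear, of course; the point of the sketch is only that both identities encode the same multiplicity vector $\vec m$.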
## THE lookup: LogUp

In soundcalc, we currently measure the soundness of [LogUp](https://eprint.iacr.org/2022/1530.pdf), which proves the partial-fraction identity from equation $(3)$. The point of LogUp is to reduce a lookup relation to a sum identity over committed tables. The prover commits to polynomial representations of the tables, and then proves that the partial-fraction sums match at a random verifier challenge.

### Commitment phase

$\mathbf T$ and $\mathbf L$ are represented through commitments to polynomials that interpolate them. These polynomials can be univariate or multivariate. Given a basis $\mathcal B=\{B_j(\vec X)\}_{j=1}^T$ for a polynomial space of dimension $T$, we can build:

$$T(\vec X)=\sum_{j=1}^T\mathbf{T}_jB_j(\vec X).$$

This polynomial uniquely represents $\mathbf T$ in that basis. A polynomial commitment scheme then gives the verifier a succinct commitment to the table. We usually take $\mathcal B$ to be the set of Lagrange interpolation polynomials over a set $H$ of size $T$. The prover also commits to the multiplicity vector $\vec m$ from equation $(3)$. In polynomial form, the lookup relation becomes:

$$\sum_{\vec x\in H}\frac{1}{X-\mathbf{L}(\vec x)}=\sum_{\vec x\in H}\frac{m(\vec x)}{X-\mathbf{T}(\vec x)}$$

### Proving phase

**Satisfiability.** The verifier sends a random element $\gamma$. The prover sends:

$$\sum_{\vec x\in H}\frac{1}{\gamma-\mathbf{L}(\vec x)}$$

and

$$\sum_{\vec x\in H}\frac{m(\vec x)}{\gamma-\mathbf{T}(\vec x)}.$$

The verifier checks:

$$\sum_{\vec x\in H}\frac{1}{\gamma-\mathbf{L}(\vec x)}=\sum_{\vec x\in H}\frac{m(\vec x)}{\gamma-\mathbf{T}(\vec x)} \qquad (5)$$

This is the probabilistic check for the partial-fraction identity. It does not, by itself, prove that the two sums were computed from the correct tables.

**Consistency.** The prover must also show that the sums above were actually computed from the committed tables $\mathbf L$, $\mathbf T$, and $\vec m$. This is done with polynomial constraints.
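Before turning to consistency, the satisfiability check of equation $(5)$ can be sketched numerically. The field, tables, and witness below are toy choices; in a real system both sums range over committed polynomial evaluations rather than plain lists.

```python
import random
from collections import Counter

P = 2**31 - 1  # toy prime field

def inv(a):
    # Field inverse via Fermat's little theorem.
    return pow(a, P - 2, P)

def logup_sums(L, T, m, gamma):
    lhs = sum(inv(gamma - v) for v in L) % P
    rhs = sum(mj * inv(gamma - t) for t, mj in zip(T, m)) % P
    return lhs, rhs

T = list(range(16))       # table: the "4-bit range"
L = [3, 9, 9, 15, 0]      # honest witness: every value is in T
counts = Counter(L)
m = [counts[t] for t in T]

gamma = random.randrange(P)  # verifier's random challenge
lhs, rhs = logup_sums(L, T, m, gamma)
assert lhs == rhs  # equation (5) holds for an honest prover

# A cheating prover slips in 17, which is not in T. The rational
# functions now differ (there is a pole at 17 on the left only), so a
# random gamma catches it except with probability on the order of H/F.
bad_lhs, bad_rhs = logup_sums(L + [17], T, m, gamma)
print(bad_lhs != bad_rhs)
```

The sketch shows only the random-point check; it says nothing about whether the sums came from the committed tables, which is exactly the gap the consistency step closes.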
For example, consistency with $\mathbf T$ can be proven through:

$$\left(\sum_{\vec x\in H}\frac{m(\vec x)}{\gamma-\mathbf{T}(\vec x)}L_{\vec x}(\vec X)\right)\left(\prod_{\vec x\in H}(\gamma - \mathbf T(\vec x))\right)=\sum_{\vec x\in H} m(\vec x)\left(\prod_{\vec y\in H\setminus\{\vec x\}}(\gamma - \mathbf T(\vec y))\right)L_{\vec x}(\vec X)$$

The exact way this consistency proof is implemented depends on the surrounding proof system.

## Soundness

For LogUp, the soundness analysis separates into satisfiability and consistency. There are two main ways a cheating prover can try to break the lookup argument.

First, the prover can try to make a false lookup identity pass the verifier's random check. In LogUp, the verifier does not check equation $(3)$ as a full identity. Instead, it checks equation $(5)$ at a random point $\gamma$. If equation $(3)$ is false, the chance that the verifier samples a $\gamma\in\mathbb F$ where equation $(5)$ still holds is controlled by a Schwartz-Zippel-style bound. In the basic setting, [this error is roughly $\frac{H}{F}$](https://github.com/openvm-org/stark-backend/blob/main/docs/Soundness_of_Interactions_via_LogUp.pdf), where $H$ is the relevant domain size and $F$ is the field size.

Second, the prover can try to prove a true identity over values that are not the committed tables. This is the consistency problem. The prover must show that the algebraic expressions are consistent with the commitments to $\mathbf T$, $\mathbf L$, and $\vec m$. There are several cases:

* If the consistency expression is compatible with the constraint system used by the broader SNARK, the constraints can be added directly to the rest of the system. Then the lookup contributes the satisfiability error $\epsilon_{SZ}$.
* If the expression is not directly compatible, but we want to batch it with the rest of the system, we need a reduction to the chosen constraint system. This adds an error term $\epsilon_{reduction}$.
* If the lookup is proven separately, then we must also account for the soundness of that proving system, denoted $\epsilon_{consistency}$.

So the lookup soundness error is:

$$\epsilon = \begin{cases} \epsilon_{SZ} & \text{if consistency is enforced directly},\\ \epsilon_{reduction}+\epsilon_{SZ} & \text{if a reduction is needed},\\ \epsilon_{consistency}+\epsilon_{SZ} & \text{if consistency is proven separately}. \end{cases}$$

## Practical soundness variants

This section collects the practical variants that matter when measuring soundness in the lookup round of zkVMs.

### Multi-column tables

In real systems, tables usually have several columns. For example, an instruction table may contain operands, flags, and outputs. For $\mathbf L\in\mathbb F^{L\times S}$ and $\mathbf T\in\mathbb F^{T\times S}$, the columns are usually batched using a polynomial. The expression becomes:

$$\sum_{i=1}^L \frac{1}{X- \sum_{s=1}^S \mathbf L_{is}B_s(\vec Y)}=\sum_{j=1}^T \frac{m_j}{X-\sum_{s=1}^S \mathbf T_{js}B_{s}(\vec Y)}$$

Since the denominator cannot keep symbolic column variables forever, the batching polynomial is evaluated at verifier randomness $\vec \beta\in\mathbb F^d$. This adds soundness error. A typical bound is:

$$\epsilon_{SZ}\leq \frac{dH}{F}.$$

If the aggregation is univariate, then $d=S$. In the multivariate case, $d=\log(S)$.

### GKR LogUp

As shown in the [LogUp-GKR paper](https://eprint.iacr.org/2023/1284.pdf), the consistency phase in multivariate LogUp can be proven efficiently using GKR. In that case, the lookup soundness must include the GKR consistency error $\epsilon_{GKR}$.

### Aggregating many lookups

zkVMs often perform many lookups. These can be aggregated in different ways.
If we look up many tables $\mathbf L_1,\ldots,\mathbf L_n$ against one table $\mathbf T$, we can either concatenate them into one table

$$\mathbf L=(\mathbf L_1,\ldots,\mathbf L_n),$$

or combine them with random powers:

$$\mathbf L=\mathbf L_1+\gamma \mathbf L_2+\ldots+\gamma^{n-1}\mathbf L_n.$$

In both cases, the soundness error grows with the number of aggregated terms. If there are $K$ relevant batches, we get an error of the form:

$$\epsilon_{SZ}=\frac{KH}{F}.$$

When aggregating $\mathbf L_1^k,\ldots,\mathbf L_n^k$ against $\mathbf T^k$ for $k=1,\ldots,K$, the total error is the sum over batches:

$$\epsilon_{SZ}=\sum_{k=1}^{K}\epsilon_{SZ}^k.$$

### Grinding

Grinding, as explained in the [ethSTARK documentation](https://eprint.iacr.org/2021/582), can be used to reduce the effective soundness error of the lookup round inside the broader SNARK.
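To close, here is a toy sketch of the multi-column batching from the practical-variants section. Each row's columns are folded with powers of a random $\beta$, and the folded values go through the single-column LogUp check at a random $\gamma$. The two-column "squaring table", the field, and the witness are all illustrative choices.

```python
import random
from collections import Counter

P = 2**31 - 1  # toy prime field

def batch_row(row, beta):
    # Univariate batching: fold the S columns with powers of beta
    # (Horner evaluation, so column s is weighted by beta^s).
    acc = 0
    for c in reversed(row):
        acc = (acc * beta + c) % P
    return acc

# Tiny 2-column instruction-style table: (input, output) pairs for x -> x^2.
T = [(a, a * a % P) for a in range(8)]
L = [(3, 9), (5, 25), (3, 9)]  # honest lookups into T

beta  = random.randrange(P)  # verifier randomness for column batching
gamma = random.randrange(P)  # verifier randomness for the LogUp check

Tb = [batch_row(r, beta) for r in T]  # batched table rows
Lb = [batch_row(r, beta) for r in L]  # batched looked-up rows
counts = Counter(Lb)
m = [counts[t] for t in Tb]

lhs = sum(pow(gamma - v, P - 2, P) for v in Lb) % P
rhs = sum(mj * pow(gamma - t, P - 2, P) for t, mj in zip(Tb, m)) % P
assert lhs == rhs
print("batched multi-column lookup holds at random (beta, gamma)")
```

The extra randomness $\beta$ is exactly where the additional $\frac{dH}{F}$ term in the soundness bound comes from: two distinct rows may collide once their columns are folded, with probability proportional to the batching degree.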