# Data availability encoding ## Reverse Bit Order [Reverse bit order](https://en.wikipedia.org/wiki/Bit-reversal_permutation) is the name for the order of a sequence of $n$ integers, that comes from reversing the bit sequence of each individual integer and then ordering by the resulting integer. It is an idempotent operation. For example, applying this to the integers $0,1,2,3,4,5,6,7$, results in the reverse bit order sequence $0,4,2,6,1,5,3,7$. Now say we have a root of unity $\omega^8=0$. Then the sequence $\omega^0,\omega^4,\omega^2,\omega^6,\omega^1,\omega^5,\omega^3,\omega^7$ is a permutation of the roots of unity of order 8 with the following nice properties: * Partitioning it into the first four and the second four elements, we get a subgroup of order 4 ($\omega^0,\omega^4,\omega^2$) and a coset of order 4 ($\omega^6,\omega^1,\omega^5,\omega^3,\omega^7$) * Partitioning it into four sets of size 2, we get cosets of the order 2 subgroup: ($\omega^0,\omega^4$) (which is also the subgroup), ($\omega^2,\omega^6$), ($\omega^1,\omega^5$) and ($\omega^3,\omega^7$) This property of the reverse bit order extends to higher powers: If you partition any list of length $n=2^k$ into powers of two, you always get cosets in each partition. This is useful for Fourier transforms. Let the function $\mathrm{rbo}(i,n)$ denote the $i$th number in the reverse bit order of the list of integers $0, \ldots, n-1$. ## Parameters * Payload data field elements per row $n=2^{12}$ * Payload data field elements per column $k=2^8$ * Field element modulus (BLS modulus) $p=0x73eda753299d7d483339d80809a1d80553bda402fffe5bfeffffffff00000001$ * Total data size $n \cdot k$ field elements ## Construction The payload data is arranged in a rectangle of size $n \cdot k$ field elements of $\mathbb F_p$. The data is then interpreted as evaluations of a 2-dimensional polynomial. Let $\omega$ be $2n$th roots of unity ($\omega^{2n} = 1$) and $\rho$ be $2k$th roots of unity ($\rho^{2k}=1$). Here is how we arrange the evaluations of the polynomial: | | | | | | |---|---|---|---|---| $f(\omega^{\mathrm{rbo}(0,2n)}, \rho^{\mathrm{rbo}(0,2k)})$ | $f(\omega^{\mathrm{rbo}(1,2n)}, \rho^{\mathrm{rbo}(0,2k)})$ | $f(\omega^{\mathrm{rbo}(2,2n)}, \rho^{\mathrm{rbo}(0,2k)})$ | ... | $f(\omega^{\mathrm{rbo}(n-1,2n)}, \rho^{\mathrm{rbo}(0,2k)})$ $f(\omega^{\mathrm{rbo}(0,2n)}, \rho^{\mathrm{rbo}(1,2k)})$ | $f(\omega^{\mathrm{rbo}(1,2n)}, \rho^{\mathrm{rbo}(1,2k)})$ | $f(\omega^{\mathrm{rbo}(2,2n)}, \rho^{\mathrm{rbo}(1,2k)})$ | ... | $f(\omega^{\mathrm{rbo}(n-1,2n)}, \rho^{\mathrm{rbo}(1,2k)})$ $f(\omega^{\mathrm{rbo}(0,2n)}, \rho^{\mathrm{rbo}(2,2k)})$ | $f(\omega^{\mathrm{rbo}(1,2n)}, \rho^{\mathrm{rbo}(2,2k)})$ | $f(\omega^{\mathrm{rbo}(2,2n)}, \rho^{\mathrm{rbo}(2,2k)})$ | ... | $f(\omega^{\mathrm{rbo}(n-1,2n)}, \rho^{\mathrm{rbo}(2,2k)})$ |...|...|...|...|...| $f(\omega^{\mathrm{rbo}(0,2n)}, \rho^{\mathrm{rbo}(k-1,2k)})$ | $f(\omega^{\mathrm{rbo}(1,2n)}, \rho^{\mathrm{rbo}(k-1,2k)})$ | $f(\omega^{\mathrm{rbo}(2,2n)}, \rho^{\mathrm{rbo}(k-1,2k)})$ | ... | $f(\omega^{\mathrm{rbo}(n-1,2n)}, \rho^{\mathrm{rbo}(k-1,2k)})$ This square of evaluations defines exactly one polynomial $f(X,Y)$ with maximum degree $n-1$ in $X$ and $k-1$ in $Y$. We define the Kate commitments $C_J$ for the polynomial in the $j$th row in this construction. I.e. $$ C_j = [f(s, \rho^{\mathrm{rbo}(j,2k)})]_1 $$ ## Samples Single field elements are about 32 bytes in size. For efficiency reasons, data availability sampling does not operate directly on field elements (they would be too small for efficient sampling). Rather, it operates on samples of $\ell$ field elements. The samples are defined as the $\ell$-tuples $$ S_{i,j} = (f(\omega^{\mathrm{rbo}(i\ell,2n)}, \rho^{\mathrm{rbo}(j,2k)}), f(\omega^{\mathrm{rbo}(i\ell+1,2n)}, \rho^{\mathrm{rbo}(j,2k)}), f(\omega^{\mathrm{rbo}(i\ell+2,2n)}, \rho^{\mathrm{rbo}(j,2k)}), \ldots, f(\omega^{\mathrm{rbo}((i+1)\ell-1,2n)}, \rho^{\mathrm{rbo}(j,2k)})) $$ This means each sample consists of 16 consecutive (in reverse bit order) field elements in the same row. Due to the reverse bit order construction, the samples are evaluations of $f(X, \rho^{\mathrm{rbo}(j,2k)})$ on a coset of the subgroup generated by $\psi$ for $\psi=\omega^{2n/\ell}$ We thus have a square of samples for the original data in samples: | | | | | | |---|---|---|---|---| $S_{0,0}$ | $S_{1,0}$ | $S_{2, 0}$ | ... | $S_{n/\ell-1,0}$ $S_{0,1}$ | $S_{1,1}$ | $S_{2, 1}$ | ... | $S_{n/\ell-1,1}$ $S_{0,2}$ | $S_{1,2}$ | $S_{2, 2}$ | ... | $S_{n/\ell-1,2}$ |...|...|...|...|...| $S_{0,k-1}$ | $S_{1,k-1}$ | $S_{2, k-1}$ | ... | $S_{n/\ell-1,k-1}$ ## Extension We evaluate the polynomial on a rectangle of twice the original size and get | | | | | | |---|---|---|---|---| $S_{0,0}$ | $S_{1,0}$ | $S_{2, 0}$ | ... | $S_{2n/\ell-1,0}$ $S_{0,1}$ | $S_{1,1}$ | $S_{2, 1}$ | ... | $S_{2n/\ell-1,1}$ $S_{0,2}$ | $S_{1,2}$ | $S_{2, 2}$ | ... | $S_{2n/\ell-1,2}$ |...|...|...|...|...| $S_{0,k-1}$ | $S_{1,k-1}$ | $S_{2, 2k-1}$ | ... | $S_{2n/\ell-1,2k-1}$ We also get a total of $2k$ KZG commitments $C_0, \ldots, C_{2k-1}$, one for each row. The $2k$ KZG commitments lie on a polynomial of degree $k-1$, which can be checked by the verifier using two FFTs of size $k$ deterministically, or probabilistically in linear time. ## Sample proofs When distributing the samples, each sample $S_{i, j}$ comes with a KZG evaluation proof $\pi_{i, j}$ that proves that all the $\ell$ evaluations in this sample are correct. We can now ignore the second dimension and treat everything as if there was only one row, because the sample proofs only touch a single row and can be computed individually for each row. So there is only one commitment $C=[f(s)]_1$, for the one dimensional polynomial $f(X)$ and samples $S_0, \ldots, S_{2n/\ell-1}$ with sample proofs $\pi_0, \ldots, \pi_{2n/\ell-1}$. The proof for one sample works as follows. Let $Z_{i}(X)=(X - \omega^{\mathrm{rbo}(i\ell,2n)})(X - \omega^{\mathrm{rbo}(i\ell+1,2n)})(X - \omega^{\mathrm{rbo}(i\ell+2,2n)})\cdots(X - \omega^{\mathrm{rbo}(j+1)\ell-1,2n)})$ (i.e. the zero polynomial on the evaluation points for the sample). Further let $I_i(X)$ be the interpolation polynomial that interpolates the values of $f(X)$ on with the given samples $S_{i}$, i.e. the unique polynomial of degreee at most $\ell-1$ which satisfies $$ I_i(\omega^{\mathrm{rbo}(i\ell,2n)}) = f(\omega^{\mathrm{rbo}(i\ell,2n)}) \\ I_i(\omega^{\mathrm{rbo}(i\ell+1,2n)}) = f(\omega^{\mathrm{rbo}(i\ell+1,2n)}) \\ \vdots \\ I_i(\omega^{\mathrm{rbo}((i+1)\ell-1,2n)}) = f(\omega^{\mathrm{rbo}((i+1)\ell-1,2n)}) $$ Let $q(X) = \frac{f(X) - I(X)}{Z_i(X)}$. Then the proof $\pi_i=[q(s)]_1$. It can be verified using the following pairing equation: $$ e(C - [I(s)]_1, [1]_2) = e([q(s)]_1, [Z(s)]_2) $$ # Computing all sample proofs A trick can be used to compute all proofs in one row. For details please look at the writeup: https://github.com/khovratovich/Kate/blob/master/Kate_amortized.pdf (section 3.2) The total work for this, per row, is * $n/\ell=256$ multiscalar multiplications of size $\ell=16$ * One inverse FFT of size $2n/\ell = 512$ * One forward FFT of size $2n/\ell = 512$