# Control-Flow-Aligned Code Hashing ## Write-up The instruction sequence ``PUSHi X JUMP[i]``, where ``X`` is larger than the PC of the push instruction is called a *static forward jump*. We subdivide the code into largest blocks that start at a JUMPDEST (or PC 0) and end at a terminating instruction or an unconditional jump. We call these blocks *basic blocks*. We call the PC of the first instruction of a basic block the *starting PC* of the basic block. The basic block with starting PC 0 is called the *initial block*. Create a directed (acyclic but not necesarily connected) graph ``G`` where there is one node per basic block and there is an arc from basic block ``A`` to basic block ``B`` if and only if ``A`` contains a static forward jump to a PC inside ``B``. > Will it be one graph? if not, do we need to store the "initial block" for every graph in the account? [name=s] To each node in the graph we associate the hash of the following data: - the starting PC of the basic block - the hashes of the nodes this node has an arc to - the code of the node We subdivide the code of each account into chunks of a certain fixed size, add the hashes of all nodes of the aforementioned graph whose starting PC is inside the chunk and create a binary Merkle tree ``M`` of this structure. The root of this Merkle tree and the hash of the initial block is stored with each account. > Curious why fixed sized chunks in `M`? [name=s] A transaction executing code needs to supply the code of the basic blocks that were executed, their starting PC and the hashes of all nodes of ``G`` that are not executed but have an arc from a node that is executed. In order to avoid fake proofs, we need to anchor each of the basic blocks either to the basic block with starting PC 0 or to ``M``. The provided basic blocks are a sub-graph ``G'`` of ``G``. We add an additional node to ``G'`` representing ``M``. Now we try to find a (minimal) set of new edges to add from ``M`` such that each node is reachable from ``M`` or from the initial block. For each such edge, we provide a Merkle proof in ``M`` for the basic block the edge goes to. > Can you clarify the part where you add a node to `G'` to represent `M`? [name=s] Note that we need to add at most one Merkle proof for each jump in the execution that is not a static forward jump. It is not possible to provide invalid code because of the hash structure of ``G`` and ``M``. ## Message [...] we came up with an alternative that might be much cheaper but is admittedly a bit more complicated. The idea is that "merkle" proofs follow execution order instead of code offset. More details: The account also contains the root of a second hash tree that is created following all static jumps across basic blocks starting from PC 0, ignoring back edges (each tree node corresponds to such a basic block and its hash is created from its code, its starting PC and the hashes of the target blocks of static jumps). This new structure is actually a forest (it needs to be computed not just starting from PC 0, but there are multiple disconnected components) and needs to be linked from a full binary/merkle tree of the code. Proofs of execution now work as follows: the code of all executed code is included as well as the hashes (in this new structure) of all static jumps that are not taken. For each dynamic jump, we use a proof in the traditional binary/merkle tree of all code/all basic blocks. Since this tree links back into the new structure, we can go back to the more efficient proofs directly after the jump. The proof size in addition to the actually executed code should be: one hash per static jump not taken plus a full binary merkle proof per dynamic jump. Note that if we disallow dynamic jumps completely, the execution will need its own binary search for each function. Actually the generation of the hash tree in my proposal above can be simplified: instead of a complicated graph search, just hash all static jumps to a pc larger than the current and treat all other static jumps as dynamic jumps