# EVM Object Format (EOF) discussion
In response to https://notes.ethereum.org/@axic/evm-object-format
(Ideas collected by chfast, gumb0, axic)
### Format overhead
There is some concern that having headers and sections adds overhead. Preliminary investigation suggests that JUMPDESTs already add a measurable overhead, e.g. UniswapV2Pair has 647 JUMPDESTs (unoptimized, on 0.8.2) and 306 (optimized). Removing those would more than offset the overhead of a header. For a simple contract the header overhead would be around 11 bytes.
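As a rough back-of-the-envelope check, the comparison can be written out as below. This is only a sketch: it assumes every JUMPDEST (one byte each) could actually be removed, and it reuses the ~11-byte header estimate for a simple contract, which may not match the real header size for UniswapV2Pair. The helper name `net_savings` is made up for illustration.

```python
# Figures quoted in the paragraph above.
HEADER_OVERHEAD_BYTES = 11  # approximate, for a simple contract (assumption for this contract)
JUMPDEST_COUNTS = {"unoptimized": 647, "optimized": 306}


def net_savings(jumpdest_count: int, header_bytes: int = HEADER_OVERHEAD_BYTES) -> int:
    """Bytes saved if all JUMPDESTs (1 byte each) are dropped and a header is added."""
    return jumpdest_count - header_bytes


for build, count in JUMPDEST_COUNTS.items():
    print(build, net_savings(count))
```

Under these assumptions the header cost is small compared with the JUMPDEST bytes in both builds.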
### Observability
EVM bytecode is "observable" by contracts (using `CODESIZE`, `CODECOPY`, `EXTCODESIZE`, `EXTCODECOPY`, `EXTCODEHASH`). This raises the question of what contracts should be able to see after EOF is deployed.
- It is preferred that legacy contracts (including ones newly created with `CREATE`) keep the current behavior. We have considered wrapping newly created legacy contracts with "EOF legacy", but it seems to create more issues than it solves. And this can be done by client implementations anyway (e.g. to store the JUMPDEST map persistently) without standardization.
- `EXTCODEHASH` should return the hash of the whole EOF.
- `EXTCODESIZE` should return the size of the EOF. Access to this instruction may be restricted to legacy contracts only. (Solidity uses this to check for existence of code, because it is cheaper to use than `EXTCODEHASH`.)
- `EXTCODECOPY` may be disallowed in EOF contracts; access from legacy contracts may abort execution.
- `CODESIZE` and `CODECOPY` should be disallowed in EOF contracts.
- `CODESIZE` and `CODECOPY` should be replaced with `DATASIZE`/`DATACOPY` accessing only the "data section".
### Jump/call functionality
JUMPDESTs cannot be easily eliminated if we do not eliminate or restrict dynamic jumps. However, simply replacing current jumps with (relative) static jumps is not enough, as that removes the capability of jumping back to the caller of an internal function.
A `JUMPV` (jump table / switch) opcode could help in two ways:
1. Be used in cases of switch statements, including the ABI dispatch function, which has a considerable overhead today. However, introducing it for that alone is probably not a good enough reason.
2. Be used as a workaround for lack of dynamic jumps, i.e. having a large jumptable listing all caller locations in order to mimic the `RET` / jump to caller functionality.
This second use case seems like a bad reason, and suggests we do need some kind of "subroutines" or calls, or some limited version of indirect calls.
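To make the second use case concrete, here is a toy model of the workaround. It is not EVM code and not a proposed encoding: the instruction names, the tuple format, and the `run` helper are all invented for illustration. Each call site pushes its own index and jumps statically to the function, which "returns" through a `JUMPV` table listing every caller location.

```python
def run(program, max_steps=1000):
    """Toy machine with a static JUMP and a JUMPV that pops an index into a jump table."""
    stack, trace, pc = [], [], 0
    for _ in range(max_steps):
        op, arg = program[pc]
        if op == "PUSH":
            stack.append(arg)
            pc += 1
        elif op == "JUMP":      # static jump: target is fixed in the code
            pc = arg
        elif op == "JUMPV":     # jump table: target chosen by index on the stack
            pc = arg[stack.pop()]
        elif op == "LOG":
            trace.append(arg)
            pc += 1
        elif op == "STOP":
            return trace
    raise RuntimeError("step limit exceeded")


PROGRAM = [
    ("PUSH", 0),            # 0: call site A pushes its index in the return table
    ("JUMP", 7),            # 1: static jump to the function
    ("LOG", "back at A"),   # 2: return location for call site A
    ("PUSH", 1),            # 3: call site B pushes its index
    ("JUMP", 7),            # 4: static jump to the function
    ("LOG", "back at B"),   # 5: return location for call site B
    ("STOP", None),         # 6
    ("LOG", "in f"),        # 7: function body
    ("JUMPV", [2, 5]),      # 8: "return" via a table of all caller locations
]

print(run(PROGRAM))
```

The table at the end of the function needs one entry per call site, so it grows with every new caller, which illustrates why this is an unattractive substitute for a real return mechanism.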
### Data contracts, code size limit, and proxies
Data contracts have been used in order to save on `SLOAD` costs. It is unclear to what extent they are utilised.
If deploying **only** EOF is allowed, that means new data contracts can't be deployed. This would mean users have to use `DATACOPY` as the next best option. This may be better or worse, depending on what our goal is: a) keep smaller code in a single account; b) load fewer accounts.
<small>(One could also consider introducing a variable length `SLOAD` opcode instead.)</small>
This also raises the question of the code size limit. The code size limit serves two purposes: a) have an upper bound for jumpdest analysis; b) keep state growth limited. It is unclear, however, why the current 24576 bytes is a good or bad limit.
It being a "bad" limit may be signaled by the volume of discussions about proxy designs breaking up contracts into multiple parts. It is unclear whether breaking up is beneficial or not. On one side, if they are properly broken up, the more frequently used code path (such as a token transfer) may be in a single destination contract, which is loaded the majority of the time, while less frequently used code paths are loaded less often. The same question about goals applies here as for data contracts.
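For context on point a), jumpdest analysis for legacy code is a single linear pass over the bytecode that skips `PUSH` immediate data, so its cost is bounded by the code size. A minimal sketch follows; the function name is made up, and real clients typically build a bitmap rather than a set.

```python
JUMPDEST = 0x5B
PUSH1, PUSH32 = 0x60, 0x7F


def valid_jumpdests(code: bytes) -> set:
    """Return the offsets of JUMPDEST opcodes that are not inside PUSH data."""
    dests = set()
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == JUMPDEST:
            dests.add(pc)
        elif PUSH1 <= op <= PUSH32:
            # Skip the 1..32 immediate bytes that follow PUSH1..PUSH32.
            pc += op - PUSH1 + 1
        pc += 1
    return dests


# PUSH1 0x5b, JUMPDEST, STOP: the 0x5b at offset 1 is push data, not a jump target.
print(valid_jumpdests(bytes([0x60, 0x5B, 0x5B, 0x00])))
```

The loop visits each byte at most once, which is why a code size cap also caps the work this analysis has to do.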
### Deprecation
EOF brings (more) benefits if we remove deprecated/unwanted features at the same time, i.e. do not allow the current complete EVM instruction set within EOF. Such features could include `CALLCODE`, `SELFDESTRUCT`, and some inspection functionality (`GASLIMIT` for the block gas limit, etc., and those mentioned in the previous sections).
However, it does not give any clear benefit for future deprecation. It is not clear whether existing features can be deprecated more easily with EOF than without it. At least they can be deprecated for subsequent deployments; in that sense it helps.
### Incentivization
One big concern is having "two EVMs" if we have no clear path of deprecation. If the "state expiry" proposal is adopted, that would mean old-style contracts would be penalised with larger proof costs (at the minimum due to address-translation costs).
However, even without that, one could consider penalising old-style account access, given that for those accounts jumpdest analysis must be conducted at load time (in the worst case), while it is not needed for EOF.