# ASE (Address Space Extension) with Translation Map

**Work in progress ideas, based on https://ethereum-magicians.org/t/increasing-address-size-from-20-to-32-bytes/5485.**

[toc]

## Introduction

Read the forum first.

Challenges:
1. Currently existing contracts rely on 20-byte addresses
2. External protocols in many cases rely on 20-byte addresses
3. How to support both 20- and 32-byte addresses in existing contracts?
4. How to support both 20- and 32-byte addresses in new contracts?

### Problems with Solidity

Problem 1: In Solidity, `address` is an alias to `uint160`, which means calculations, storage and calldata access are done with a mask (`2^160-1`).

Problem 2: One could say that is nice -- at least it is consistent. However, it is not. If the optimiser is turned on, the masking is removed on addresses as stack items (as the compiler "knows" they cannot overflow and instructions ignore dirty bits). It is not removed from storage, however.

Problem 3: Irrespective of the optimiser or internal layout, the ABIEncoderV2 does input validation, and if the supplied `address` is longer, that results in a revert.

## Specification

### Notation

- `hex"0100"` represents a hex string resulting in the bytes `01`, `00`.
- `||` represents bytewise concatenation
- `[i:]` represents bytewise slicing, where the first `i` bytes are truncated
- `keccak256(in) -> out` represents the Keccak-256 hashing function
- `rlp([...]) -> out` represents the RLP encoding of the array `[...]`

### Address format

Current account addresses (both contract and EOA) are 20 bytes long, usually derived as the least significant 20 bytes of the Keccak-256 hash of the public key or creator address. We call this the `shortaddress`.

We introduce a new address schema, which is 32 bytes long:

```
byte 0    : Version byte (must be 1)
byte 1-5  : Reserved (must be zero)
byte 6-31 : 26-byte Keccak-256 hash
```

We call this the `longaddress`.
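
The layout above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `hash32` is a stand-in built on `hashlib.sha3_256`, which is *not* Keccak-256 (the padding differs), and the names `make_longaddress` and `is_well_formed` are hypothetical helpers that do not appear in the specification. The sketch takes the last 26 bytes of the digest, which matches the `[6:]` slicing used by the EOA and `CREATE` rules in this document.

```python
import hashlib

VERSION = 1
# Version byte followed by the five reserved zero bytes.
PREFIX = bytes([VERSION]) + bytes(5)


def hash32(data: bytes) -> bytes:
    # ASSUMPTION: stand-in for keccak256 so the sketch runs with the
    # standard library only. SHA3-256 is not Keccak-256.
    return hashlib.sha3_256(data).digest()


def make_longaddress(preimage: bytes) -> bytes:
    # 6 prefix bytes + last 26 bytes of the 32-byte digest = 32 bytes.
    return PREFIX + hash32(preimage)[6:]


def is_well_formed(addr: bytes) -> bool:
    # Checks the version byte and the reserved bytes only.
    return len(addr) == 32 and addr[0] == VERSION and addr[1:6] == bytes(5)


addr = make_longaddress(b"example preimage")
print(len(addr), addr[:6].hex())  # 32 010000000000
```

A zero-padded 20-byte address fails `is_well_formed`, since its first byte is `0` rather than the version byte `1`.
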
Notice that due to the version byte, a `longaddress` is always larger (when both are compared as 256-bit integers) than a `shortaddress`.

*Question: should we allow the current scheme to be named `version 0` (i.e. the version byte is 0, and bytes 1-11 are zero)?*

*Question: if so, should we just call these `version 0 address` and `version 1 address`?*

### External addresses (EOAs)

For Externally Owned Accounts (i.e. non-contract accounts) a `longaddress` is calculated using `hex"010000000000" || keccak256(pubkey)[6:]`.

It is possible to control both a `shortaddress` and a `longaddress` for the same EOA public-private key pair.

### Transactions

**If the `to` field of EIP-1559 is unbounded (i.e. allows 256-bit inputs), then this feature is not strictly needed. Even in that case, this feature allows creation of new-style contracts, though those can be accomplished via on-chain proxies as well. The unbounded nature of the transaction format is likely true, given it is RLP, but EIP-1559 is still in flux.**

We introduce a new transaction type based on [EIP-1559](https://eips.ethereum.org/EIPS/eip-1559):

```
hex"03" || rlp([chainId, nonce, maxPriorityFeePerGas, maxFeePerGas, gasLimit, to, value, data, accessList, signatureYParity, signatureR, signatureS])
```

This transaction type behaves exactly like EIP-1559, but uses a new type (`0x03`) and allows the `to` field to contain a 32-byte address.

In this new transaction type, contract creation transactions are modified to return a `longaddress`.

**TODO: extend the access list definition to support 32-byte addresses. It is RLP as well, so the same question applies to it.**

### Execution

Finally we define how execution semantics are affected.

For clarity, accounts which have a `shortaddress` are referred to as `legacy account`, and those with a `longaddress` as `extended account`.

For the EVM execution frame a new context variable `isLegacyAccount` is introduced.
It is set to `true` if the code currently executed belongs to a `legacy account`, otherwise to `false`.

#### Address Translation

Since legacy accounts can only handle `shortaddresses`, we introduce a translation step from `longaddresses`.

First we define some helpers:

```python
TRANSLATION_MAP_ADDR = hex"00000000000000000000000000000000000000ff"
EMPTY_CHUNK = bytes32([0] * 32)
DOMAIN = hex"efefefef"

# First 12 bytes are not empty, consider it as a LongAddress
def is_longaddress(address: LongAddress) -> bool:
    return address[:12] != hex"000000000000000000000000"

# This helper compresses a LongAddress
def compress(address: LongAddress) -> ShortAddress:
    return keccak256(keccak256(DOMAIN + address))[12:]

# Insert entry into the translation map if the address is a LongAddress
#
# This is only called within a non-legacy context.
def compress_and_touch(state: EthereumState, address: LongAddress) -> ShortAddress:
    if is_longaddress(address):
        short_address = compress(address)
        state[TRANSLATION_MAP_ADDR][short_address] = address
        return short_address
    else:
        return address

# Look up a target address (if a ShortAddress translates to a LongAddress)
#
# This is only called within a legacy context.
def lookup_target(state: EthereumState, address: LongAddress) -> LongAddress:
    # Truncate input
    address = address[12:]
    if state[TRANSLATION_MAP_ADDR][address] != EMPTY_CHUNK:
        return state[TRANSLATION_MAP_ADDR][address]
    else:
        # The account may or may not be legacy
        return address
```

Note that if a call frame reverts, the potential changes made by `compress_and_touch` must also be reverted.

*Question:* Should it be allowed for `STATICCALL` to modify the translation table?
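
To see how these helpers interact, here is a minimal runnable sketch of the same logic. It makes several assumptions that are not part of the specification: the state is modelled as a plain nested `dict`, `hash32` is a stand-in using `hashlib.sha3_256` (which is *not* Keccak-256), both functions return fixed-width values (20 bytes from `compress_and_touch`, 32 bytes from `lookup_target`), and a frame revert is modelled by restoring a copy of the map.

```python
import hashlib

TRANSLATION_MAP_ADDR = bytes(19) + b"\xff"  # 20-byte address ending in 0xff
EMPTY_CHUNK = bytes(32)
DOMAIN = bytes.fromhex("efefefef")


def hash32(data: bytes) -> bytes:
    # ASSUMPTION: stand-in for keccak256; SHA3-256 uses different padding.
    return hashlib.sha3_256(data).digest()


def is_longaddress(address: bytes) -> bool:
    # A 32-byte word is a longaddress if any of its first 12 bytes is non-zero.
    return address[:12] != bytes(12)


def compress(address: bytes) -> bytes:
    # Double hash with a domain prefix, keep the last 20 bytes.
    return hash32(hash32(DOMAIN + address))[12:]


def compress_and_touch(state: dict, address: bytes) -> bytes:
    # Non-legacy context: record short -> long, return the 20-byte form.
    if is_longaddress(address):
        short_address = compress(address)
        state[TRANSLATION_MAP_ADDR][short_address] = address
        return short_address
    return address[12:]


def lookup_target(state: dict, address: bytes) -> bytes:
    # Legacy context: truncate to 20 bytes, then follow the map if possible.
    short_address = address[12:]
    entry = state[TRANSLATION_MAP_ADDR].get(short_address, EMPTY_CHUNK)
    if entry != EMPTY_CHUNK:
        return entry
    # No entry: the account may or may not be legacy.
    return bytes(12) + short_address


state = {TRANSLATION_MAP_ADDR: {}}
long_addr = bytes.fromhex("010000000000") + hash32(b"extended account")[6:]
padded_short = bytes(12) + compress(long_addr)

before = lookup_target(state, padded_short)    # no entry yet
snapshot = dict(state[TRANSLATION_MAP_ADDR])   # checkpoint for the call frame
compress_and_touch(state, long_addr)           # extended frame touches the map
during = lookup_target(state, padded_short)    # now resolves to the long address
state[TRANSLATION_MAP_ADDR] = snapshot         # frame reverted
after = lookup_target(state, padded_short)

print(before == padded_short, during == long_addr, after == padded_short)
# True True True
```

The last line illustrates the revert rule: once the frame that populated the map is rolled back, a legacy lookup behaves as if the entry had never existed.
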
#### Within Extended Accounts

If `isLegacyAccount` is false:
- `ADDRESS`, `ORIGIN`, `CALLER`, `COINBASE` return a `longaddress` (note that a `shortaddress` is also valid as a `longaddress`)
- `BALANCE`, `EXTCODESIZE`, `EXTCODECOPY`, `EXTCODEHASH`, `CALL`, `CALLCODE`, `DELEGATECALL`, `STATICCALL`, and `SELFDESTRUCT` take a `longaddress`
- for `CREATE` the final hashing is modified to `hex"010000000000" || keccak256(rlp([address, nonce]))[6:]`, i.e. it creates a `longaddress`
- for `CREATE2` the final hashing is modified to `hex"010000000000" || keccak256(hex"ff" || address || salt || keccak256(init_code))[6:]`, i.e. it creates a `longaddress`

The changes for `CREATE` and `CREATE2` mean that extended accounts can only create extended accounts, and not legacy accounts.

The instructions `BALANCE`, `EXTCODESIZE`, `EXTCODECOPY`, `EXTCODEHASH`, `CALL`, `CALLCODE`, `DELEGATECALL`, `STATICCALL`, and `SELFDESTRUCT` are modified to first call `compress_and_touch(target)` after charging the base gas cost.

#### Within Legacy Accounts

If `isLegacyAccount` is true:
- `ADDRESS`, `CREATE`, `CREATE2` are unchanged
- `ORIGIN`, `CALLER`, `COINBASE` return the address via `compress_and_touch(addr)`
- `BALANCE`, `EXTCODESIZE`, `EXTCODECOPY`, `EXTCODEHASH`, `CALL`, `CALLCODE`, `DELEGATECALL`, `STATICCALL`, and `SELFDESTRUCT` first call `lookup_target(addr)` to obtain the potential target address

### The ECRECOVER precompile

**TODO: Explain changes needed here.**

*Question: Should the precompile return a long or a short address depending on whether the caller is a legacy account or not?*

*Question: Should there be a new precompile for long addresses?*

*Question: Should the existing precompile be extended to take a flag as an input?*

## Appendix: CREATE3

We introduce a new opcode, `CREATE3` (`0xf6`), which behaves exactly like `CREATE2` (`0xf5`), but the final hash is truncated differently.
Replace the hashing definition in [EIP-1014](https://eips.ethereum.org/EIPS/eip-1014) with the following: `hex"010000000000" || keccak256(hex"ff" || address || salt || keccak256(init_code))[6:]`.

**An alternative option is that both `CREATE` and `CREATE2` return `longaddress`es if run in the context of a non-legacy account.**

## Appendix: Translation contract

The description above accesses and adjusts state directly (i.e. `state[TRANSLATION_MAP_ADDR][address]`). This is fine for internal lookups.

One question is unanswered: does it make sense to expose this information to the outside, and if so, how would it be queryable? It would be nice for wallets to know whether something is in the translation table yet, before conducting a transaction, because if it is missing they may need to poke the table from the longaddress side.

Similarly to [EIP-210](https://eips.ethereum.org/EIPS/eip-210), it is possible to introduce a contract instead, which could give an answer to this.

Suppose the translation table is a contract, with two externally visible functions (these are [Contract ABI](https://docs.soliditylang.org/en/v0.8.5/abi-spec.html) signatures):
- `set(bytes20,bytes32)`: sets a translation
- `get(bytes20)`: returns a long address

The `set` function would need to be restricted to only allow the "system" to be the caller.

If there is no contract, then the nice way to query the table would be a specific RPC method, such as `eth_queryAddressTranslation(shortAddress) -> longAddress`.

## Appendix: Merge account

In case a legacy account uses a short version of a long address before it is placed into the translation map (by means of someone supplying it via calldata to them), it will create a new entry in the state for that short address. Suppose then the translation table is populated through another transaction. After this, legacy accounts cannot access the previously created account record, only the translated one (see `lookup_target`).
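
The scenario above can be illustrated with a heavily simplified model. Everything here is an assumption made for the sake of the example: accounts are reduced to a balance table, `legacy_transfer` is a hypothetical helper standing in for a value-bearing `CALL` from a legacy contract, and `hash32` uses `hashlib.sha3_256` as a stand-in for Keccak-256.

```python
import hashlib

DOMAIN = bytes.fromhex("efefefef")


def hash32(data: bytes) -> bytes:
    # ASSUMPTION: stand-in for keccak256; SHA3-256 is not Keccak-256.
    return hashlib.sha3_256(data).digest()


def compress(long_address: bytes) -> bytes:
    return hash32(hash32(DOMAIN + long_address))[12:]


def lookup_target(translation_map: dict, short_address: bytes) -> bytes:
    # Follow the map if an entry exists, otherwise use the address as given.
    return translation_map.get(short_address, short_address)


def legacy_transfer(balances: dict, translation_map: dict,
                    short_address: bytes, value: int) -> bytes:
    # Hypothetical model of a legacy contract sending value to an address.
    target = lookup_target(translation_map, short_address)
    balances[target] = balances.get(target, 0) + value
    return target


balances = {}
translation_map = {}
long_addr = bytes.fromhex("010000000000") + hash32(b"extended account")[6:]
short_addr = compress(long_addr)

# 1. The short form is used before any translation exists:
#    a fresh account record is created under the short address.
first = legacy_transfer(balances, translation_map, short_addr, 5)

# 2. Another transaction populates the translation map.
translation_map[short_addr] = long_addr

# 3. The same legacy call now resolves to the long address instead.
second = legacy_transfer(balances, translation_map, short_addr, 7)

print(balances[short_addr], balances[long_addr])  # 5 7
```

After step 2 the balance of 5 recorded under the short address is no longer reachable from a legacy context, because every lookup of that short address is redirected to the long one.
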
A potential workaround is to "merge" such accounts upon touching. This feels like it increases the protocol complexity too much, and the problem should rather be solved at the UX level. **We think the risk for user error is elevated though.**

Define the following helper:

```python
# Merge in case an account exists with a balance but without code, and it is also present in the translation map.
#
# This happens if a legacy contract touched an extended account prior to translation.
def merge_account(state: EthereumState, short_address: ShortAddress, long_address: LongAddress):
    assert state[TRANSLATION_MAP_ADDR][short_address] == long_address

    # Non-translated account
    account = state[short_address]
    if account.nonce == 0 and account.code.empty() and account.balance != 0:
        state[long_address].balance += account.balance
        del state[short_address]
```

Replace `compress_and_touch` with:

```python
# Insert entry into the translation map if the address is a LongAddress
#
# This is only called within a non-legacy context.
def compress_and_touch(state: EthereumState, address: LongAddress) -> ShortAddress:
    if is_longaddress(address):
        short_address = compress(address)
        state[TRANSLATION_MAP_ADDR][short_address] = address
        merge_account(state, short_address, address)
        return short_address
    else:
        return address
```

## Differences to the version in the forum

1. `merge_account` is a new concept
2. The translation map is populated on all account accessing instructions, while in the forum it only happens when a legacy account uses `CALLER` and the caller is a `longaddress` -- the greedy version in this document is likely "correct" in covering all potential cases, but it may be too greedy, and the `CALLER` case may be enough
3. The compression uses double hashing
4. `CREATE3` is not introduced
5. `BIGCALLER` is not introduced (a way to query the `longaddress` even in a legacy context) -- Is this needed/useful?

## Questions / TODOs

1. Is validation of `longaddress` needed anywhere? (i.e. it has version 1, etc.)
2. Should the compression use a single or a double hash?
3. Should CREATE/CREATE2 change depending on context? (as in the above spec)
    - We proposed to change the behavior of existing instructions, so let's stick with it.
4. Should CREATE3 be introduced so that legacy contracts can create new addresses?
    - No.
5. In legacy contracts, should address inputs to opcodes be truncated? This would mimic the current behaviour, as those bits are ignored.
    - Yes.
6. Is `BIGCALLER`/`REALCALLER` needed?
    - No.
    - The use case is the ability to check whether the caller is legacy or not (i.e. `CALLER != REALCALLER`). Could it be (mis)used for some other use case?
7. Insert into the map on first access to a longaddress (i.e. any transaction)?
8. Can we assume that it is hard to grind a collision between short and long? Is double hashing needed?
9. What happens if no translation existed so a short account entry was created, but next time the translation exists -- is the short account inaccessible or "migrated" during the next translation?
10. Can the map be prepopulated with existing accounts?
11. Instead of a map, have new account entries working as symlinks? i.e. the account entry would not contain nonce/balance/code/storage, but a single field for the destination longaddress
    > Separate accounts are more of a state bloat I think [name=Andrei]
12. Should access list entries also trigger translation?
13. Should the translation table only be populated by the access list entries?
14. Instead of a translation table, should we only rely on a translation in the "access list"?

### Size of translation map (concern)

Total accounts (as of 2nd June @ etherscan): 156,000,000

Assuming the chain grows similarly, there may be another 150M new accounts in the next two years, which would be translated into short addresses. That is at a minimum `150_000_000 * 64 / 1024 / 1024 = 9155` MiB of data storage.

## Test cases

**See [this comprehensive list](https://notes.ethereum.org/@ipsilon/rk8C21p9_).**