# Issues with ASE (with a translation map)

Author(s): Paweł Bylica, Andrei Maiboroda, Alex Beregszaszi

Useful to read before: [background explainer and specification](https://notes.ethereum.org/@ipsilon/address-space-extension-exploration) and [test/use cases](https://notes.ethereum.org/@ipsilon/address-space-extension-test-cases).

---

[toc]

### 1.a) Property: EOAs can control both short and long addresses ("personas")

From a single private key, both a short address and a long address can be generated. This doesn't seem to present any issues to the core protocol, but one UX consequence we see is that a private key can no longer be seen as representing a single user identity, and corresponding changes in wallet software will be required. (However, with the [State Expiry proposal](https://notes.ethereum.org/@vbuterin/state_expiry_eip) users can also create a separate (long) address in each address space, all from the same private key.)

If long addresses also have a short form as a compressed address (i.e. a hashed and truncated version of the long address), this becomes more complicated. In that case EOAs can also have a compressed version of each long address, further increasing the number of connected addresses.

Examples:
```
short:      0x7e4e7746364085626b7251dc518edb62bc415b3c
long:       0x0100000000008583110d546d7e4e7746364085626b7251dc518edb62bc415b3c
compressed: 0x5c46348dcf7ca17608ac51ed2e43d582fb339332
```

### 1.b) Property: Legacy contract can behave differently depending on optimiser settings

In Solidity the `address` type is an alias to `uint160`, which means calculations, storage and calldata access are done with a mask (`2^160-1`). Unfortunately this also depends on compilation settings. If the optimiser is turned on, the masking is removed on `address`es used as stack items (as the compiler “knows” they cannot overflow and instructions ignore dirty bits). However, the masking is not removed when reading or writing storage.

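To make the masking in 1.b concrete, here is a minimal Python sketch of what the `2^160-1` mask does to a 32-byte word. The helper names `mask_address` and `dirty_bits` are hypothetical, and this is a simplified model rather than the compiler's actual output. It reuses the example addresses from 1.a.

```python
# Simplified model of the uint160 mask described in 1.b.
# ASSUMPTION: helper names are hypothetical; this is not compiler output.
ADDRESS_MASK = (1 << 160) - 1  # 2^160 - 1

# Example values copied from section 1.a.
SHORT = 0x7e4e7746364085626b7251dc518edb62bc415b3c
LONG = 0x0100000000008583110d546d7e4e7746364085626b7251dc518edb62bc415b3c


def mask_address(word: int) -> int:
    """Keep only the low 160 bits of a 256-bit word."""
    return word & ADDRESS_MASK


def dirty_bits(word: int) -> int:
    """Return the high 96 bits that masking would silently drop."""
    return word >> 160


if __name__ == "__main__":
    # A masked path sees only the low 20 bytes of the long address.
    print(hex(mask_address(LONG)))
    # An unmasked path (optimised stack item) still carries these bits.
    print(hex(dirty_bits(LONG)))
```

For the example values in 1.a, the masked long address is equal to the short address, so a code path that masks and a code path that does not would see two different addresses for the same input.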
### 1.c) Property: Legacy contracts not using ABIEncoderV2

Legacy contracts that use ABIEncoderV2 will validate the input and revert when passed a long address. Contracts that don't use ABIv2 might accept a long address and either correctly truncate and mask it or not, depending on the case and how they were compiled. In general neither error nor success is guaranteed.

### 1.d) Question: Should all personas be accessible for EOAs

Since we [established](#1a-Property-EOAs-can-control-both-short-and-long-addresses-“personas”) that for each private key there are multiple addresses (personas), the question arises whether it should be possible to send transactions with each of them.

Currently transactions do not contain a sender address; rather, it is derived from the signature. The signature recovery process leads to a public key, which can then be translated into an address. After ASE it is possible to translate that public key into multiple addresses.

A simple way to select one is to include either the desired address or a version/epoch prefix in the encoded transaction.

### 1.e) Remark: Privacy recommendation for wallets

It would be recommended for wallets not to reuse the same private keys for different epochs. Similar to how HD-wallet derivation paths include a chain id now, they should also include an epoch number.

### 1.f) Question: Compressed address as a persona

A special subcase of 1.d) is to also allow impersonating compressed addresses.

### 1.g) Question: Token balances with compressed addresses

Scenario 1: Compressed address (without translation entry) to legacy token contract. This should not be a problem: once the translation table is populated, the tokens can be retrieved.

Scenario 2: Compressed address (without translation entry) to new token contract, which does a translation. In this case the tokens will be stuck.

### 1.h) Question: ECRECOVERed addresses vs. compressed ones

A proposal from Micah is to have yet another translation map in the transaction which specifies what address a legacy ecrecover call should result in, such as a map of public key or signature to address. (Micah to clarify)

## Translation map

### 2.a) Property: Inaccessible funds case

Looking at possible [use cases](https://notes.ethereum.org/@ipsilon/address-space-extension-test-cases), the translation map solution looks able to correctly handle all possible instructions operating with addresses, both in legacy contracts and in new contracts.

However, there are cases that are arguably outside the scope of the core protocol (i.e. can be seen as an issue in the contract code or in how contracts handle input), but in practice there is a rather high probability that they can break for some users. These cases are basically the result of the fact that an address can be passed to a contract in calldata, not only obtained from the `CALLER`/`ORIGIN`/`COINBASE`/etc. instructions.

One example of a case where funds can be lost: calling a wallet contract to transfer to a long address, but, due to a tooling or user mistake, the destination address is passed as a compressed address 0-padded to 32 bytes. If the wallet contract does not handle the translation map lookup and/or an entry for this address was not added before the transfer, then the balance is increased for a non-existing compressed address in the state.

### 2.b) Property: Translation map has 80-bit collision resistance

It is feasible (but costly) to find two long addresses that map to the same compressed address. This allows attacks similar to what [EIP-3607](https://eips.ethereum.org/EIPS/eip-3607) describes.

1. Attacker adds EOA long address `L1` to the translation map with compressed address `C`.
2. Attacker deploys a contract at `L2` which also maps to `C`. The translation map is not updated as it already contains `C`. (If the translation map is instead updated by overwriting the existing entry, the order of steps 1 and 2 can be reversed.)
3. Attacker can trick users into sending funds to `L2` via `C` by proving that `C` is the compressed address of `L2`. The funds will actually go to `L1` because the translation map holds `C => L1`.

### 2.c) Property: Translation map should behave as contract storage

The originally proposed translation map is an append-only map (`short_address => long_address`) kept in the Ethereum State. The translation map is populated on all account accessing instructions in a `long_address` execution context.

To work nicely with existing State modification rules and rollback snapshots:

1. Changes to the translation table should be reverted when a transaction/call fails.
2. New entries cannot be added under `STATICCALL`.

The translation map may be kept in a contract's storage at a fixed `TRANSLATION_MAP_ADDR` address or kept [in the State Trie directly](https://hackmd.io/@yokPKswWQlu93Y10pQBj2Q/BJrEbKUiO). We believe the selection of a particular implementation does not affect this case nor many other issues presented in this document.

### 2.d) Feature: Translation map should be accessible to contracts

Contents of the translation map at `TRANSLATION_MAP_ADDR` should be accessible to other contracts.

If a new token contract would like to correctly handle transfer calls with a compressed address in calldata (anticipating a tooling or user mistake), it would need to do a translation map lookup to find the corresponding long address.

Moreover, contracts may want to have additional safeguards by inspecting the translation map.

However, there are important translation map limitations:

- it cannot be checked whether a `long_address` is in the map,
- a missing `short_address` in the map means either that it is not a `compressed_address` or that the `compressed_address` has not been added yet, i.e. there is no reliable way to check if an address is a compressed address.

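The rules in 2.c and the lookup behaviour in 2.d can be sketched as a small Python model. This is an illustration under stated assumptions, not a specification: the names `TranslationMap` and `compress` are hypothetical, truncated SHA-256 is only a stand-in for the compression function defined in the ASE specification, and raising an error under `STATICCALL` is just one possible reading of rule 2.

```python
import hashlib
from typing import Dict, List, Optional


def compress(long_address: bytes) -> bytes:
    """ASSUMPTION: stand-in for the real compression function (hash + truncate)."""
    return hashlib.sha256(long_address).digest()[-20:]


class TranslationMap:
    """Append-only short_address => long_address map with revert support."""

    def __init__(self) -> None:
        self._entries: Dict[bytes, bytes] = {}
        self._journal: List[bytes] = []  # keys in insertion order

    def snapshot(self) -> int:
        """Return a marker to roll back to if the call fails."""
        return len(self._journal)

    def revert(self, snapshot: int) -> None:
        """Drop every entry added after the given snapshot."""
        while len(self._journal) > snapshot:
            del self._entries[self._journal.pop()]

    def add(self, long_address: bytes, static: bool = False) -> bytes:
        """Insert an entry unless the compressed address is already present."""
        short = compress(long_address)
        if short in self._entries:
            return short  # existing entries are never overwritten
        if static:
            # ASSUMPTION: a new entry under STATICCALL is treated as an error.
            raise PermissionError("cannot add entries under STATICCALL")
        self._entries[short] = long_address
        self._journal.append(short)
        return short

    def lookup(self, short_address: bytes) -> Optional[bytes]:
        """Return the long address, or None if there is no entry.

        None is ambiguous: the input may not be a compressed address at all,
        or its entry may simply not have been added yet.
        """
        return self._entries.get(short_address)
```

Because existing entries are never overwritten in this model, the first long address to claim a compressed address keeps it, which is the behaviour the attack in 2.b relies on.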
### 2.e) Feature: Translation map should be accessible to the off-chain tools

Similarly to the previous point, it would be useful for off-chain tools (such as wallets) to read the translation map.

It is expected that wallets would check the map to ensure a transaction will succeed, and in case of a missing translation entry, they would first submit a transaction poking the table.

The map can be exposed externally via different options:

1. A new RPC method. This is added complexity on the client, but allows for any kind of underlying implementation.
2. Implementing it as a contract, so that wallets can read it. This may be added complexity in the protocol. (Some ideas can be [seen here](https://notes.ethereum.org/@ipsilon/address-space-extension-exploration#Appendix-Translation-contract).)

### 2.f) Property: Behaviour with non-existing accounts

If any long address is added to the translation map on executing account access instructions (`BALANCE`, `CALL`, `EXTCODESIZE`, etc.), an attack similar to the [cold state access attack](https://hackmd.io/@iwck0wkoSzauVnsYI0h7JA/SkyFmk4_r) is possible: accessing random addresses in a loop would bloat the translation map, while the user would pay only for cold-reading of accounts.

Simple mitigation should be possible by

- a) adding an entry to the translation map only if the account is non-empty, or
- b) charging storage write gas for any state access instruction if it updates the translation map
  - this could be either a worst-case gas fee for any insertion, or
  - a penalty gas fee when inserting an entry with an empty target account.

### 2.g) Property: Translation map size

Total accounts (as of 2nd June @ etherscan): 156,000,000

Assuming the chain grows similarly, there may be another 150M new accounts in the next two years, which would be translated into short addresses. That is at a minimum about **8.94 GiB** of data storage (`150_000_000 * 64 / 1024**3`). The 64 bytes refers to the assumption that we need to store at least the key (compressed address) and the leaf (long address) as 32 bytes each.

## Using transaction access list

### 3.a) Fix: Transaction access list should populate the translation map

Any `long_address` in a transaction access list should be added to the translation map before transaction execution starts.

This provides consistency with "populating on all account accessing instructions".

This allows populating the translation map explicitly without "poking" `long_address`es. This is useful for transactions wanting to interact with the `short_address` world only.

Moreover, this allows end users to create a transaction which guarantees that a given set of `long_address`es will be in the translation map no matter how the execution unfolds. In this case, changes made to the translation map should not be reverted if the transaction "fails", similarly to the transaction origin nonce always being increased.

**Update:** We do not want to keep growing the translation map, therefore it would be nice to have an explicit way to list new additions. It is not a good idea to share this feature with the regular access list, as that has multiple uses, and as a result the map would be unconditionally extended.

**Note:** Only include `long_address` and calculate the compressed address.

### 3.b) Idea: Translation map only populated by transaction access list

Drop "populating translation map on all account accessing instructions". Instead, the translation map can only be populated by the transaction access list.

This limits the sources of new translation map entries to one, making translation map behavior more explicit.

**Note:** Concern about validity (such as DSA)?

**Note:** What happens on malicious omission of a compressed address? (Unclear if this is a problem.)

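The pre-population described in 3.a can be sketched as follows. This is a minimal model under stated assumptions: `run_transaction`, `compress`, and the `execute` callback are hypothetical names, the map is a plain dictionary, and truncated SHA-256 again stands in for the real compression function. Entries declared by the transaction are written before execution and survive a failure, while entries added during a failed execution are rolled back.

```python
import hashlib
from typing import Callable, Dict, Iterable


def compress(long_address: bytes) -> bytes:
    """ASSUMPTION: stand-in for the real compression function (hash + truncate)."""
    return hashlib.sha256(long_address).digest()[-20:]


def run_transaction(
    translation_map: Dict[bytes, bytes],
    declared_long_addresses: Iterable[bytes],
    execute: Callable[[Dict[bytes, bytes]], None],
) -> bool:
    """Pre-populate the map from the transaction's list, then execute.

    Returns True if execution succeeded, False if it failed.
    """
    # Step 1: only long addresses are listed; the compressed form is computed.
    for long_address in declared_long_addresses:
        translation_map.setdefault(compress(long_address), long_address)

    # Step 2: take a checkpoint AFTER pre-population, so that declared
    # entries are kept even if execution fails (like the sender nonce).
    checkpoint = dict(translation_map)
    try:
        execute(translation_map)
        return True
    except Exception:
        # Roll back only what execution itself added.
        translation_map.clear()
        translation_map.update(checkpoint)
        return False
```

Under 3.b, `execute` would never write to the map at all, so the checkpoint and rollback in step 2 would become unnecessary.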
### 3.c) Idea: Ephemeral translation map

Using the transaction access list to pre-populate the translation map gives us the opportunity to consider an ephemeral translation map which lives only during transaction execution and is later discarded.

The "populating on account accessing instructions" feature can stay or may be dropped.

Pros:

1. We save a lot of State space and I/O.
2. Simplification because revert rules may not be needed (in case of using only the access list).
3. Incentivises migration to the `long_address` world.

Cons:

1. Transactions may be big or hard to produce.
2. Concern about validity (just like in 3.b).

**Note:** The ephemeral map should be populated via all address producing instructions and the submitted list, at the beginning of the transaction before executing any instructions.

### 3.d) Idea: Ephemeral translation map by block producers

Block producers (miners) generate a map (witness) for the block.

**Note:** This seemed to be posing too many issues.

### 3.e) Idea: Expiring translation map

From the perspective of ASE, nothing changes. There are two options:

1. We assume that the translation map exists as a regular state object. Then we use regular state expiry rules for keeping/reviving these objects.
2. The translation map exists as a separate state object and has its own state expiry rules.

**Note:** The first option is nice because it reduces the complexity of the state expiry specification, but may be suboptimal. The second option is the inverse.

## Miscellaneous

### 4.a) ECRECOVER

The ecrecover precompile returns an address corresponding to a given signature. It needs to be clarified what happens to the precompile once long addresses are introduced.

Questions:

- Should the precompile return a long vs. short address depending on whether the caller is a legacy account or not?
- Should there be a new precompile for long addresses? Return a pubkey? Return all different personas?
- Should the existing precompile be extended to take a flag as an input?

Furthermore, the fact that [multiple addresses can belong to the same private key](#1a-Property-EOAs-can-control-both-short-and-long-addresses) complicates this even more, as a decision needs to be made whether all of them are recoverable or only some of them (and if so, which).

**Update:** We concluded on Thursday that changing the current precompile is risky as it does not have any length checks, so contracts may already be sending oversized inputs. The best solution seems to be a new precompile which also supports multiple personas (see also [section 1.d](#1d-Question-Should-all-personas-be-accessible-for-EOAs)). Some options we have discussed are returning a public key only, requiring a destination address as an input (and returning a boolean), requiring a prefix, etc.

### 4.b) EIP-3074

[EIP-3074](https://eips.ethereum.org/EIPS/eip-3074) is a new proposal relying on signature recovery and thus faces similar problems as [ecrecover](#4a-ECRECOVER). Additionally, it introduces a new execution context variable for storing the recovered address.

Since it has gathered significant interest from the community, we suggest reviewing it carefully in conjunction with the Address Space Extension proposal.

### 4.c) Coinbase in the block header

Since the Yellow Paper defines addresses as 160 bits (as opposed to many fields, where it omits specifying a limit), it is likely that clients actually enforce/implement this limit. Therefore it may require work to make the header support a 32-byte coinbase.
