EIP-3540: EVM Object Format (EOF) v1 (#3540)

We introduce an extensible and versioned container format for the EVM with a once-off validation at deploy time. The version described here brings the tangible benefit of code and data separation, and allows for easy introduction of a variety of changes in the future. It is introduced via two hard forks to avoid breaking any existing executable contracts.
2025-02-23 12:18:16 +00:00 · 2021-05-14 03:17:14 +01:00 · 2021-05-14 03:17:14 +01:00 · 620a98521b
commit 620a98521b
parent 65aa8fc709
1 changed files with 218 additions and 0 deletions
--- a/EIPS/eip-3540.md
+++ b/EIPS/eip-3540.md
@ -0,0 +1,218 @@
+---
+eip: 3540
+title: EVM Object Format (EOF) v1
+author: Alex Beregszaszi (@axic), Paweł Bylica (@chfast), Andrei Maiboroda (@gumb0)
+discussions-to: https://ethereum-magicians.org/t/evm-object-format-eof/5727/4
+status: Draft
+type: Standards Track
+category: Core
+created: 2021-03-16
+---
+
+## Abstract
+
+We introduce an extensible and versioned container format for the EVM with a once-off validation at deploy time. The version described here brings the tangible benefit of code and data separation, and allows for easy introduction of a variety of changes in the future. It is introduced via two hard forks to avoid breaking any existing executable contracts.
+
+## Motivation
+
+On-chain deployed EVM bytecode contains no pre-defined structure today. Code is typically validated in clients to the extent of `JUMPDEST` analysis at runtime, every single time prior to execution. This poses not only an overhead, but also a challenge for introducing new or deprecating old features. [This initial proposal](https://notes.ethereum.org/@axic/evm-object-format) explains some of these challenges.
+
+Validating code during the contract creation process allows code versioning without an additional version field in the account. Versioning is a useful tool for introducing or deprecating features, especially for larger changes (such as significant changes to control flow, or features like account abstraction).
+
+The format described in this EIP introduces a simple and extensible container with a minimal set of changes required to both clients and languages, and introduces validation.
+
+The first tangible feature it provides is separation of code and data. This separation is especially beneficial for on-chain code validators (like those utilised by layer-2 scaling tools, such as Optimism), because they can distinguish code and data (this includes deployment code and constructor arguments too). Currently they a) require changes prior to contract deployment; b) implement a fragile method; or c) implement an expensive and restrictive jump analysis. Code and data separation can result in ease of use and significant gas savings. Additionally, various (static) analysis tools can also benefit, though off-chain tools can already deal with existing code, so the impact is smaller.
+
+A non-exhaustive list of proposed changes which could benefit from this format:
+- Including a `JUMPDEST`-table (to avoid analysis at execution time) or removing `JUMPDEST`s entirely.
+- Introducing static jumps (with relative addresses) and jump tables, and disallowing dynamic jumps at the same time.
+- Requiring code section(s) to be terminated by `STOP`. (Assumptions like this can provide significant speed improvements in interpreters, such as a speed up of ~7% seen in [evmone](https://github.com/ethereum/evmone/pull/295).)
+- Multi-byte opcodes without any workarounds.
+- Representing functions as individual code sections instead of subroutines.
+- Introducing a specific section for the [EIP-2938 Account Abstraction](./eip-2938.md) "top-level AA execution frame", simplifying the proposal.
+
+## Specification
+
+*We use [RFC2119](https://tools.ietf.org/html/rfc2119) keywords in this section.*
+
+In order to guarantee that every EOF-formatted contract in the state is valid, we need to prevent already deployed (and not validated) contracts from being recognized as such format. This will be achieved by choosing a byte sequence for the *magic* that doesn't exist in any of the already deployed contracts. To prevent the growth of the search space and to limit the analysis to the contracts existing before these changes, we split changes into two hard forks, and disallow the starting byte of the format (the first byte of the magic) in the first hard fork.
+
+### First hard fork
+
+After `block.number == HF1_BLOCK` new contract creation (via create transaction, `CREATE` or `CREATE2` instructions) results in an exceptional abort if the _code_'s first byte is `0xEF`. 
+
+#### Remarks
+
+*For purely reference purposes we call the `0xEF` byte the `FORMAT`.*
+
+The *initcode* is the code executed in the context of the *create* transaction, `CREATE`, or `CREATE2` instructions. The *initcode* returns *code* (via the `RETURN` instruction), which is inserted into the account. See section 7 ("Contract Creation") in the Yellow Paper for more information.
+
+The opcode `0xEF` is currently an unimplemented instruction, therefore: *It pops no stack items and pushes no stack items, and it causes an exceptional abort when executed.* This means *initcode* or already deployed *code* starting with this instruction will continue to abort execution.
+
+### Between the two hard forks
+
+Next step to be conducted manually: Once the first hard fork went live, all existing contracts at `block.number == HF1_BLOCK` having their first byte as the `FORMAT` are examined off-chain. The goal is to find the shortest sequence after the `FORMAT` not matching any existing contract. We expect a 1- or 2-byte sequence would suffice. We call this byte sequence the *magic*, which will be used in the second hard fork.
+
+### Second hard fork
+
+In this fork we introduce _code validation_ for new contract creation. To achieve this, we define a format called EVM Object Format (EOF), containing a version indicator, and a ruleset of validity tied to a given version.
+
+At `block.number == HF2_BLOCK` new contract creation is modified:
+- if *initcode* or *code* starts with the *EOF prefix* (*TBD*), it is considered to be EOF formatted and will undergo validation specified in the following sections,
+- if *code* starts with `0xEF`, creation continues to result in an exceptional abort (the rule introduced in HF1),
+- otherwise code is considered *legacy code* and the following rules do not apply to it.
+
+Note that the *EOF prefix* here means the to be determined combination of `FORMAT` and *magic*.
+
+#### Container specification
+
+The format starts with the header:
+
+| description | length  | value | |
+|-------------|---------|-------|-|
+| format      | 1-byte  | 0xEF  | |
+| magic       | n-byte(s)  | TBD   | n >= 0 (zero in the best case) |
+| version     | 1-byte | 0x01  | means EOF1 |
+
+This is followed by one or more section headers:
+
+| description  | length  | | 
+|--------------|---------|-|
+| section_kind | 1-byte  | Encoded as a 8-bit unsigned number. |
+| section_size | 2-bytes | Encoded as a 16-bit unsigned big-endian number. |
+
+The section kinds are defined as follows:
+
+| section_kind | meaning    |
+|--------------|------------|
+| 0            | terminator |
+| 1            | code       |
+| 2            | data       |
+
+If the terminator is encountered, section size MUST NOT follow.
+
+The section contents follow after the header, in the order and size they are defined, without any padding bytes.
+
+To summarise, the bytecode has the following format:
+```
+format, magic, version, (section_kind, section_size)+, 0, <section contents>
+```
+
+#### Validation rules
+
+A bytestream starting with the *EOF prefix* declares itself conforming to the rules according to its version.
+
+1. The rules of `version=1` are specified below:
+- `section_size` MUST NOT be 0.
+- Exactly one code section MUST be present.
+- The code section MUST be the first section.
+- A single data section MAY follow the code section.
+- Stray bytes outside of sections MUST NOT be present. This includes trailing bytes after the last section.
+2. Any other version is invalid.
+
+(*Note:* Contract creation code SHOULD set the section size of the data section so that the constructor arguments fit it.)
+
+#### Changes to execution semantics
+
+For clarity, the *container* refers to the complete account code, while *code* refers to the contents of the code section only.
+
+1. Jumpdest analysis is only run on the *code*.
+2. Execution starts at the first byte of the *code*, and `PC` is set to this position within the container format (e.g. `PC=10` for a *container* with a code and data section).
+3. If `PC` goes outside of the code section bounds, execution  aborts with failure.
+4. `PC` returns the current position within the *container*.
+5. `JUMP`/`JUMPI` uses an absolute offset within the *container*.
+6. `CODECOPY`/`CODESIZE`/`EXTCODECOPY`/`EXTCODESIZE`/`EXTCODEHASH` keeps operating on the entire *container*.
+7. The input to `CREATE`/`CREATE2` is still the entire *container*.
+
+#### Changes to contract creation semantics
+
+For clarity, the *EOF prefix* together with a version number $v$ is denoted as the *EOF$v$ prefix*, e.g. *EOF1 prefix*. 
+
+1. If _initcode's container_ has EOF1 prefix it must be valid EOF1 code.
+2. If _code's container_ has EOF1 prefix it must be valid EOF1 code.
+
+## Rationale
+
+EVM and/or account versioning has been discussed numerous times over the past years. This proposal aims to learn from them. A good starting point collecting the previous proposals is [this thread](https://ethereum-magicians.org/t/ethereum-account-versioning/3508).
+
+### Execution vs. creation time validation
+
+This specification introduces creation time validation, which means:
+- All created contracts with EOF$n$ prefix are valid according to version $n$ rules. This is very strong and useful property. The client can trust that the deployed code is well-formed.
+- In future, this allows serializing `JUMPDEST` map in the EOF container and eliminate the need of implicit `JUMPDEST` analysis required before execution.
+- Or completely remove the need for `JUMPDEST` instructions.
+- This helps with deprecating EVM instructions and/or features.
+- The biggest disadvantage is that deploy-time validation of EOF code must be enabled in two hard-forks.
+
+The alternative is to have execution time validation for EOF. This is performed every single time a contract is executed, however clients may be able to cache validation results. This approach has the following properties:
+- Because the validation is consensus-level execution step, it means the execution always requires the entire code. This makes code merkleization impractical.
+- The main advantage is it can be enabled via a single hard-fork.
+- A second advantage is better backwards compatibility: data contracts starting with the `0xEF` byte or the *EOF prefix* can be deployed. This is a dubious benefit however.
+
+### Initcode vs code
+
+After HF1 and before HF2 the `FORMAT` first byte check only applies to _code_. Applying the rule also to _initcode_ would have the same effects because is not distinguishable from executing _initcode_: if it starts with `FORMAT` execution would exceptionally abort. Yet, we decided against introducing an explicit check for _initcode_ for the simplicity of the HF1 spec.
+
+### Contract creation restrictions
+
+The [Changes to contact creation semantics](#changes-to-contract-creation-semantics) section defines minimal set of restrictions related to the contract creation: if _initcode_ or _code_ has the EOF1 container prefix it must be validated. This adds two validation steps in the contract creation, any of it failing will result in contract creation failure.
+
+Because _initcode_ and _code_ is evaluated for EOF1 independently, number of interesting combinations are allowed:
+- Create transaction with EOF1 _initcode_ can deploy legacy contract,
+- EOF1 contract can execute `CREATE` instruction with legacy _initcode_ to create new legacy contract,
+- Legacy contract can execute `CREATE` instruction with EOF1 _initcode_ to create new EOF1 contract,
+- Legacy contract can execute `CREATE` instruction with EOF1 _initcode_ to create new legacy contract,
+- etc.
+
+To limit the number of exotic bytecode version combinations, additional restrictions are considered, but current not part of the specification:
+
+1. The EOF version of _initcode_ must much the version of _code_.
+2. An EOF1 contract must not create legacy contracts.
+
+Finally, create transaction must be allowed to contain legacy _initcode_ and deploy legacy _code_ because otherwise there is no transition period allowing upgrading transaction signing tools. Deprecating such transactions may be considered in future.
+
+### The FORMAT byte
+
+The `0xEF` byte was chosen because it resembles **E**xecutable **F**ormat. It has a long history of being proposed for this use case, starting with [this](https://github.com/ethereum/EIPs/issues/154).
+
+### Section structure
+
+We have considered different questions for the sections:
+- Streaming headers (i.e. `section_header, section_data, section_header, section_data, ...`) are used in some other formats (such as WebAssembly). They are handy for formats which are subject to editing (adding/removing sections). That is not a useful feature for EVM. One minor benefit applicable to our case is they do not require a specific "header terminator".
+- Whether to have a header terminator or to encode `number_of_sections` or `total_size_of_headers`. Both of the latter raise the question how large of a value these fields should be able to hold. While today there will be only two sections, in case each "EVM function" would become their section, a fixed 8-bit field may not be big enough. A terminator byte seems to avoid these problems.
+- Whether to encode `section_size` as a fixed 16-bit value or some kind of variable length field (e.g. [LEB128](https://en.wikipedia.org/wiki/LEB128)). We have opted for fixed size, because it simplifies client implementations, and 16-bit seems enough, because of the currently exposed code size limit of 24576 bytes (see [EIP-170](./eip-170.md) and [EIP-2677](./eip-2677.md)). Should this be limiting in the future, a new EOF version could change the format.
+
+### Simplified implementations
+
+Given the rigid rules of EOF1 it is possible to implement support for the container in clients using very simple pattern matching (the following assumes `magic = 0x00`):
+
+1. If code starts with `0xEF 0x00 0x01 codelen1 codelen2 0x02 datalen1 datalen2 0x00`, then calculate `total_size = (9 + (codelen1 << 8 | codelen2) + (datalen1 << 8 | datalen2))`. If `total_size == container_size` then it is a valid EOF1 code with a code and data section.
+2. If code starts with `0xEF 0x00 0x01 codelen1 codelen2 0x00`, then calculate `total_size = 7 + (codelen1 << 8 | codelen2)`. If `total_size == container_size` then it is a valid EOF1 code with a code section.
+3. Otherwise if it starts with `0xEF`, it is invalid. 
+4. Otherwise if it does not start with `0xEF`, it is valid legacy code.
+
+However, future versions may introduce more sections or loosen up restrictions, requiring clients to actually parse sections instead of pattern matching.
+
+## Backwards Compatibility
+
+This is a breaking change given new code starting with the `0xEF` byte (unless it is followed by a valid EOF header) will not be deployable, and contract creation will result in a failure. However, given bytecode is executed starting at its first byte, code already deployed with `0xEF` as the first byte is not executable anyway.
+
+While this means no currently executable contract is affected, it does rejects deployment of new data contracts starting with the `0xEF` byte.
+
+## Test Cases
+
+- evmone: https://github.com/ethereum/evmone/pull/303/files#diff-290487661e637566ddd17991f9a76ffc62da8c36fb7af7d4e4105a259b2218b7R139
+- Python validator: https://gist.github.com/axic/c7a3cbeafad0ca867b04b784c1a757a8
+
+## Reference Implementation
+
+- evmone: https://github.com/ethereum/evmone/pull/303
+- Solidity: https://github.com/ethereum/solidity/tree/eof1
+
+## Security Considerations
+
+TBA
+
+## Copyright
+
+Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).