Automatically merged updates to draft EIP(s) 615

Hi, I'm a bot! This change was automatically merged because:

 - It only modifies existing Draft or Last Call EIP(s)
 - The PR was approved or written by at least one author of each modified EIP
 - The build is passing
This commit is contained in:
Greg Colvin 2019-04-16 19:20:39 -04:00 committed by EIP Automerge Bot
parent 05afb41d56
commit d79403b090
1 changed files with 58 additions and 41 deletions

View File

@ -4,7 +4,7 @@ title: Subroutines and Static Jumps for the EVM
status: Draft
type: Standards Track
category: Core
author: Greg Colvin <greg@colvin.org>, Brooklyn Zelenka (@expede), Paweł Bylica (@chfast), Christian Reitwiessner (@chriseth)
author: Greg Colvin <greg@colvin.org>, Brooklyn Zelenka (@expede) , Paweł Bylica (@chfast), Christian Reitwiessner(@chriseth)
discussions-to: https://ethereum-magicians.org/t/eip-615-subroutines-and-static-jumps-for-the-evm/2728
created: 2016-12-10
---
@ -15,48 +15,54 @@ In the 21st century, on a blockchain circulating billions of ETH, formal specifi
## Abstract
EVM code is currently difficult to statically analyze, hobbling critical tools for preventing the many expensive bugs our blockchain has experienced. Further, none of the current implementations of the Ethereum Virtual Machine—including the compilers—are sufficiently performant to reduce the need for precompiles and otherwise meet the network's long-term demands. This proposal identifies dynamic jumps as a major reason for these issues, and proposes changes to the EVM specification to address the problem, making further efforts towards a safer and more performant EVM possible.
EVM code is currently difficult to statically analyze, hobbling critical tools for preventing the many expensive bugs our blockchain has experienced. Further, none of the current implementations of the Ethereum Virtual Machine—including the compilers—are sufficiently performant to reduce the need for precompiles and otherwise meet the network's long-term demands. This proposal identifies dynamic jumps as a major reason for these issues, and proposes changes to the EVM specification to address the problem, making further efforts towards a safer and more performant the EVM possible.
We also propose to validate—in linear time—that EVM contracts correctly use subroutines, avoid misuse of the stack, and meet other safety conditions _before_ placing them on the blockchain. Validated code precludes most runtime exceptions and the need to test for them. Well-behaved control flow and use of the stack makes life easier for interpreters, compilers, formal analysis, and other tools.
We also propose to validate—in linear time—that EVM contracts correctly use subroutines, avoid misuse of the stack, and meet other safety conditions _before_ placing them on the blockchain. Validated code precludes most runtime exceptions and the need to test for them. And well-behaved control flow and use of the stack makes life easier for interpreters, compilers, formal analysis, and other tools.
## Motivation
Currently the EVM supports dynamic jumps, where the address to jump to is an argument on the stack. These dynamic jumps obscure the structure of the code and thus mostly inhibit control- and data-flow analysis. This puts the quality and speed of optimized compilation fundamentally at odds. Further, since many jumps can potentially be to any jump destination in the code, the number of possible paths through the code can go up as the product of the number of jumps by the number of destinations, as does the time complexity of static analysis. Many of these cases are undecidable at deployment time, further inhibiting static and formal analyses.
Currently the EVM supports only dynamic jumps, where the address to jump to is an argument on the stack. Worse, the EVM fails to provide ordinary, alternative control flow facilities like subroutines and switches provided by Wasm and most CPUs. So dynamic jumps cannot be avoided, yet they obscure the structure of the code and thus mostly inhibit control- and data-flow analysis. This puts the quality and speed of optimized compilation fundamentally at odds. Further, since many jumps can potentially be to any jump destination in the code, the number of possible paths through the code can go up as the product of the number of jumps by the number of destinations, as does the time complexity of static analysis. Many of these cases are undecidable at deployment time, further inhibiting static and formal analyses.
Absent dynamic jumps code can be statically analyzed in linear time. Static analysis includes validation, and much of optimization, compilation, and formal analysis; every part of the tool chain benefits when linear-time analysis is available. And absent dynamic jumps, and with proper subroutines the EVM is a better target for code generation from other languages, including
However, given Ethereum's security requirements, **near-linear** **`n log n`** **time complexity** is essential. Otherwise, Contracts can be crafted or discovered with quadratic complexity to use as denial of service attack vectors against validations and optimizations.
But absent dynamic jumps code can be statically analyzed in linear time. That allows for _linear time validation_. It also allows for code generation and such optimizations as can be done in `log n` time to comprise an _`n log n`_ _time compiler_.
And absent dynamic jumps, and with proper subroutines the EVM is a better target for code generation from other languages, including
* Solidity
* Vyper
* LLVM IR
* front ends include C, C++, Common Lisp, D, Fortran, Haskell, Java, Javascript, Kotlin, Lua, Objective-C, Pony, Pure, Python, Ruby, Rust, Scala, Scheme, and Swift
The result is that all of the following validations and optimizations can be done at deployment time with **linear** **`(n)`** **or** **near-linear** **`(n log n)`** **time complexity**
The result is that all of the following validations and optimizations can be done at deployment time with near-linear `(n log n)` time complexity.
* The absence of most exceptional halting states can be validated.
* The maximum use of resources can sometimes be calculated.
* The maximum use of resources can be sometimes be calculated.
* Bytecode can be compiled to machine code.
* Compilation can optimize use of smaller registers.
* Compilation can optimize injection of gas metering.
Note that near-linear `(n log n)` time complexity is essential. Otherwise, specially crafted contracts can be used as attack vectors against any validations and optimizations.
## Specification
### Dependencies
> **[EIP-1702](https://github.com/ethereum/EIPs/pull/1702). Generalized Account Versioning Scheme.** This proposal needs a versioning scheme to allow for its bytecode (and eventually eWasm bytecode) to be deployed with existing bytecode on the same blockchain.
### Proposal
We propose to deprecate two existing instructions—`JUMP` and `JUMPI`—and propose new instructions to support their legitimate uses. In particular, it must remain possible to compile Solidity and Vyper code to EVM bytecode, with no significant loss of performance or increase in gas price.
Especially important is efficient translation to and from eWasm. To that end we maintain a close correspondence between eWasm instructions and proposed EVM instructions.
Especially important is efficient translation to and from [eWasm](https://github.com/ewasm/design) and to machine code. To that end we maintain a close correspondence between [Wasm](https://webassembly.github.io/spec/core/_download/WebAssembly.pdf), [x86](https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf), [ARM](https://static.docs.arm.com/100076/0100/arm_instruction_set_reference_guide_100076_0100_00_en.pdf) and proposed EVM instructions.
| Wasm | EIP-615 |
| -------- | -------- |
| br | JUMPTO |
| br_if | JUMPIF |
| br_table | JUMPV |
| call | JUMPSUB |
| call_indirect | JUMPSUBV |
| return | RETURN |
| local.get | GETLOCAL |
| local.put | PUTLOCAL |
| tables | DATA |
| EIP-615 | Wasm | x86 | ARM
| -------- | -------- | ----- | -----
| JUMPTO | br | JMP | B
| JUMPIF | br_if | JE | BNZ
| JUMPV | br_table | JMP | TBH
| JUMPSUB | call | CALL | BL
| JUMPSUBV | call_indirect | CALL | BLX
| RETURN | return | RET | RET
| GETLOCAL | local.get | PUSH | POP
| PUTLOCAL | local.put | PUSH | POP
| BEGINDATA | tables | .DATA | .DATA
#### Preliminaries
@ -98,13 +104,13 @@ To support subroutines, `BEGINSUB`, `JUMPSUB`, and `RETURNSUB` are provided. Br
Dynamic jumps are also used for `O(1)` indirection: an address to jump to is selected to push on the stack and be jumped to. So we also propose two more instructions to provide for constrained indirection. We support these with vectors of `JUMPDEST` or `BEGINSUB` offsets stored inline, which can be selected with an index on the stack. That constrains validation to a specified subset of all possible destinations. The danger of quadratic blow up is avoided because it takes as much space to store the jump vectors as it does to code the worst case exploit.
Dynamic jumps to a `JUMPDEST` are used to implement `O(1)` jumptables, which are useful for dense switch statements, and are implemented as instructions similar to this one on most CPUs.
Dynamic jumps to a `JUMPDEST` are used to implement `O(1)` jumptables, which are useful for dense switch statements. Wasm and most CPUs provide similar instructions.
> `JUMPV n, jump_targets`
>
> jumps to one of a vector of `n` `JUMPDEST` offsets via a zero-based index on the stack. The vector is stored inline at the `jump_targets` offset after the BEGINDATA bytecode as MSB-first, two's-complement, two-byte positive integers. If the index is greater than or equal to `n - 1` the last (default) offset is used.
Dynamic jumps to a `BEGINSUB` are used to implement `O(1)` virtual functions and callbacks, which take just two pointer dereferences on most CPUs.
Dynamic jumps to a `BEGINSUB` are used to implement `O(1)` virtual functions and callbacks, which take at most two pointer dereferences on most CPUs. Wasm provides a similar instruction.
> `JUMPSUBV n, jump_targets`
>
@ -132,6 +138,12 @@ There needs to be a way to place unreachable data into the bytecode that will be
>
> specifies that all of the following bytes to the end of the bytecode are data, and not reachable code.
#### Structure
Valid EIP-615 EVM bytecode begins with a valid header. This is the magic number \evm followed by the semantic versioning number '\1\5\0\0'. (For Wasm the header is '\0asm\1').
Following the header is the BEGINSUB opcode for the _main_ routine. It takes no arguments and returns no values. Other subroutines may follow the _main_ routine, and an optional BEGINDATA opcode may mark the start of a data section.
### Semantics
Jumps to and returns from subroutines are described here in terms of
@ -149,11 +161,11 @@ We will adopt the following conventions to describe the machine state:
* The _stack items_ between the frame pointer and the current stack pointer are called the _frame_.
* The current number of items in the frame, `FP - SP`, is the _frame size_.
Defining the frame pointer so as to include the arguments is unconventional, but better fits our stack semantics and simplifies the remainder of the proposal.
> **Note**: Defining the frame pointer so as to include the arguments is unconventional, but better fits our stack semantics and simplifies the remainder of the proposal.
The frame pointer and return stacks are internal to the subroutine mechanism, and not directly accessible to the program. This is necessary to prevent the program from modifying its own state in ways that could be invalid.
The first instruction of an array of EVM bytecode begins execution of a _main_ routine with no arguments, `SP` and `FP` set to 0, and with one value on the return stack—`code_size - 1`. (Executing the virtual byte of 0 after this offset causes an EVM to stop. Thus executing a `RETURNSUB` with no prior `JUMPSUB` or `JUMBSUBV`—that is, in the _main_ routine—executes a `STOP`.)
Execution of EVM bytecode begins with the _main_ routine with no arguments, `SP` and `FP` set to 0, and with one value on the return stack—`code_size - 1`. (Executing the virtual byte of 0 after this offset causes an EVM to stop. Thus executing a `RETURNSUB` with no prior `JUMPSUB` or `JUMBSUBV`—that is, in the _main_ routine—executes a `STOP`.)
Execution of a subroutine begins with `JUMPSUB` or `JUMPSUBV`, which
@ -278,7 +290,7 @@ All of the remaining conditions we validate statically.
#### Costs & Codes
All of the instructions are `O(1)` with a small constant, requiring just a few machine operations each, whereas a `JUMP` or `JUMPI` must do an O(log n) binary search of an array of `JUMPDEST` offsets before every jump. With the cost of `JUMPI` being _high_ and the cost of `JUMP` being _mid_, we suggest the cost of `JUMPV` and `JUMPSUBV` should be _mid_, `JUMPSUB` and `JUMPIF` should be _low_, and`JUMPTO` should be _verylow_. Measurement will tell.
All of the instructions are `O(1)` with a small constant, requiring just a few machine operations each, whereas a `JUMP` or `JUMPI` must do an O(log n) binary search of an array of `JUMPDEST` offsets before every jump. With the cost of `JUMPI` being _high_ and the cost of `JUMP` being _mid_, we suggest the cost of `JUMPV` and `JUMPSUBV` should be _mid_, `JUMPSUB` and `JUMPIF` should be _low_, and`JUMPTO` and the rest should be _verylow_. Measurement will tell.
We suggest the following opcodes:
```
@ -299,9 +311,11 @@ We suggest the following opcodes:
These changes would need to be implemented in phases at decent intervals:
>**1.** If this EIP is accepted, invalid code should be deprecated. Tools should stop generating invalid code, users should stop writing it, and clients should warn about loading it.
>**2.** A later hard fork would require clients to place only valid code on the block chain. Note that despite the fork old EVM code will still need to be supported indefinitely.
>**2.** A later hard fork would require clients to place only valid code on the block chain. Note that despite the fork old EVM code will still need to be supported indefinitely; older contracts will continue to run, and to create new contracts.
If desired, the period of deprecation can be extended indefinitely by continuing to accept code not versioned as new—but without validation. That is, by delaying phase 2. Since we must continue to run old code this is not technically difficult.
If desired, the period of deprecation can be extended indefinitely by continuing to accept code not versioned as new—but without validation. That is, by delaying or canceling phase 2.
Regardless, we will need a versioning scheme like [EIP-1702](https://github.com/ethereum/EIPs/pull/1702) to allow current code and EIP-615 code to coexist on the same blockchain.
## Rationale
@ -311,7 +325,9 @@ As described above, the approach was simply to deprecate the problematic dynamic
## Implementation
Implementation of this proposal need not be difficult. At the least, interpreters can simply be extended with the new opcodes and run unchanged otherwise. The new opcodes require only stacks for the frame pointers and return offsets and the few pushes, pops, and assignments described above. Compiled code can use native call instructions, greatly improving performance. Further optimizations include minimizing runtime checks for exceptions, condensing gas metering, and otherwise taking advantage of validated code wherever possible. A lightly tested reference implementation is available in [Greg Colvin's Aleth fork.](https://github.com/gcolvin/aleth/tree/master/libaleth-interpreter)
Implementation of this proposal need not be difficult. At the least, interpreters can simply be extended with the new opcodes and run unchanged otherwise. The new opcodes require only stacks for the frame pointers and return offsets and the few pushes, pops, and assignments described above. The bulk of the effort is the validator, which in most languages can almost be transcribed from the pseudocode above.
A lightly tested C++ reference implementation is available in [Greg Colvin's Aleth fork.](https://github.com/gcolvin/aleth/tree/master/libaleth-interpreter) This version required circa 110 lines of new interpreter code and a well-commented, 178-line validator.
## Appendix A
### Validation
@ -374,7 +390,7 @@ Note that code like this is already run by EVMs to check dynamic jumps, includin
#### Subroutine Validation
This function can be seen as a symbolic execution of a subroutine in the EVM code, where only the effect of the instructions on the state being validated is computed. Thus the structure of this function is very similar to an EVM interpreter. This function can also be seen as an acyclic traversal of the directed graph formed by taking instructions as vertexes and sequential and branching connections as edges, checking conditions along the way. The traversal is accomplished via recursion, and cycles are broken by returning when a vertex which has already been visited is reached. The time complexity of this traversal is `O(|E|+|V|)`: The sum of the number of edges and number of verticies in the graph.
This function can be seen as a symbolic execution of a subroutine in the EVM code, where only the effect of the instructions on the state being validated is computed. Thus the structure of this function is very similar to an EVM interpreter. This function can also be seen as an acyclic traversal of the directed graph formed by taking instructions as vertexes and sequential and branching connections as edges, checking conditions along the way. The traversal is accomplished via recursion, and cycles are broken by returning when a vertex which has already been visited is reached. The time complexity of this traversal is `O(|E|+|V|): The sum of the number of edges and number of verticies in the graph.
The basic approach is to call `validate_subroutine(i, 0, 0)`, for `i` equal to the first instruction in the EVM code through each `BEGINDATA` offset. `validate_subroutine()` traverses instructions sequentially, recursing when `JUMP` and `JUMPI` instructions are encountered. When a destination is reached that has been visited before it returns, thus breaking cycles. It returns true if the subroutine is valid, false otherwise.
@ -470,7 +486,7 @@ The basic approach is to call `validate_subroutine(i, 0, 0)`, for `i` equal to t
## Appendix B
### EVM Analysis
There is a large and growing ecosystem of researchers, authors, teachers, auditors, and analytic tools--providing software and services focused on the correctness and security of EVM code. A small saample is given here.
There is a large and growing ecosystem of researchers, authors, teachers, auditors, and analytic tools--providing software and services focused on the correctness and security of EVM code. A small sample is given here.
#### Some Tools
@ -498,6 +514,7 @@ There is a large and growing ecosystem of researchers, authors, teachers, audito
* [Securify: Practical Security Analysis of Smart Contracts](https://arxiv.org/pdf/1806.01143.pdf)
* [The Thunder Protocol](https://docs.thundercore.com/thunder-whitepaper.pdf)
* [Towards Verifying Ethereum Smart Contract Bytecode in Isabelle/HOL](https://ts.data61.csiro.au/publications/csiro_full_text//Amani_BBS_18.pdf)
*[A Lem formalization of EVM 1.5](https://github.com/seed/eth-isabelle/tree/evm15)
## Copyright