Automatically merged updates to draft EIP(s) 615

Hi, I'm a bot! This change was automatically merged because:

 - It only modifies existing Draft or Last Call EIP(s)
 - The PR was approved or written by at least one author of each modified EIP
 - The build is passing
This commit is contained in:
Greg Colvin 2019-02-28 12:22:37 -07:00 committed by EIP Automerge Bot
parent 281e28362c
commit c644af1925
1 changed files with 52 additions and 23 deletions

View File

@ -4,7 +4,7 @@ title: Subroutines and Static Jumps for the EVM
status: Draft status: Draft
type: Standards Track type: Standards Track
category: Core category: Core
author: Greg Colvin <greg@colvin.org>, Brooklyn Zelenka <hello@brooklynzelenka.co> , Paweł Bylica (@chfast), Christian Reitwiessner(@chriseth) author: Greg Colvin <greg@colvin.org>, Brooklyn Zelenka <@expede> , Paweł Bylica (@chfast), Christian Reitwiessner(@chriseth)
discussion to: https://ethereum-magicians.org/t/eip-615-subroutines-and-static-jumps-for-the-evm/2728 discussion to: https://ethereum-magicians.org/t/eip-615-subroutines-and-static-jumps-for-the-evm/2728
created: 2016-12-10 created: 2016-12-10
--- ---
@ -21,13 +21,13 @@ We also propose to validate—in linear time—that EVM contracts correctly use
## Motivation ## Motivation
Currently the EVM supports dynamic jumps, where the address to jump to is an argument on the stack. These dynamic jumps obscure the structure of the code and thus mostly inhibit control- and data-flow analysis. This puts the quality and speed of optimized compilation fundamentally at odds. Further, since every jump can potentially be to any jump destination in the code, the number of possible paths through the code goes up as the product of the number of jumps by the number of destinations, as does the time complexity of static analysis. Many of these cases are undecidable at deployment time, further inhibiting static and formal analyses. Currently the EVM supports dynamic jumps, where the address to jump to is an argument on the stack. These dynamic jumps obscure the structure of the code and thus mostly inhibit control- and data-flow analysis. This puts the quality and speed of optimized compilation fundamentally at odds. Further, since many jumps can potentially be to any jump destination in the code, the number of possible paths through the code can go up as the product of the number of jumps by the number of destinations, as does the time complexity of static analysis. Many of these cases are undecidable at deployment time, further inhibiting static and formal analyses.
Absent dynamic jumps code can be statically analyzed in linear time. Static analysis includes validation, and much of optimization, compilation, and formal analysis; every part of the tool chain benefits when linear-time analysis is available. And absent dynamic jumps, and with proper subroutines the EVM is a better target for code generation from other languages, including Absent dynamic jumps code can be statically analyzed in linear time. Static analysis includes validation, and much of optimization, compilation, and formal analysis; every part of the tool chain benefits when linear-time analysis is available. And absent dynamic jumps, and with proper subroutines the EVM is a better target for code generation from other languages, including
* Solidity * Solidity
* Vyper * Vyper
* LLVM IR * LLVM IR
* front ends include C, C++, Common Lisp, D, EVM, Fortran, Haskell, Java, Javascript, Kotlin, Lua, Objective-C, Pony, Pure, Python, Ruby, Rust, Scala, Scheme, and Swift * front ends include C, C++, Common Lisp, D, Fortran, Haskell, Java, Javascript, Kotlin, Lua, Objective-C, Pony, Pure, Python, Ruby, Rust, Scala, Scheme, and Swift
The result is that all of the following validations and optimizations can be done at deployment time with **linear** **`(n)`** **or** **near-linear** **`(n log n)`** **time complexity** The result is that all of the following validations and optimizations can be done at deployment time with **linear** **`(n)`** **or** **near-linear** **`(n log n)`** **time complexity**
* The absence of most exceptional halting states can be validated. * The absence of most exceptional halting states can be validated.
@ -38,7 +38,13 @@ The result is that all of the following validations and optimizations can be don
Note that near-linear `(n log n)` time complexity is essential. Otherwise, specially crafted contracts can be used as attack vectors against any validations and optimizations. Note that near-linear `(n log n)` time complexity is essential. Otherwise, specially crafted contracts can be used as attack vectors against any validations and optimizations.
Especially important is efficient translation to and from eWasm. To that end we maintain a close correspondence between EVM and eWasm operations. ## Specification
### Proposal
We propose to deprecate two existing instructions—`JUMP` and `JUMPI`—and propose new instructions to support their legitimate uses. In particular, it must remain possible to compile Solidity and Vyper code to EVM bytecode, with no significant loss of performance or increase in gas price.
Especially important is efficient translation to and from eWasm. To that end we maintain a close correspondence between eWasm instructions and proposed EVM instructions.
| Wasm | EIP-615 | | Wasm | EIP-615 |
| -------- | -------- | | -------- | -------- |
@ -50,24 +56,18 @@ Especially important is efficient translation to and from eWasm. To that end we
| return | RETURN | | return | RETURN |
| local.get | GETLOCAL | | local.get | GETLOCAL |
| local.put | PUTLOCAL | | local.put | PUTLOCAL |
| tables | DATA | | tables | DATA |
## Specification
### Proposal
We propose to deprecate two existing instructions—`JUMP` and `JUMPI`—and propose new instructions to support their legitimate uses. In particular, it must remain possible to compile Solidity and Vyper code to EVM bytecode, and EVM bytecode to eWasm code, with no significant loss of performance or increase in gas price.
#### Preliminaries #### Preliminaries
These forms These forms
> `INSTRUCTION` > *`INSTRUCTION`*
> >
> `INSTRUCTION x` > *`INSTRUCTION x`*
> >
> `INSTRUCTION x, y` > *`INSTRUCTION x, y`*
name an `INSTRUCTION` with no, one and two arguments, respectively. An instruction is represented in the bytecode as a single-byte opcode. Any arguments are laid out as immediate data bytes following the opcode inline, interpreted as fixed length, MSB-first, two's-complement, two-byte positive integers. (Negative values are reserved for extensions.) name an *`INSTRUCTION`* with no, one and two arguments, respectively. An instruction is represented in the bytecode as a single-byte opcode. Any arguments are laid out as immediate data bytes following the opcode inline, interpreted as fixed length, MSB-first, two's-complement, two-byte positive integers. (Negative values are reserved for extensions.)
#### Branches and Subroutines #### Branches and Subroutines
@ -94,7 +94,7 @@ To support subroutines, `BEGINSUB`, `JUMPSUB`, and `RETURNSUB` are provided. Br
> >
>returns from the current subroutine to the instruction following the JUMPSUB that entered it. >returns from the current subroutine to the instruction following the JUMPSUB that entered it.
#### Switches and Virtual Functions #### Switches, Callbacks, and Virtual Functions
Dynamic jumps are also used for `O(1)` indirection: an address to jump to is selected to push on the stack and be jumped to. So we also propose two more instructions to provide for constrained indirection. We support these with vectors of `JUMPDEST` or `BEGINSUB` offsets stored inline, which can be selected with an index on the stack. That constrains validation to a specified subset of all possible destinations. The danger of quadratic blow up is avoided because it takes as much space to store the jump vectors as it does to code the worst case exploit. Dynamic jumps are also used for `O(1)` indirection: an address to jump to is selected to push on the stack and be jumped to. So we also propose two more instructions to provide for constrained indirection. We support these with vectors of `JUMPDEST` or `BEGINSUB` offsets stored inline, which can be selected with an index on the stack. That constrains validation to a specified subset of all possible destinations. The danger of quadratic blow up is avoided because it takes as much space to store the jump vectors as it does to code the worst case exploit.
@ -126,7 +126,7 @@ Local variable `n` is the nth stack item below the frame pointer, `FP[-n]`, as d
#### Data #### Data
There needs to be a way to place unreachable data into the bytecode that will be skipped over and not validated. Indirect jump tables will not be valid code. Initialization code must create runtime code from data that might not be valid code. And unreachable data might prove useful to programs for other purposes. There needs to be a way to place unreachable data into the bytecode that will be skipped over and not validated. Indirect jump vectors will not be valid code. Initialization code must create runtime code from data that might not be valid code. And unreachable data might prove useful to programs for other purposes.
> `BEGINDATA` > `BEGINDATA`
> >
@ -214,8 +214,7 @@ __________________ 22
| 43 | 43
frame |___________ 42 frame |___________ 42
| locals 13 | locals 13
| 14 ______|___________ 14
______|___________ 15
<- SP <- SP
``` ```
and after `RETURNSUB` would look like this and after `RETURNSUB` would look like this
@ -314,7 +313,7 @@ As described above, the approach was simply to deprecate the problematic dynamic
Implementation of this proposal need not be difficult. At the least, interpreters can simply be extended with the new opcodes and run unchanged otherwise. The new opcodes require only stacks for the frame pointers and return offsets and the few pushes, pops, and assignments described above. Compiled code can use native call instructions, greatly improving performance. Further optimizations include minimizing runtime checks for exceptions, condensing gas metering, and otherwise taking advantage of validated code wherever possible. A lightly tested reference implementation is available in [Greg Colvin's Aleth fork.](https://github.com/gcolvin/aleth/tree/master/libaleth-interpreter) Implementation of this proposal need not be difficult. At the least, interpreters can simply be extended with the new opcodes and run unchanged otherwise. The new opcodes require only stacks for the frame pointers and return offsets and the few pushes, pops, and assignments described above. Compiled code can use native call instructions, greatly improving performance. Further optimizations include minimizing runtime checks for exceptions, condensing gas metering, and otherwise taking advantage of validated code wherever possible. A lightly tested reference implementation is available in [Greg Colvin's Aleth fork.](https://github.com/gcolvin/aleth/tree/master/libaleth-interpreter)
## APPENDIX ## Appendix A
### Validation ### Validation
Validation comprises two tasks: Validation comprises two tasks:
@ -375,9 +374,7 @@ Note that code like this is already run by EVMs to check dynamic jumps, includin
#### Subroutine Validation #### Subroutine Validation
This function can be seen as a symbolic execution of a subroutine in the EVM code, where only the effect of the instructions on the state being validated is computed. Thus the structure of this function is very similar to an EVM interpreter. This function can also be seen as an acyclic traversal of the directed graph formed by taking instructions as vertexes and sequential and branching connections as edges, checking conditions along the way. The traversal is accomplished via recursion, and cycles are broken by returning when a vertex which has already been visited is reached. The time complexity of this traversal is `O(|E|+|V|)`[^subroutine_complexity]. This function can be seen as a symbolic execution of a subroutine in the EVM code, where only the effect of the instructions on the state being validated is computed. Thus the structure of this function is very similar to an EVM interpreter. This function can also be seen as an acyclic traversal of the directed graph formed by taking instructions as vertexes and sequential and branching connections as edges, checking conditions along the way. The traversal is accomplished via recursion, and cycles are broken by returning when a vertex which has already been visited is reached. The time complexity of this traversal is `O(|E|+|V|): The sum of the number of edges and number of verticies in the graph.
[^subroutine_complexity]: The sum of the number of edges and number of verticies in the graph.
The basic approach is to call `validate_subroutine(i, 0, 0)`, for `i` equal to the first instruction in the EVM code through each `BEGINDATA` offset. `validate_subroutine()` traverses instructions sequentially, recursing when `JUMP` and `JUMPI` instructions are encountered. When a destination is reached that has been visited before it returns, thus breaking cycles. It returns true if the subroutine is valid, false otherwise. The basic approach is to call `validate_subroutine(i, 0, 0)`, for `i` equal to the first instruction in the EVM code through each `BEGINDATA` offset. `validate_subroutine()` traverses instructions sequentially, recursing when `JUMP` and `JUMPI` instructions are encountered. When a destination is reached that has been visited before it returns, thus breaking cycles. It returns true if the subroutine is valid, false otherwise.
@ -470,6 +467,38 @@ The basic approach is to call `validate_subroutine(i, 0, 0)`, for `i` equal to t
return true return true
} }
``` ```
## Appendix B
### EVM Analysis
There is a large and growing ecosystem of researchers, authors, teachers, auditors, and analytic tools--providing software and services focused on the correctness and security of EVM code. A small saample is given here.
#### Some Tools
* [Contract Library](https://contract-library.com/)
* [EthereumJ](https://github.com/ethereum/ethereumj)
* [Exthereum](https://github.com/exthereum/blockchain)
* [Harmony](https://github.com/ether-camp/ethereum-harmony)
* [JEB](https://www.pnfsoftware.com/blog/ethereum-smart-contract-decompiler/)
* [Mythril](https://github.com/ConsenSys/mythril)
* [Securify](https://github.com/eth-sri/securify)
* [Skale](https://www.skalelabs.com/)
* [Status](https://status.im/)
#### Some Papers
* [A Formal Verification Tool for Ethereum VM Bytecode](https://www.google.com/url?q=http://fsl.cs.illinois.edu/FSL/papers/2018/park-zhang-saxena-daian-rosu-2018-fse/park-zhang-saxena-daian-rosu-2018-fse-public.pdf)
* [A Lem formalization of EVM and some Isabelle/HOL proofs](https://github.com/pirapira/eth-isabelle)
* [A survey of attacks on Ethereum smart contracts](https://eprint.iacr.org/2016/1007.pdf)
* [Defining the Ethereum Virtual Machine for Interactive Theorem Provers](https://www.google.com/url?q=http://fc17.ifca.ai/wtsc/Defining%2520the%2520Ethereum%2520Virtual%2520Machine%2520for%2520Interactive%2520Theorem%2520Provers.pdf)
* [Ethereum 2.0 Specifications](https://github.com/ethereum/eth2.0-specs)
* [Formal Verification of Smart Contracts](https://www.cs.umd.edu/~aseem/solidetherplas.pdf)
* [JelloPaper: Human Readable Semantics of EVM in K](https://jellopaper.org/)
* [KEVM: A Complete Semantics of the Ethereum Virtual Machine.](https://www.ideals.illinois.edu/bitstream/handle/2142/97207/hildenbrandt-saxena-zhu-rodrigues-guth-daian-rosu-2017-tr.pdf)
* [Making Smart Contracts Smarter](https://eprint.iacr.org/2016/633.pdf)
* [Securify: Practical Security Analysis of Smart Contracts](https://arxiv.org/pdf/1806.01143.pdf)
* [The Thunder Protocol](https://docs.thundercore.com/thunder-whitepaper.pdf)
* [Towards Verifying Ethereum Smart Contract Bytecode in Isabelle/HOL](https://ts.data61.csiro.au/publications/csiro_full_text//Amani_BBS_18.pdf)
## Copyright ## Copyright