mirror of https://github.com/status-im/EIPs.git
Automatically merged updates to draft EIP(s) 615
Hi, I'm a bot! This change was automatically merged because: - It only modifies existing Draft or Last Call EIP(s) - The PR was approved or written by at least one author of each modified EIP - The build is passing
This commit is contained in:
parent
281e28362c
commit
c644af1925
|
@ -4,7 +4,7 @@ title: Subroutines and Static Jumps for the EVM
|
|||
status: Draft
|
||||
type: Standards Track
|
||||
category: Core
|
||||
author: Greg Colvin <greg@colvin.org>, Brooklyn Zelenka <hello@brooklynzelenka.co> , Paweł Bylica (@chfast), Christian Reitwiessner(@chriseth)
|
||||
author: Greg Colvin <greg@colvin.org>, Brooklyn Zelenka <@expede> , Paweł Bylica (@chfast), Christian Reitwiessner(@chriseth)
|
||||
discussion to: https://ethereum-magicians.org/t/eip-615-subroutines-and-static-jumps-for-the-evm/2728
|
||||
created: 2016-12-10
|
||||
---
|
||||
|
@ -21,13 +21,13 @@ We also propose to validate—in linear time—that EVM contracts correctly use
|
|||
|
||||
## Motivation
|
||||
|
||||
Currently the EVM supports dynamic jumps, where the address to jump to is an argument on the stack. These dynamic jumps obscure the structure of the code and thus mostly inhibit control- and data-flow analysis. This puts the quality and speed of optimized compilation fundamentally at odds. Further, since every jump can potentially be to any jump destination in the code, the number of possible paths through the code goes up as the product of the number of jumps by the number of destinations, as does the time complexity of static analysis. Many of these cases are undecidable at deployment time, further inhibiting static and formal analyses.
|
||||
Currently the EVM supports dynamic jumps, where the address to jump to is an argument on the stack. These dynamic jumps obscure the structure of the code and thus mostly inhibit control- and data-flow analysis. This puts the quality and speed of optimized compilation fundamentally at odds. Further, since many jumps can potentially be to any jump destination in the code, the number of possible paths through the code can go up as the product of the number of jumps by the number of destinations, as does the time complexity of static analysis. Many of these cases are undecidable at deployment time, further inhibiting static and formal analyses.
|
||||
|
||||
Absent dynamic jumps code can be statically analyzed in linear time. Static analysis includes validation, and much of optimization, compilation, and formal analysis; every part of the tool chain benefits when linear-time analysis is available. And absent dynamic jumps, and with proper subroutines the EVM is a better target for code generation from other languages, including
|
||||
* Solidity
|
||||
* Vyper
|
||||
* LLVM IR
|
||||
* front ends include C, C++, Common Lisp, D, EVM, Fortran, Haskell, Java, Javascript, Kotlin, Lua, Objective-C, Pony, Pure, Python, Ruby, Rust, Scala, Scheme, and Swift
|
||||
* front ends include C, C++, Common Lisp, D, Fortran, Haskell, Java, Javascript, Kotlin, Lua, Objective-C, Pony, Pure, Python, Ruby, Rust, Scala, Scheme, and Swift
|
||||
|
||||
The result is that all of the following validations and optimizations can be done at deployment time with **linear** **`(n)`** **or** **near-linear** **`(n log n)`** **time complexity**
|
||||
* The absence of most exceptional halting states can be validated.
|
||||
|
@ -38,7 +38,13 @@ The result is that all of the following validations and optimizations can be don
|
|||
|
||||
Note that near-linear `(n log n)` time complexity is essential. Otherwise, specially crafted contracts can be used as attack vectors against any validations and optimizations.
|
||||
|
||||
Especially important is efficient translation to and from eWasm. To that end we maintain a close correspondence between EVM and eWasm operations.
|
||||
## Specification
|
||||
|
||||
### Proposal
|
||||
|
||||
We propose to deprecate two existing instructions—`JUMP` and `JUMPI`—and propose new instructions to support their legitimate uses. In particular, it must remain possible to compile Solidity and Vyper code to EVM bytecode, with no significant loss of performance or increase in gas price.
|
||||
|
||||
Especially important is efficient translation to and from eWasm. To that end we maintain a close correspondence between eWasm instructions and proposed EVM instructions.
|
||||
|
||||
| Wasm | EIP-615 |
|
||||
| -------- | -------- |
|
||||
|
@ -50,24 +56,18 @@ Especially important is efficient translation to and from eWasm. To that end we
|
|||
| return | RETURN |
|
||||
| local.get | GETLOCAL |
|
||||
| local.put | PUTLOCAL |
|
||||
| tables | DATA |
|
||||
|
||||
## Specification
|
||||
|
||||
### Proposal
|
||||
|
||||
We propose to deprecate two existing instructions—`JUMP` and `JUMPI`—and propose new instructions to support their legitimate uses. In particular, it must remain possible to compile Solidity and Vyper code to EVM bytecode, and EVM bytecode to eWasm code, with no significant loss of performance or increase in gas price.
|
||||
| tables | DATA |
|
||||
|
||||
#### Preliminaries
|
||||
|
||||
These forms
|
||||
> `INSTRUCTION`
|
||||
> *`INSTRUCTION`*
|
||||
>
|
||||
> `INSTRUCTION x`
|
||||
> *`INSTRUCTION x`*
|
||||
>
|
||||
> `INSTRUCTION x, y`
|
||||
> *`INSTRUCTION x, y`*
|
||||
|
||||
name an `INSTRUCTION` with no, one and two arguments, respectively. An instruction is represented in the bytecode as a single-byte opcode. Any arguments are laid out as immediate data bytes following the opcode inline, interpreted as fixed length, MSB-first, two's-complement, two-byte positive integers. (Negative values are reserved for extensions.)
|
||||
name an *`INSTRUCTION`* with no, one and two arguments, respectively. An instruction is represented in the bytecode as a single-byte opcode. Any arguments are laid out as immediate data bytes following the opcode inline, interpreted as fixed length, MSB-first, two's-complement, two-byte positive integers. (Negative values are reserved for extensions.)
|
||||
|
||||
#### Branches and Subroutines
|
||||
|
||||
|
@ -94,7 +94,7 @@ To support subroutines, `BEGINSUB`, `JUMPSUB`, and `RETURNSUB` are provided. Br
|
|||
>
|
||||
>returns from the current subroutine to the instruction following the JUMPSUB that entered it.
|
||||
|
||||
#### Switches and Virtual Functions
|
||||
#### Switches, Callbacks, and Virtual Functions
|
||||
|
||||
Dynamic jumps are also used for `O(1)` indirection: an address to jump to is selected to push on the stack and be jumped to. So we also propose two more instructions to provide for constrained indirection. We support these with vectors of `JUMPDEST` or `BEGINSUB` offsets stored inline, which can be selected with an index on the stack. That constrains validation to a specified subset of all possible destinations. The danger of quadratic blow up is avoided because it takes as much space to store the jump vectors as it does to code the worst case exploit.
|
||||
|
||||
|
@ -126,7 +126,7 @@ Local variable `n` is the nth stack item below the frame pointer, `FP[-n]`, as d
|
|||
|
||||
#### Data
|
||||
|
||||
There needs to be a way to place unreachable data into the bytecode that will be skipped over and not validated. Indirect jump tables will not be valid code. Initialization code must create runtime code from data that might not be valid code. And unreachable data might prove useful to programs for other purposes.
|
||||
There needs to be a way to place unreachable data into the bytecode that will be skipped over and not validated. Indirect jump vectors will not be valid code. Initialization code must create runtime code from data that might not be valid code. And unreachable data might prove useful to programs for other purposes.
|
||||
|
||||
> `BEGINDATA`
|
||||
>
|
||||
|
@ -214,8 +214,7 @@ __________________ 22
|
|||
| 43
|
||||
frame |___________ 42
|
||||
| locals 13
|
||||
| 14
|
||||
______|___________ 15
|
||||
______|___________ 14
|
||||
<- SP
|
||||
```
|
||||
and after `RETURNSUB` would look like this
|
||||
|
@ -314,7 +313,7 @@ As described above, the approach was simply to deprecate the problematic dynamic
|
|||
|
||||
Implementation of this proposal need not be difficult. At the least, interpreters can simply be extended with the new opcodes and run unchanged otherwise. The new opcodes require only stacks for the frame pointers and return offsets and the few pushes, pops, and assignments described above. Compiled code can use native call instructions, greatly improving performance. Further optimizations include minimizing runtime checks for exceptions, condensing gas metering, and otherwise taking advantage of validated code wherever possible. A lightly tested reference implementation is available in [Greg Colvin's Aleth fork.](https://github.com/gcolvin/aleth/tree/master/libaleth-interpreter)
|
||||
|
||||
## APPENDIX
|
||||
## Appendix A
|
||||
### Validation
|
||||
|
||||
Validation comprises two tasks:
|
||||
|
@ -375,9 +374,7 @@ Note that code like this is already run by EVMs to check dynamic jumps, includin
|
|||
|
||||
#### Subroutine Validation
|
||||
|
||||
This function can be seen as a symbolic execution of a subroutine in the EVM code, where only the effect of the instructions on the state being validated is computed. Thus the structure of this function is very similar to an EVM interpreter. This function can also be seen as an acyclic traversal of the directed graph formed by taking instructions as vertexes and sequential and branching connections as edges, checking conditions along the way. The traversal is accomplished via recursion, and cycles are broken by returning when a vertex which has already been visited is reached. The time complexity of this traversal is `O(|E|+|V|)`[^subroutine_complexity].
|
||||
|
||||
[^subroutine_complexity]: The sum of the number of edges and number of verticies in the graph.
|
||||
This function can be seen as a symbolic execution of a subroutine in the EVM code, where only the effect of the instructions on the state being validated is computed. Thus the structure of this function is very similar to an EVM interpreter. This function can also be seen as an acyclic traversal of the directed graph formed by taking instructions as vertexes and sequential and branching connections as edges, checking conditions along the way. The traversal is accomplished via recursion, and cycles are broken by returning when a vertex which has already been visited is reached. The time complexity of this traversal is `O(|E|+|V|): The sum of the number of edges and number of verticies in the graph.
|
||||
|
||||
The basic approach is to call `validate_subroutine(i, 0, 0)`, for `i` equal to the first instruction in the EVM code through each `BEGINDATA` offset. `validate_subroutine()` traverses instructions sequentially, recursing when `JUMP` and `JUMPI` instructions are encountered. When a destination is reached that has been visited before it returns, thus breaking cycles. It returns true if the subroutine is valid, false otherwise.
|
||||
|
||||
|
@ -470,6 +467,38 @@ The basic approach is to call `validate_subroutine(i, 0, 0)`, for `i` equal to t
|
|||
return true
|
||||
}
|
||||
```
|
||||
## Appendix B
|
||||
### EVM Analysis
|
||||
|
||||
There is a large and growing ecosystem of researchers, authors, teachers, auditors, and analytic tools--providing software and services focused on the correctness and security of EVM code. A small saample is given here.
|
||||
|
||||
#### Some Tools
|
||||
|
||||
* [Contract Library](https://contract-library.com/)
|
||||
* [EthereumJ](https://github.com/ethereum/ethereumj)
|
||||
* [Exthereum](https://github.com/exthereum/blockchain)
|
||||
* [Harmony](https://github.com/ether-camp/ethereum-harmony)
|
||||
* [JEB](https://www.pnfsoftware.com/blog/ethereum-smart-contract-decompiler/)
|
||||
* [Mythril](https://github.com/ConsenSys/mythril)
|
||||
* [Securify](https://github.com/eth-sri/securify)
|
||||
* [Skale](https://www.skalelabs.com/)
|
||||
* [Status](https://status.im/)
|
||||
|
||||
#### Some Papers
|
||||
|
||||
* [A Formal Verification Tool for Ethereum VM Bytecode](https://www.google.com/url?q=http://fsl.cs.illinois.edu/FSL/papers/2018/park-zhang-saxena-daian-rosu-2018-fse/park-zhang-saxena-daian-rosu-2018-fse-public.pdf)
|
||||
* [A Lem formalization of EVM and some Isabelle/HOL proofs](https://github.com/pirapira/eth-isabelle)
|
||||
* [A survey of attacks on Ethereum smart contracts](https://eprint.iacr.org/2016/1007.pdf)
|
||||
* [Defining the Ethereum Virtual Machine for Interactive Theorem Provers](https://www.google.com/url?q=http://fc17.ifca.ai/wtsc/Defining%2520the%2520Ethereum%2520Virtual%2520Machine%2520for%2520Interactive%2520Theorem%2520Provers.pdf)
|
||||
* [Ethereum 2.0 Specifications](https://github.com/ethereum/eth2.0-specs)
|
||||
* [Formal Verification of Smart Contracts](https://www.cs.umd.edu/~aseem/solidetherplas.pdf)
|
||||
* [JelloPaper: Human Readable Semantics of EVM in K](https://jellopaper.org/)
|
||||
* [KEVM: A Complete Semantics of the Ethereum Virtual Machine.](https://www.ideals.illinois.edu/bitstream/handle/2142/97207/hildenbrandt-saxena-zhu-rodrigues-guth-daian-rosu-2017-tr.pdf)
|
||||
* [Making Smart Contracts Smarter](https://eprint.iacr.org/2016/633.pdf)
|
||||
* [Securify: Practical Security Analysis of Smart Contracts](https://arxiv.org/pdf/1806.01143.pdf)
|
||||
* [The Thunder Protocol](https://docs.thundercore.com/thunder-whitepaper.pdf)
|
||||
* [Towards Verifying Ethereum Smart Contract Bytecode in Isabelle/HOL](https://ts.data61.csiro.au/publications/csiro_full_text//Amani_BBS_18.pdf)
|
||||
|
||||
|
||||
## Copyright
|
||||
|
||||
|
|
Loading…
Reference in New Issue