Update eip-2315.md (#4219)

This commit is contained in:
Greg Colvin 2021-09-25 02:30:24 -04:00 committed by GitHub
parent 737997c231
commit 8639f48cdc
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

View File

@ -12,9 +12,11 @@ requires: 3540, 3670
## Abstract
This proposal introduces four opcodes to support simple subroutines and relative jumps: `JUMPSUB`, `RETURNSUB`, `RJUMP` and `RJUMPI`.
This proposal introduces five opcodes to support simple subroutines and relative jumps: `JUMPSUB`, `RETURNSUB`, `RJUMP`, `RJUMPI`, and `RJUMPV`.
These provide a complete static control-flow facility that supports substantial reductions in the complexity and the gas costs of calling and optimizing simple subroutines from %33 to as much as 52% savings in gas.
These provide a safe, complete, static control-flow facility that supports substantial reductions in the complexity and the gas costs of calling and optimizing simple subroutines from %33 to as much as 52% savings in gas.
Valid contracts will not halt with an exception unless they run out of gas or overflow stack while making a recursive subroutine call.
## Motivation
@ -206,67 +208,83 @@ To provide a complete set of control structures, and to take full advantage of t
#### `RJUMP (0x5c) offset`
> Transfers control to the address `PC + offset`, where offset is a three-byte, MSB first, twos-complement integer.
> Transfers control to the address `PC + offset`, where `offset` is a two-byte, MSB first, twos-complement integer.
>
> 1. Decode the `offset` from the immediate data. The data is encoded as three bytes, MSB first, twos-complement.
> 1. Decode the `offset` from the immediate data. The data is encoded as a two-byte, MSB first, twos-complement integer.
> 2. Set `PC` to `location`.
>
> The cost is _low_.
#### `RJUMPI (0x5d) offset`
> Conditionally transfers control to the address `PC + offset`, where offset is a three-byte, MSB first, twos-complement integer.
> 1. Decode the `offset` from the immediate data. The data is encoded as three bytes, MSB first, twos-complement.
> Conditionally transfers control to the address `PC + offset`, where offset is a two byte, MSB first, twos-complement integer.
> 1. Decode the `offset` from the immediate data. The data is encoded as a two-byte, MSB first, twos-complement integer.
> 2. Pop the `condition` from the stack.
> 3. If the condition is true then continue
> 4. Set `PC` to `PC + offset`.
>
> The cost is _mid_.
#### `RJUMPV (0x5e) n offset ...`
> Transfers control to the address at `SP[0] + PC + 2 + offset`; or else to the default address, where `n` and `offset` are two-byte, MSB first, twos-complement integers.
> 2. Pop `n` from the stack.
> 1. Decode the `count` from the immediate data. The data is encoded as two-byte, MSB first, twos-complement.
> 4. `if (n < count) PC = PC[2 + 2*n] else PC = PC[2 + 2*count]`.
>
> The cost is `mid + n*low`.
_Notes:_
* _If a resulting `PC` to be executed is beyond the last instruction then the opcode is implicitly a `STOP`, which is not an error._
* _Values popped off the `return stack` do not need to be validated, since they are alterable only by `JUMPSUB` and `RETURNSUB`._
* _The description above lays out the semantics of this feature in terms of a `return stack`. But the actual state of the `return stack` is not observable by EVM code or consensus-critical to the protocol. (For example, a node implementer may code `JUMPSUB` to unobservably push `PC` on the `return stack` rather than `PC + 1`, which is allowed so long as `RETURNSUB` observably returns control to the `PC + 1` location.)_
* _The `return stack` is the functional equivalent of Turing's "delay line"._
`JUMP` and `JUMPI` are assigned _mid_ and _high_ gas fees -- they require operations on 256-bit stack items and checking for valid destinations. None of these operations require checking, and only `RJUMPI` requires 256-bit arithmetic. So the _low_ cost of `JUMPSUB` versus is justified by needing only about six Go operations to push the return address on the return stack and decode the immediate two byte destination to the `PC` and the _verylow_ cost of `RETURNSUB` is justified by needing only about three Go operations to pop the return stack into the `PC`. Similarly, the _low_ cost of `RJUMP` is justified by needing even less work than `JUMPSUB`, and the cost `RJUMPI` is `mid` because of the extra work to test the conditional. Benchmarking will be needed to tell if the costs are well-balanced.
`JUMP` and `JUMPI` are assigned _mid_ and _high_ gas fees, and they require operations on 256-bit stack items and checking for valid destinations Whereas none of these operations require checking, and only `RJUMPI` requires 256-bit arithmetic. The _low_ cost of `JUMPSUB` versus is justified by needing only to push the return address on the return stack and decode the immediate two byte destination to the `PC`, and the _verylow_ cost of `RETURNSUB` is justified by needing only to pop the return stack into the `PC`. The _low_ cost of `RJUMP` is justified by needing even less work than `JUMPSUB`, and the cost of `RJUMPI` is `mid` because of the extra work to test the conditional. `RJUMPV` is at least as costly as `RJUMPI`, with extra work for each `offset`. Benchmarking will be needed to tell if the costs are well-balanced.
### Validity
Attempts to create contracts that can be proven to be invalid will fail.
We define safety here as avoiding exceptional halting states:
* Valid contracts will not halt with an exception unless they
* run out of gas or
* overflow stack while making a recursive subroutine call.
Invalid contracts will have one or more
* invalid instructions,
* invalid jump destinations,
* stack underflows, or
* stack overflows without recursion.
Attempts to create contracts that cannot be proven to be valid will fail.
Because of dynamic JUMP and JUMPI instructions the rules below give necessary but not sufficient conditions for validity. So long as we have unrestricted dynamic jumps we cannot prove contracts to be valid, only invalid. (See [EIP-3779](./eip-3779.md) for further discussion of validation and a means of restricting dynamic jumps.)
#### Exceptional Halting States
_Execution_ is as defined in the [Yellow Paper](https://ethereum.github.io/yellowpaper/paper.pdf) a sequence of changes to the EVM state. The conditions on valid _code_ are preserved by state changes. At runtime, if execution of an instruction would violate a condition the execution is in an exceptional halting state. The Yellow Paper defines five such states.
1. Insufficient gas
2. More than 1024 stack items
3. Insufficient stack items
4. Invalid jump destination
5. Invalid instruction
We would like to consider EVM _code_ valid iff no execution of the program can lead to an exceptional halting state, but we must be able to validate _code_ in linear time to avoid denial of service attacks. So in practice, we can only partially meet these requirements. Our validation rules do not consider the _code's_ data and computations, only its control flow and stack use. This means we will reject programs with any invalid _code_ paths, even if those paths are not reachable at runtime.
### Validation Rules
> This section extends the contact creation validation rules (as defined in EIP-3540 and EIP-3670.)
1. Every `RJUMP` and `RJUMPI` addresses a valid `JUMPDEST`.
2. The stack depth is
1. Every `RJUMP`, `RJUMPI`, and `RJUMPV` addresses only valid `JUMPDEST`s.
2. The `stack depth` is
* always positive and
* the same on every path through an opcode.
3. The number of items on the `stack` is at most 1024.
3. The number of items on the `data stack` and on the `return stack` is at most 1024.
The Yellow Paper has the `stack pointer` (`SP`) pointing just past the top item on the `data stack`. We define the `stack base` (`BP`)as the element that the `SP` addressed at the entry to the current _basic block_, or `0` on program entry, and the `stack depth` as the number of stack elements between the current `SP` and the current `BP`.
Taken together, these rules allow for code to be (in-)validated by traversing the control-flow graph, following each edge only once.
Taken together, these rules allow for code to be validated by traversing the control-flow graph, following each edge only once.
_Note that this specification is entirely semantic. It constrains only data usage and control flow and imposes no syntax on code beyond being a sequence of bytes to be executed._
## Rationale
We modeled this design on Moore's 1970 [Forth virtual machine](http://www.ultratechnology.com/4th_1970.pdf). It is a simple two-stack design the data stack is supplemented with a return stack to support jumping to and returning from subroutines, as specified above, and as conceptualized by Turing. The return address (Turing's "note") is pushed onto the return stack (Turing's "delay line") when calling, and the return address is popped into the `PC` when returning.
This is a simple two-stack design the data stack is supplemented with a return stack to support jumping to and returning from subroutines, as specified above, and as conceptualized by Turing. The return address (Turing's "note") is pushed onto the return stack (Turing's "delay line") when calling, and the '`PC`' is popped off of the `PC` when returning.
The alternative design is to push the return address and the destination address on the data stack before jumping to the subroutine, and to later jump back to the return address on the stack in order to return. This is the current approach. It could be streamlined to some extent by having JUMPSUB push the return address for RETURNSUB to pop.
We prefer the separate return stack because it maintains a clear separation between data and flow of control. This ensures that the return address cannot be overwritten or mislaid. It also reduces costs by using fewer data stack slots and moving less data.
## Backwards Compatibility
These changes do not affect the semantics of existing EVM code. These changes are compatible with the restricted forms of `JUMP` and `JUMPI` specified by [EIP-3779](./eip-3779.md) -- contracts following all of the rules given there and here will be valid.
@ -350,9 +368,9 @@ Bytecode: `0x6005565b5d5b60035e` (`PUSH1 0x05, JUMP, JUMPDEST, RETURNSUB, JUMPDE
Consumed gas: `30`
## Reference Implementation
## Validation Algorithm
### Validation Algorithm
> This section specifies an algorithm for checking the above the rules. Equivalent code must be run at creation time. We assume that the validation defined in EIP-3540 and EIP-3670 has already run, although in practice the algorithms can be merged.
The following is a pseudo-Go implementation of an algorithm for enforcing adherence to the above rules. This algorithm is a symbolic execution of the program that recursively traverses the bytecode, following its control flow and stack use and checking for violations of the rules above. It uses a stack to track the slots that hold `PUSHed` constants, from which it pops the destinations to validate during the analysis.
@ -414,17 +432,19 @@ func validate(pc := 0, depth := 0) boolean {
pc = jumpdest
continue
}
if (instruction == RJUMPI {
if (instruction == RJUMPV {
// check for valid destination
jumpdest = pc + imm_data(pc)
if !valid_jumpdest(jumpdest) {
return false
}
// recurse to jump to code to validate
if !validate(jumpdest) {
return false
n = imm_data(pc += 2)
for i := 0; i < n; n-- {
jumpdest = pc + n + imm_data(pc)
if !valid_jumpdest(jumpdest) {
return false
}
// recurse to jump to code to validate
if !validate(jumpdest) {
return false
}
}
// false side of conditional
@ -453,5 +473,7 @@ func validate(pc := 0, depth := 0) boolean {
These changes do introduce new flow control instructions, so any software which does static/dynamic analysis of EVM code needs to be modified accordingly. The `JUMPSUB` semantics are similar to `JUMP` whereas the `RETURNSUB` instruction is different, since it can 'land' on any opcode (but the possible destinations can be statically inferred).
The validation algorithm must run in time and space near-linear in the size of its input so that a it can be charged appropriate gas to avoid DoS attack. `RJUMPV` takes its arguments inline so that attempts to attack the validation algorithm will fail by reducing the space available to attack it in.
## Copyright
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).