EIPs/EIPS/eip-2315.md

511 lines
23 KiB
Markdown
Raw Normal View History

---
eip: 2315
title: Simple Subroutines for the EVM
status: Draft
type: Standards Track
category: Core
author: Greg Colvin <greg@colvin.org>, Martin Holst Swende (@holiman)
discussions-to: https://ethereum-magicians.org/t/eip-2315-simple-subroutines-for-the-evm/3941
created: 2019-10-17
---
## Simple Summary
A small change to the EVM that provides native subroutines.
## Abstract
This proposal introduces three opcodes to support subroutines: `BEGINSUB`, `JUMPSUB` and `RETURNSUB`. It deprecates `JUMP` and `JUMPI`.
Substantial gains in efficiency and reduced gas costs are achieved.
Safety properties equivalent to [EIP-615](https://eips.ethereum.org/EIPS/eip-615) — including valid jump destinations — are ensured by following a few simple rules, which are validated at contract creation time — not runtime — by the provided algorithm, and without imposing syntactic constraints.
## Motivation
2021-08-22 16:07:03 -04:00
The EVM does not provide subroutines as a primitive. Instead, calls can be synthesized by fetching and pushing the current program counter on the data stack and jumping to the subroutine address; returns can be synthesized by getting the return address to the top of the stack and jumping back to it. These conventions are more costly than necessary, and impede static analysis of EVM code, especially rapid validation of some important safety properties.
Facilities to directly support subroutines are provided by all but one of the real and virtual machines programmed by the lead author, including the Burroughs 5000, CDC 7600, IBM 360, DEC PDP 11 and VAX, Motorola 68000, a few generations of Intel silicon, Sun SPARC, UCSD p-Machine, Sun JVM, Wasm, and the sole exception — the EVM. In whatever form, these operations provide for
* capturing the current context of execution,
* transferring control to a new context, and
* returning to the original context
* * after possible further transfers of control
* * to some arbitrary depth.
The concept goes back to Turing (1946):
> We also wish to be able to arrange for the splitting up of operations into subsidiary operations. This should be done in such a way that once we have written down how an operation is done we can use it as a subsidiary to any other operation.
> ...
> When we wish to start on a subsidiary operation we need only make a note of where we left off the major operation and then apply the first instruction of the subsidiary. When the subsidiary is over we look up the note and continue with the major operation. Each subsidiary operation can end with instructions for this recovery of the note. How is the burying and disinterring of the note to be done? There are of course many ways. One is to keep a list of these notes in one or more standard size delay lines, (1024) with the most recent last. The position of the most recent of these will be kept in a fixed TS, and this reference will be modified every time a subsidiary is started or finished...
We propose to use Turing's simple mechanism, long known to work well for virtual stack machines, as specified below. Note that this specification is entirely semantic. It constrains only stack usage and control flow and imposes no syntax on code beyond being a sequence of bytes to be executed.
## Specification
We introduce one more stack into the EVM in addition to the existing `data stack`, known as the `return stack`. The `return stack` is limited to `1024` items. This stack supports three new instructions for subroutines.
### `BEGINSUB`
> Marks an entry point to a subroutine. Execution of a `BEGINSUB` is a no-op.
2020-05-15 05:17:17 -04:00
#### `JUMPSUB location`
> Transfers control to a subroutine.
>
> 1. Decode the `location` from the immediate data. The data is encoded as two bytes, MSB-first.
> 2. If the opcode at `location` is not a `BEGINSUB` _`abort`_.
> 3. If the `return stack` already has `1024` items _`abort`_.
> 4. Push the current `PC + 1` to the `return stack`.
> 5. Set `PC` to `location`.
>
> * _pops one item off the `data stack`_
> * _pushes one item on the `return stack`_
#### `RETURNSUB`
> Returns control to the caller of a subroutine.
>
> 1. If the `return stack` is empty _`abort`_.
> 2. Pop `PC` off the `return stack`.
>
> * _pops one item off the `return stack`_
_Notes:_
* _If a resulting `PC` to be executed is beyond the last instruction then the opcode is implicitly a `STOP`, which is not an error._
* _Values popped off the `return stack` do not need to be validated, since they are alterable only by `JUMPSUB` and `RETURNSUB`._
* _The description above lays out the semantics of this feature in terms of a `return stack`. But the actual state of the `return stack` is not observable by EVM code or consensus-critical to the protocol. (For example, a node implementor may code `JUMPSUB` to unobservably push `PC` on the `return stack` rather than `PC + 1`, which is allowed so long as `RETURNSUB` observably returns control to the `PC + 1` location.)_
* _The `return stack is the functional equivalent of Turing's "delay line" of "where we left off"._
2020-05-15 05:17:17 -04:00
### Dependencies
We need [EVM Object Format (EOF)](https://eips.ethereum.org/EIPS/eip-3540) to allow for immediate arguments without special encoding.
We need [EOF Static Relative Jumps](https://github.com/ethereum/evmone/pull/351) or the equivalent to define and validate the rules for contract safety. These jumps — `RJUMP offset` and `RJUMPI offset` — transfer control to a location in the bytecode at a fixed offset relative to the current `PC`.
## Rationale
We modeled this design on the proven, archetypal Forth virtual machine of 1970. It is a two-stack design — the data stack is supplemented with a return stack to support jumping into and returning from subroutines, as specified above. The return address (Turing's "note") is pushed onto the return stack when calling, and the stack is popped into the `PC` when returning.
The alternative design is to push the return address and the destination address on the data stack before jumping, and to pop the data stack and jump back to the popped `PC` to return. We prefer the separate return stack because it ensures that the return address cannot be overwritten or mislaid, uses fewer data stack slots, and obviates any need to swap the return address past the arguments or return values on the stack. Crucially, a dynamic jump is not needed to implement subroutine returns, allowing for deprecation of `JUMP` and `JUMPI`.
(`JUMPSUB` and `RETURNSUB` are also defined in terms of a `return stack` in [EIP-615](https://eips.ethereum.org/EIPS/eip-615)).
## Backwards and Forwards Compatibility
These changes affect the semantics of existing EVM code. The EVM Object Format is required to allow for immediate data.
These changes are compatible with using [EIP-3337](https://eips.ethereum.org/EIPS/eip-3337) to provide stack frames, by associating a frame with each subroutine, as sketched out in the Appendix below.
## Implementations
Three clients have implemented this (or an earlier version of this) proposal:
- [geth](https://github.com/ethereum/go-ethereum/pull/20619),
- [besu](https://github.com/hyperledger/besu/pull/717), and
- [openethereum](https://github.com/openethereum/openethereum/pull/11629).
### Costs and Codes
We suggest that the cost of
- `BEGINSUB` be _jumpdest_ (`1`)
- `JUMPSUB` be _low_ (`5`)
- `RETURNSUB` be _verylow_ (`3`)
The _low_ cost of `JUMPSUB` is justified by needing only about six Go operations, pushing the return address on the return stack, and decoding the two byte destination to the `PC`. The _verylow_ cost of `RETURNSUB` is justified by needing only about three Go operations to pop the return stack into the `PC`. No 256-bit arithmetic or checking for valid destinations is needed.
Benchmarking might be needed to tell if the costs are well-balanced.
We suggest the following opcodes:
```
0x5c BEGINSUB
0x5d RETURNSUB
0x5e JUMPSUB
```
### Efficiency
These opcodes increase the efficiency and reduce the gas costs of both ordinary subroutine calls and low-level optimizations.
#### Subroutine Call
Consider this example of calling a minimal subroutine
using `JUMPDEF`
```
TEST_DIV:
beginsub ; 1 gas
0x02 ; 3 gas
0x03 ; 3 gas
jumpsub DIVIDE ; 5 gas
returnsub ; 3 gas
DIVIDE:
beginsub ; 1 gas
2021-08-22 16:07:03 -04:00
div ; 5 gas
returnsub ; 3 gas
Total 24 gas.
```
The same code, using `JUMP`.
```
TEST_DIV:
jumpdest ; 1 gas
2021-08-22 16:07:03 -04:00
RTN_DIV ; 3 gas
0x02 ; 3 gas
0x03 ; 3 gas
DIVIDE ; 3 gas
jump ; 8 gas
RTN_DIV:
jumpdest ; 1 gas
swap1 ; 3 gas
jump ; 8 gas
DIVIDE:
jumpdest ; 1 gas
div ; 5 gas
swap1 ; 3 gas
jump ; 8 gas
Total: 50 gas
```
Using `JUMP` uses **_50 - 24 = 26_** more gas than using `JUMPSUB`.
#### Tail Call Optimization
Of course in cases like this one we can optimize the tail call, so that the final jump in `DIVIDE` actually returns from TEST_DIV.
```
TEST_DIV:
beginsub ; 1 gas
0x02 ; 3 gas
0x03 ; 3 gas
rjump DIVIDE ; 3 gas
DIVIDE:
beginsub ; 1 gas
div ; 5 gas
returnsub ; 3 gas
Total: 22 gas
```
Or the same code, using `JUMP`
```
TEST_DIV:
jumpdest ; 1 gas
0x02 ; 3 gas
0x03 ; 3 gas
DIVIDE ; 3 gas
jump ; 8 gas
DIVIDE:
jumpdest ; 1 gas
div ; 5 gas
swap1 ; 3 gas
jump ; 8 gas
Total: 35 gas
```
Using `JUMP` cost **_35 - 22 = 13_** more gas than using `JUMPSUB`.
#### Tail Call Elimination
We can even take advantage of `DIVIDE` just happening to directly follow `TEST_DIV` and just fall through rather than jump at all.
```
TEST_DIV:
beginsub ; 1 gas
0x02 ; 3 gas
0x03 ; 3 gas
DIVIDE:
beginsub ; 1 gas
div ; 5 gas
returnsub ; 3 gas
Total 16 gas.
```
The same code, using JUMP.
```
TEST_DIV:
jumpdest ; 1 gas
0x02 ; 3 gas
0x03 ; 3 gas
DIVIDE:
jumpdest ; 1 gas
div ; 5 gas
swap1 ; 3 gas
jump ; 8 gas
Total: 24 gas
```
Using `JUMP` costs **_24 - 16 = 8_** more gas than using `JUMPSUB`.
## Security Considerations
2021-08-22 16:07:03 -04:00
These changes do introduce new flow control instructions, so any software which does static/dynamic analysis of EVM code needs to be modified accordingly. The `JUMPSUB` semantics are similar to `JUMP` (but jumping to a `BEGINSUB`), whereas the `RETURNSUB` instruction is different, since it can 'land' on any opcode (but the possible destinations can be statically inferred).
Safety and amenability to static analysis of valid programs are comparable to [EIP-615](https://eips.ethereum.org/EIPS/eip-615), but without imposing syntactic constraints, and thus with improved low-level optimizations (as shown above.) Validity can ensured by following the rules given in the next section, and programs can be validated with the provided algorithm. The validation algorithm is simple and bounded by the size of the code, allowing for validation at creation time. And compilers can easily follow the rules.
### Validity
We define safety here as avoiding exceptional halting states.
#### Exceptional Halting States
_Execution_ is as defined in the [Yellow Paper](https://ethereum.github.io/yellowpaper/paper.pdf) — a sequence of changes to the EVM state. The conditions on valid code are preserved by state changes. At runtime, if execution of an instruction would violate a condition the execution is in an exceptional halting state. The Yellow Paper defines five such states.
1. Insufficient gas
2. More than 1024 stack items
3. Insufficient stack items
4. Invalid jump destination
5. Invalid instruction
We would like to consider EVM code valid iff no execution of the program can lead to an exceptional halting state, but we must be able to validate code in linear time to avoid denial of service attacks. So in practice, we can only partially meet these requirements. Our validation algorithm does not consider the codes data and computations, only its control flow and stack use. This means we will reject programs with any invalid code paths, even if those paths are not reachable at runtime. Further, conditions *1* and *2* — Insufficient gas and stack overflow — must in general be checked at runtime. Conditions *3*, *4*, and *5* cannot occur if the code conforms to the following rules.
#### The Rules
0. `JUMP` and `JUMPI` are deprecated.
1. `RJUMP` and `RJUMPI` address only valid `JUMPDEST` instructions.
2. `JUMPSUB` addresses only valid `BEGINSUB` instructions.
3. For each instruction in the code the `stack depth` is always the same.
4. The `stack depth` is always positive and at most 1024.
*Rule 0*, deprecating `JUMP` and `JUMPI`, forbids dynamic jumps. Absent dynamic jumps another mechanism is needed for subroutine returns, as provided here.
Jump destinations are currently checked at runtime. Static jumps allow them to be validated at creation time, per *rule 1* and *rule 2*. *Note: Valid instructions are not part of immediate data.*
For *rule 3* and *rule 4* we need to define `stack depth`. The Yellow Paper has the `stack pointer` or `SP` pointing just past the top item on the `data stack`. We define the `stack base` as where the `SP` pointed before the most recent `JUMPSUB`, or `0` on program entry. So we can define the `stack depth` as the number of stack elements between the current `SP` and the current `stack base`.
Given our definition of `stack depth`, *rule 3* ensures that control flows which return to the same place with a different `stack depth` are invalid. These can be caused by irreducible paths like jumping into loops and subroutines, and calling subroutines with different numbers of arguments. Taken together, these rules allow for code to be validated by following the control-flow graph, traversing each edge only once.
Finally, *rule 4* catches all stack underflows. It also catches stack overflows in programs that overflow without (or before) recursion.
### Validation
The following is a pseudo-Go implementation of an algorithm for enforcing program validity being a symbolic execution that recursively traverses the bytecode, following its control flow and stack use and checking for violations of the rules above.
This algorithm runs in time equal to `O(vertices + edges)` in the program's control-flow graph, where vertices represent control-flow instructions and the edges represent basic blocks thus the algorithm takes time proportional to the size of the bytecode.
(For simplicity we assume an `advance_pc()` routine to skip immediate data and `removed_items()` and `added_items()` functions that return the effect of each instruction on the stack. We also don't specify `JUMPTABLE` — which amounts to a loop over `RJUMP`.)
```
var bytecode []byte
var stack_depth []int
var SP := 0
func validate(PC :=0) boolean {
// traverse code sequentially
// recurse for subroutines and conditional jumps
while true {
instruction = bytecode[PC]
if is_invalid(instruction) {
return false;
}
// stack depth must be the same
if SP != stack_depth[PC] {
return false
}
// if stack depth non-zero we have been here before
// return true to break cycle in control flow graph
if stack_depth[PC] != 0 {
return true
}
stack_depth[PC] = SP
// effect of instruction on stack
SP -= removed_items(instruction)
SP += added_items(instruction)
if SP < 0 || 1024 < SP {
return false
}
// successful validation of path
if instruction == STOP, RETURN, or SUICIDE {
return true
}
if instruction == RJUMP {
// check for valid destination
jumpdest = *PC, PC++, jumpdest << 8, jumpdest = *PC
if bytecode[jumpdest] != JUMPDEST {
return false
}
// reset PC to destination of jump
PC = jumpdest
continue
}
if instruction == RJUMPI {
// check for valid destination
jumpdest = *PC, PC++, jumpdest << 8, jumpdest = *PC
if bytecode[jumpdest] != JUMPDEST {
return false
}
// reset PC to destination of jump
PC = jumpdest
// recurse to jump to code to validate
if !validate(stack[SP])) {
return false
}
continue
}
if instruction == JUMPSUB {
// check for valid destination
jumpdest = *PC, PC++, jumpdest << 8
if bytecode[jumpdest] != BEGINSUB {
return false
}
// recurse to jump to code to validate
prevSP = SP
depth = SP - prevSP
SP = depth
if !validate(stack[SP]+1)) {
return false
}
SP = prevSP - depth + SP
PC = prevPC
continue
}
if instruction == RETURNSUB {
PC = prevPC
return true
}
// advance PC according to instruction
PC = advance_pc(PC, instruction)
}
}
```
## Test Cases
### Simple routine
This should jump into a subroutine, back out and stop.
2020-05-29 23:06:28 +02:00
Bytecode: `0x60045e005c5d` (`PUSH1 0x04, JUMPSUB, STOP, BEGINSUB, RETURNSUB`)
| Pc | Op | Cost | Stack | RStack |
|-------|-------------|------|-----------|-----------|
| 0 | JUMPSUB | 5 | [] | [] |
| 3 | RETURNSUB | 5 | [] | [0] |
| 4 | STOP | 0 | [] | [] |
2020-05-29 17:00:56 +02:00
Output: 0x
Consumed gas: `10`
2020-05-29 17:00:56 +02:00
### Two levels of subroutines
This should execute fine, going into one two depths of subroutines
2020-05-29 17:00:56 +02:00
2020-05-29 23:06:28 +02:00
Bytecode: `0x6800000000000000000c5e005c60115e5d5c5d` (`PUSH9 0x00000000000000000c, JUMPSUB, STOP, BEGINSUB, PUSH1 0x11, JUMPSUB, RETURNSUB, BEGINSUB, RETURNSUB`)
| Pc | Op | Cost | Stack | RStack |
|-------|-------------|------|-----------|-----------|
| 0 | JUMPSUB | 5 | [] | [] |
| 3 | JUMPSUB | 5 | [] | [0] |
| 4 | RETURNSUB | 5 | [] | [0,3] |
| 5 | RETURNSUB | 5 | [] | [3] |
| 6 | STOP | 0 | [] | [] |
Consumed gas: `20`
### Failure 1: invalid jump
2020-05-29 17:00:56 +02:00
This should fail, since the given location is outside of the code-range. The code is the same as previous example,
except that the pushed location is `0x01000000000000000c` instead of `0x0c`.
Bytecode: (`PUSH9 0x01000000000000000c, JUMPSUB, `0x6801000000000000000c5e005c60115e5d5c5d`, STOP, BEGINSUB, PUSH1 0x11, JUMPSUB, RETURNSUB, BEGINSUB, RETURNSUB`)
| Pc | Op | Cost | Stack | RStack |
|-------|-------------|------|-----------|-----------|
| 0 | JUMPSUB | 10 |[18446744073709551628] | [] |
```
2020-05-29 17:00:56 +02:00
Error: at pc=10, op=JUMPSUB: invalid jump destination
```
### Failure 2: shallow `return stack`
2020-05-29 17:00:56 +02:00
This should fail at first opcode, due to shallow `return_stack`
2020-05-29 23:06:28 +02:00
Bytecode: `0x5d5858` (`RETURNSUB, PC, PC`)
| Pc | Op | Cost | Stack | RStack |
|-------|-------------|------|-----------|-----------|
2020-05-29 17:00:56 +02:00
| 0 | RETURNSUB | 5 | [] | [] |
```
2020-05-29 17:00:56 +02:00
Error: at pc=0, op=RETURNSUB: invalid retsub
```
2020-04-16 13:30:18 +02:00
### Subroutine at end of code
In this example. the JUMPSUB is on the last byte of code. When the subroutine returns, it should hit the 'virtual stop' _after_ the bytecode, and not exit with error
2020-05-29 23:06:28 +02:00
Bytecode: `0x6005565c5d5b60035e` (`PUSH1 0x05, JUMP, BEGINSUB, RETURNSUB, JUMPDEST, PUSH1 0x03, JUMPSUB`)
2020-04-16 13:30:18 +02:00
| Pc | Op | Cost | Stack | RStack |
|-------|-------------|------|-----------|-----------|
| 0 | PUSH1 | 3 | [] | [] |
| 2 | JUMP | 8 | [5] | [] |
| 5 | JUMPDEST | 1 | [] | [] |
| 6 | JUMPSUB | 5 | [] | [] |
| 2 | RETURNSUB | 5 | [] | [2] |
| 7 | STOP | 0 | [] | [] |
2020-04-16 13:30:18 +02:00
2020-05-29 17:00:56 +02:00
Consumed gas: `30`
## Appendix: Stack Frames
Given [EIP-3337](https://eips.ethereum.org/EIPS/eip-3337) — which introduces a frame pointer register `FP` relative to which which memory can be addressed , we can create a frame stack — a list of data frames in memory for arguments and local variables. Two conventions are typical — either create the frame before calling the subroutine, or create the frame after the `BEGINSUB`. Adding these two instructions to EIP-3337 would ease the way. They are modeled on the corresponding Intel opcodes.
#### ENTER _frame_size_
> Write the current `FP` to memory as with MSTOREF at `FP - frame_size`, then set `FP` to `FP - frame size`.
This a shorthand for
```
MSTOREF
GETFP
PUSH frame_size
SUB
SETFP
```
This should be placed either before a `JUMPSUB` operation or after a `BEGINSUB`, depending on the calling convention. Repeated calls to `ENTER` create a stack of frames, linked by their `previous FP` field. The stack grows towards lower addresses in memory, as is the common, efficient practice.
#### `LEAVE`
> Restores `FP` to the value which was stored at offset `FP` in memory by the most recent `ENTER`, thus popping a frame off of the stack of frames. This a shorthand for
```
MLOADF
SETFP
```
This should be placed after the `JUMPSUB` operation or before the `RETURNSUB` depending on the calling convention to pop the most recent frame from the call stack.
These operations would be especially useful if the gas formula for memory was reinterpreted so that writes from the top of memory down (equivalently, from -1 down, twos-complement) are charged the same gas as writes from the bottom of memory up.
The total cost to expand the memory to size _a_ words is
> `Cmem(a) = 3 * a + floor(a ** 2 / 512)`
If the memory is already _b_ words long, the incremental cost is
> `Cmem(a) - Cmem(b)`
If _a_, _b_ and memory offsets in general are allowed to be negative the formula gives the desired results stack memory can grow from the top down, and heap memory can be allocated from the bottom up, without address conflicts or excessive gas charges.
## References
A.M. Turing, [Proposals for the development in the Mathematics Division of an Automatic Computing Engine (ACE)](http://www.alanturing.net/turing_archive/archive/p/p01/P01-001.html) Report E882, Executive Committee, NPL 1946
Alex Beregszaszi, Paweł Bylica, Andrei Maiboroda, [EVM Object Format (EOF) v1](https://eips.ethereum.org/EIPS/eip-3540) 2021
Andrei Maiboroda, [EOF static relative jumps](https://github.com/ethereum/evmone/pull/351)
Gavin Wood, [Ethereum: A Secure Decentralized Generalized Transaction Ledger](https://ethereum.github.io/yellowpaper/paper.pdf), 2014-2021
Greg Colvin, Brooklyn Zelenka, Paweł Bylica, Christian Reitwiessner, [EIP-615: Subroutines and Static Jumps for the EVM](https://eips.ethereum.org/EIPS/eip-615), 2016-2019
Nick Johnson, [EIP-3337: Frame pointer support for memory load and store operations](https://eips.ethereum.org/EIPS/eip-3337), 2021
## Copyright
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).