---
eip: 2315
title: Simple Subroutines for the EVM
status: Review
type: Standards Track
category: Core
author: Greg Colvin <greg@colvin.org>, Martin Holst Swende (@holiman)
discussions-to: https://ethereum-magicians.org/t/eip-2315-simple-subroutines-for-the-evm/3941
created: 2019-10-17
---
## Abstract
This proposal introduces three opcodes to support subroutines: `BEGINSUB`, `JUMPSUB` and `RETURNSUB`.
Safety and amenability to static analysis equivalent to [EIP-615](https://eips.ethereum.org/EIPS/eip-615) can be ensured by enforcing a few simple rules, and validated with the provided algorithm.
## Motivation
The EVM does not provide subroutines as a primitive. Instead, calls can be synthesized by fetching and pushing the current program counter on the data stack and jumping to the subroutine address; returns can be synthesized by getting the return address to the top of the stack and jumping back to it. Over the course of 30 years the computer industry struggled with this complexity and cost and settled on providing primitive operations to directly support subroutines. These are provided in some form by almost all physical and virtual machines going back at least 50 years.
In whatever form, these operations provide for capturing the current context of execution, transferring control to a new context, and returning to the original context.
We propose a safe, simple _return-stack_ mechanism, proven to work well for stack machines, which we specify here. Note that this specification is entirely semantic. It constrains only stack usage and control flow and imposes no syntax on code beyond being a sequence of bytes to be executed.
## Specification
In addition to the existing `data stack`, we introduce one more stack into the EVM, which we call the `return stack`. The `return stack` is limited to `1024` items.
#### `BEGINSUB`
Marks the entry point to a subroutine. Execution of a `BEGINSUB` is a no-op.
#### `JUMPSUB`
Transfers control to a subroutine.
1. Pop the `location` off the `data stack`.
2. If the opcode at `location` is not a `BEGINSUB`, _`abort`_.
3. If the `return stack` already has `1024` items, _`abort`_.
4. Push the current `pc + 1` to the `return stack`.
5. Set `pc` to `location + 1`.
* _pops one item off the `data stack`_
* _pushes one item on the `return stack`_
#### `RETURNSUB`
Returns control to the caller of a subroutine.
1. If the `return stack` is empty, _`abort`_.
2. Pop `pc` off the `return stack`.
* _pops one item off the `return stack`_
_Note 1: If a resulting `pc` to be executed is beyond the last instruction then the opcode is implicitly a `STOP`, which is not an error._
_Note 2: Values popped off the `return stack` do not need to be validated, since they are alterable only by `JUMPSUB` and `RETURNSUB`._
_Note 3: The description above lays out the semantics of this feature in terms of a `return stack`. But the actual state of the `return stack` is not observable by EVM code or consensus-critical to the protocol. (For example, a node implementor may code `JUMPSUB` to unobservably push `pc` on the `return stack` rather than `pc + 1`, which is allowed so long as `RETURNSUB` observably returns control to the `pc + 1` location.)_
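
The `JUMPSUB` and `RETURNSUB` steps above can be illustrated with a minimal interpreter sketch. It is not taken from any client: the names `run`, `Abort` and `RETURN_STACK_LIMIT` are hypothetical, the opcode values are the ones suggested under "Costs and Codes", gas and the data-stack limit are not modelled, and the `JUMP` destination check ignores the PUSH-data question.

```python
# Hypothetical mini-interpreter covering only the opcodes used in this document.
STOP, JUMP, JUMPDEST = 0x00, 0x56, 0x5B
BEGINSUB, RETURNSUB, JUMPSUB = 0x5C, 0x5D, 0x5E
PUSH1, PUSH32 = 0x60, 0x7F
RETURN_STACK_LIMIT = 1024


class Abort(Exception):
    """Raised wherever the specification says 'abort'."""


def run(code: bytes) -> list:
    """Execute `code` and return the program counters visited, in order."""
    data_stack, return_stack, visited = [], [], []
    pc = 0
    while pc < len(code):  # running past the last instruction is an implicit STOP
        op = code[pc]
        visited.append(pc)
        if op == STOP:
            break
        if PUSH1 <= op <= PUSH32:
            size = op - PUSH1 + 1
            immediate = code[pc + 1:pc + 1 + size].ljust(size, b"\x00")
            data_stack.append(int.from_bytes(immediate, "big"))
            pc += 1 + size
        elif op in (BEGINSUB, JUMPDEST):
            pc += 1  # both are no-ops when executed
        elif op == JUMP:
            if not data_stack:
                raise Abort("data stack underflow")
            location = data_stack.pop()
            if location >= len(code) or code[location] != JUMPDEST:
                raise Abort("invalid jump destination")
            pc = location
        elif op == JUMPSUB:
            if not data_stack:
                raise Abort("data stack underflow")
            location = data_stack.pop()                  # step 1
            if location >= len(code) or code[location] != BEGINSUB:
                raise Abort("invalid jump destination")  # step 2
            if len(return_stack) >= RETURN_STACK_LIMIT:
                raise Abort("return stack overflow")     # step 3
            return_stack.append(pc + 1)                  # step 4
            pc = location + 1                            # step 5
        elif op == RETURNSUB:
            if not return_stack:
                raise Abort("invalid retsub")            # step 1
            pc = return_stack.pop()                      # step 2
        else:
            raise Abort("opcode not supported by this sketch")
    return visited
```

Running `run(bytes.fromhex("60045e005c5d"))` visits the program counters `0, 2, 5, 3`, the same order as the "Simple routine" test case. This sketch pushes `pc + 1` as specified, whereas the traces in the test cases show `pc` on the return stack, which Note 3 permits.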
### Indirect Jumps
If [EIP-2327 BEGINDATA](https://github.com/ethereum/EIPs/blob/master/EIPS/eip-2327.md) is implemented then the indirect jumps from [EIP-615](https://eips.ethereum.org/EIPS/eip-615) -- `JUMPV` and `JUMPSUBV` -- can be implemented. These would take two arguments on the stack: a constant offset relative to `BEGINDATA` to a jump table, and a variable index into that table. Detailed specifications can await the acceptance of EIP-2327.
## Rationale
We modeled this design on the simple, proven, archetypal Forth virtual machine of 1970. It is a two-stack design -- the data stack is supplemented with a return stack to support jumping into and returning from subroutines, as specified above. The separate return stack ensures that the return address cannot be overwritten or mislaid, and obviates any need to swap the return address past the arguments on the stack. Importantly, a dynamic jump is not needed to implement subroutine returns, allowing for deprecation of dynamic uses of JUMP and JUMPI. Deprecating dynamic jumps is key to practical static analysis of code.
(`JUMPSUB` and `RETURNSUB` were also defined in terms of a `return stack` in [EIP-615](https://eips.ethereum.org/EIPS/eip-615).)
## Backwards and Forwards Compatibility
These changes do not affect the semantics of existing EVM code.
These changes are compatible with using [EIP-3337](https://github.com/ethereum/EIPs/blob/master/EIPS/eip-3337.md) to provide stack frames, by associating a frame with each subroutine.
## Implementations
Three clients have implemented this proposal (or an earlier version of it):
- [geth](https://github.com/ethereum/go-ethereum/pull/20619)
- [besu](https://github.com/hyperledger/besu/pull/717)
- [openethereum](https://github.com/openethereum/openethereum/pull/11629)
### Costs and Codes
We suggest that the cost of
- `BEGINSUB` be _jumpdest_ (`1`)
- `JUMPSUB` be _high_ (`10`)
  - This is the same as `JUMPI`, and `2` more than `JUMP`.
- `RETURNSUB` be _low_ (`5`).
Benchmarking might be needed to tell if the costs are well-balanced.
We suggest the following opcodes:
```
0x5c BEGINSUB
0x5d RETURNSUB
0x5e JUMPSUB
```
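
As an illustration of the suggested opcode values and costs, the following sketch decodes bytecode into mnemonics. The names `MNEMONICS`, `COSTS` and `disassemble` are hypothetical, only the opcodes that appear in this document are listed, and the `JUMP` and `JUMPI` costs are the existing EVM values that the comparison above relies on.

```python
# Opcode values suggested above, plus the existing opcodes used in the test cases.
MNEMONICS = {
    0x00: "STOP", 0x56: "JUMP", 0x57: "JUMPI", 0x58: "PC", 0x5B: "JUMPDEST",
    0x5C: "BEGINSUB", 0x5D: "RETURNSUB", 0x5E: "JUMPSUB",
}

# Suggested costs for the new opcodes, with JUMP and JUMPI for comparison.
COSTS = {"BEGINSUB": 1, "JUMPSUB": 10, "RETURNSUB": 5, "JUMP": 8, "JUMPI": 10}


def disassemble(code: bytes) -> list:
    """Return one mnemonic string per instruction, skipping over PUSH data."""
    result, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:  # PUSH1 .. PUSH32 carry 1 .. 32 bytes of data
            size = op - 0x5F
            result.append(f"PUSH{size} 0x{code[pc + 1:pc + 1 + size].hex()}")
            pc += 1 + size
        else:
            result.append(MNEMONICS.get(op, f"UNKNOWN(0x{op:02x})"))
            pc += 1
    return result
```

For example, `disassemble(bytes.fromhex("60045e005c5d"))` yields `PUSH1 0x04`, `JUMPSUB`, `STOP`, `BEGINSUB`, `RETURNSUB`, and `COSTS["JUMPSUB"] - COSTS["JUMP"]` is `2`.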
## Security Considerations
These changes introduce new flow control instructions, so any software that performs static or dynamic analysis of EVM code needs to be modified accordingly. The `JUMPSUB` semantics are similar to `JUMP` (but jumping to a `BEGINSUB`), whereas the `RETURNSUB` instruction is different, since it can 'land' on any opcode (but the possible destinations can be statically inferred).
The safety and amenability to static analysis of valid programs is equivalent to EIP-615, but without imposing syntactic constraints, and thus with minimal impact on low-level optimizations. Validity is ensured by the following rules, and programs can be validated with the provided algorithm.
As with EIP-615, contract code must be validated at deploy time for contracts created by external transactions. Internal calls must not be validated, as they may be from older contracts generating older, unsafe code. Unlike EIP-615, backwards compatibility means that no versioning is needed.
However, as soon as these rules are enforced, compilers that generate dynamic jumps will be broken. Therefore, in the initial upgrade there should not be any deploy-time validation, though compilers are encouraged to emit only valid code from the start. A future upgrade will start enforcing the rules once compilers and tools are ready.
### Validity
We would like to consider EVM code valid iff no execution of the program can lead to an exceptional halting state, but we must validate code in linear time. (More precisely, in time `O(vertices + edges)` in the control-flow graph.) So our validation algorithm does not consider the code's data and computations, only its control flow and stack use. This means we will reject programs with any invalid code paths, even if those paths are not reachable at runtime.
_Execution_ is as defined in the [Yellow Paper](https://ethereum.github.io/yellowpaper/paper.pdf)—a sequence of changes in the EVM state. The conditions on valid code are preserved by state changes. At runtime, if execution of an instruction would violate a condition the execution is in an exceptional halting state. The Yellow Paper defines five such states.
1. Insufficient gas
2. More than 1024 stack items
3. Insufficient stack items
4. Invalid jump destination
5. Invalid instruction
Conditions 1 and 2 (insufficient gas and stack overflow) must be checked at runtime. Conditions 3, 4, and 5 cannot occur if the code conforms to the following rules.
* `JUMP` and `JUMPI` address only valid `JUMPDEST` instructions.
* `JUMPSUB` addresses only valid `BEGINSUB` instructions.
A valid `JUMPDEST` or `BEGINSUB` instruction is one that is not part of PUSH data.
* `JUMP`, `JUMPI` and `JUMPSUB` are always preceded by one of the `PUSH` instructions.
Requiring a `PUSH` before each `JUMP` forbids dynamic jumps. Absent dynamic jumps another mechanism is needed for subroutine returns, as provided here.
The `stack pointer` or `SP` points just past the top item on the `data stack`. We define the `stack depth` as the number of stack elements between the current `SP` and the current `stack base`. The `stack base` was the `SP` at the previous `JUMPSUB`, or `0` on program entry. So we can check for all stack underflows and some stack overflows.
* The `stack depth` is never negative and is at most 1024.
Control flows which return to the same place with a different `stack depth` are invalid. These can be caused by irreducible paths like jumping into loops and subroutines.
* For each instruction in the code the `stack depth` is always the same.
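
The jump-related rules above (valid destinations outside PUSH data, and a `PUSH` immediately before every jump) can be checked in a single pass over the code. The following is a rough sketch under stated assumptions: the function name `valid_jumps` is hypothetical, the opcode values are the ones suggested under "Costs and Codes", and the stack-depth rules are not covered here.

```python
JUMP, JUMPI, JUMPDEST = 0x56, 0x57, 0x5B
BEGINSUB, JUMPSUB = 0x5C, 0x5E
PUSH1, PUSH32 = 0x60, 0x7F


def valid_jumps(code: bytes) -> bool:
    """Check that every jump follows a PUSH of a valid destination."""
    # First pass: find where real instructions start, skipping PUSH data.
    starts = {}  # pc -> opcode, for instruction bytes only
    order = []   # instruction start offsets in code order
    pc = 0
    while pc < len(code):
        op = code[pc]
        starts[pc] = op
        order.append(pc)
        pc += 1 + (op - PUSH1 + 1 if PUSH1 <= op <= PUSH32 else 0)

    # Second pass: inspect each jump and the instruction just before it.
    for index, pc in enumerate(order):
        op = code[pc]
        if op not in (JUMP, JUMPI, JUMPSUB):
            continue
        if index == 0:
            return False  # nothing precedes the jump
        prev_pc = order[index - 1]
        prev_op = code[prev_pc]
        if not PUSH1 <= prev_op <= PUSH32:
            return False  # dynamic jump: destination is not a constant
        size = prev_op - PUSH1 + 1
        destination = int.from_bytes(code[prev_pc + 1:prev_pc + 1 + size], "big")
        wanted = BEGINSUB if op == JUMPSUB else JUMPDEST
        # Offsets inside PUSH data are absent from `starts`, so they fail here.
        if starts.get(destination) != wanted:
            return False
    return True
```

Under these assumptions, `PUSH1 0x04, JUMP, PUSH1 0x5b` (`0x600456605b`) is rejected: the byte at offset 4 has the value of `JUMPDEST` but is PUSH data.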
### Validation
The following is a pseudo-Go specification of an algorithm for enforcing program validity. It recursively traverses the bytecode, following its control flow and stack use and checking for violations of the rules above. (For simplicity we ignore the issue of `JUMPDEST` or `BEGINSUB` bytes in PUSH data.) It runs in `O(vertices + edges)` time in the program's control-flow graph.
```
var bytecode []byte
var stack_depth []int // one entry per code byte, initialised to -1 (not yet visited)
var SP = 0

// validate the code path starting at PC; the top-level call is validate(0)
func validate(PC int) bool {
    // traverse code sequentially, recurse for subroutines and conditional jumps
    for {
        // running past the end of the code is an implicit STOP
        if PC >= len(bytecode) {
            return true
        }
        instruction := bytecode[PC]

        // if stack depth is set we have been here before:
        // check for constant depth and return to break the cycle
        if stack_depth[PC] != -1 {
            return SP == stack_depth[PC]
        }
        stack_depth[PC] = SP

        // effect of instruction on stack, checking for underflow and overflow
        SP -= removed_items(instruction)
        if SP < 0 {
            return false
        }
        SP += added_items(instruction)
        if SP > 1024 {
            return false
        }

        // successful validation of path
        if instruction == STOP || instruction == RETURN || instruction == SUICIDE {
            return true
        }

        if instruction == JUMP {
            // check for constant and correct destination
            // (for simplicity only a preceding PUSH32 is recognised)
            if PC < 33 || bytecode[PC-33] != PUSH32 {
                return false
            }
            dest := to_int(bytecode[PC-32 : PC])
            if dest >= len(bytecode) || bytecode[dest] != JUMPDEST {
                return false
            }
            // reset PC to destination of jump
            PC = dest
            continue
        }

        if instruction == JUMPI {
            // check for constant and correct destination
            if PC < 33 || bytecode[PC-33] != PUSH32 {
                return false
            }
            dest := to_int(bytecode[PC-32 : PC])
            if dest >= len(bytecode) || bytecode[dest] != JUMPDEST {
                return false
            }
            // recurse to validate the code at the jump destination,
            // then continue with the fall-through path at the same depth
            savedSP := SP
            if !validate(dest) {
                return false
            }
            SP = savedSP
            PC = PC + 1
            continue
        }

        if instruction == JUMPSUB {
            // check for constant and correct destination
            if PC < 33 || bytecode[PC-33] != PUSH32 {
                return false
            }
            dest := to_int(bytecode[PC-32 : PC])
            if dest >= len(bytecode) || bytecode[dest] != BEGINSUB {
                return false
            }
            // recurse to validate the subroutine; its stack base is the
            // current SP, so its stack depth starts at 0
            prevSP := SP
            SP = 0
            if !validate(dest + 1) {
                return false
            }
            // add the subroutine's net stack effect to the caller's depth
            SP = prevSP + SP
            PC = PC + 1
            continue
        }

        if instruction == RETURNSUB {
            // successful return from recursion
            return true
        }

        // advance PC according to instruction
        PC = advance_pc(PC, instruction)
    }
}
```
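
The constant-depth rule can also be illustrated with a small runnable sketch. It is a deliberate simplification, not the algorithm above: it handles only `STOP`, `ADD`, `POP`, `JUMP`, `JUMPI`, `JUMPDEST` and `PUSH1` to `PUSH32`, leaves out subroutines entirely, uses a worklist instead of recursion, and, like the pseudo-code, ignores `JUMPDEST` bytes in PUSH data. The name `validate_depths` and the `EFFECT` table are hypothetical.

```python
STOP, ADD, POP, JUMP, JUMPI, JUMPDEST = 0x00, 0x01, 0x50, 0x56, 0x57, 0x5B
PUSH1, PUSH32 = 0x60, 0x7F

# (items removed, items added) for the few non-PUSH opcodes this sketch supports
EFFECT = {STOP: (0, 0), ADD: (2, 1), POP: (1, 0),
          JUMP: (1, 0), JUMPI: (2, 0), JUMPDEST: (0, 0)}


def validate_depths(code: bytes) -> bool:
    """Return True if every reachable instruction has a single stack depth."""
    depth_at = {}          # pc -> stack depth recorded on the first visit
    work = [(0, 0, None)]  # (pc, depth, constant pushed just before pc)
    while work:
        pc, depth, const = work.pop()
        while pc < len(code):  # running off the end is an implicit STOP
            if pc in depth_at:
                if depth_at[pc] != depth:
                    return False  # same place reached with a different depth
                break
            depth_at[pc] = depth
            op = code[pc]
            if PUSH1 <= op <= PUSH32:
                size = op - PUSH1 + 1
                const = int.from_bytes(code[pc + 1:pc + 1 + size], "big")
                depth += 1
                if depth > 1024:
                    return False
                pc += 1 + size
                continue
            if op not in EFFECT:
                return False  # opcode outside the scope of this sketch
            removed, added = EFFECT[op]
            if depth < removed:
                return False  # stack underflow
            depth += added - removed
            if op == STOP:
                break
            if op in (JUMP, JUMPI):
                # the destination must be the constant pushed just before
                if const is None or const >= len(code) or code[const] != JUMPDEST:
                    return False
                if op == JUMPI:
                    work.append((pc + 1, depth, None))  # fall-through path
                pc, const = const, None
                continue
            const = None
            pc += 1
    return True
```

For instance, the loop `JUMPDEST, PUSH1 0, POP, PUSH1 0, JUMP` (`0x5b6000506000` followed by `56`) keeps a constant depth and is accepted, while `JUMPDEST, PUSH1 0, PUSH1 0, JUMP` (`0x5b6000600056`) grows the stack on every iteration and is rejected.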
## Test Cases
### Simple routine
This should jump into a subroutine, back out and stop.
Bytecode: `0x60045e005c5d` (`PUSH1 0x04, JUMPSUB, STOP, BEGINSUB, RETURNSUB`)
| Pc | Op | Cost | Stack | RStack |
|-------|-------------|------|-----------|-----------|
| 0 | PUSH1 | 3 | [] | [] |
| 2 | JUMPSUB | 10 | [4] | [] |
| 5 | RETURNSUB | 5 | [] | [2] |
| 3 | STOP | 0 | [] | [] |
Output: 0x
Consumed gas: `18`
### Two levels of subroutines
This should execute fine, going two levels deep into subroutines.
Bytecode: `0x6800000000000000000c5e005c60115e5d5c5d` (`PUSH9 0x00000000000000000c, JUMPSUB, STOP, BEGINSUB, PUSH1 0x11, JUMPSUB, RETURNSUB, BEGINSUB, RETURNSUB`)
| Pc | Op | Cost | Stack | RStack |
|-------|-------------|------|-----------|-----------|
| 0 | PUSH9 | 3 | [] | [] |
| 10 | JUMPSUB | 10 | [12] | [] |
| 13 | PUSH1 | 3 | [] | [10] |
| 15 | JUMPSUB | 10 | [17] | [10] |
| 18 | RETURNSUB | 5 | [] | [10,15] |
| 16 | RETURNSUB | 5 | [] | [10] |
| 11 | STOP | 0 | [] | [] |
Consumed gas: `36`
### Failure 1: invalid jump
This should fail, since the given location is outside of the code range. The code is the same as in the previous example, except that the pushed location is `0x01000000000000000c` instead of `0x0c`.
Bytecode: `0x6801000000000000000c5e005c60115e5d5c5d` (`PUSH9 0x01000000000000000c, JUMPSUB, STOP, BEGINSUB, PUSH1 0x11, JUMPSUB, RETURNSUB, BEGINSUB, RETURNSUB`)
| Pc | Op | Cost | Stack | RStack |
|-------|-------------|------|-----------|-----------|
| 0 | PUSH9 | 3 | [] | [] |
| 10 | JUMPSUB | 10 | [18446744073709551628] | [] |
```
Error: at pc=10, op=JUMPSUB: invalid jump destination
```
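
The stack value in this trace is simply the 9-byte PUSH9 immediate read as a big-endian integer. A short check of that arithmetic, with hypothetical variable names:

```python
# The PUSH9 immediate from the failing bytecode, read as a big-endian integer.
immediate = bytes.fromhex("01000000000000000c")
location = int.from_bytes(immediate, "big")

# Size in bytes of the whole program from this test case.
code_size = len(bytes.fromhex("6801000000000000000c5e005c60115e5d5c5d"))

# The location lies far beyond the end of the code, so JUMPSUB must abort.
outside_code = location >= code_size
```

Here `location` is `2**64 + 12`, that is `18446744073709551628`, while the code is only 19 bytes long.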
### Failure 2: shallow `return stack`
This should fail at the first opcode, due to a shallow `return stack`.
Bytecode: `0x5d5858` (`RETURNSUB, PC, PC`)
| Pc | Op | Cost | Stack | RStack |
|-------|-------------|------|-----------|-----------|
| 0 | RETURNSUB | 5 | [] | [] |
```
Error: at pc=0, op=RETURNSUB: invalid retsub
```
### Subroutine at end of code
In this example, the `JUMPSUB` is on the last byte of code. When the subroutine returns, it should hit the 'virtual stop' _after_ the bytecode, and not exit with an error.
Bytecode: `0x6005565c5d5b60035e` (`PUSH1 0x05, JUMP, BEGINSUB, RETURNSUB, JUMPDEST, PUSH1 0x03, JUMPSUB`)
| Pc | Op | Cost | Stack | RStack |
|-------|-------------|------|-----------|-----------|
| 0 | PUSH1 | 3 | [] | [] |
| 2 | JUMP | 8 | [5] | [] |
| 5 | JUMPDEST | 1 | [] | [] |
| 6 | PUSH1 | 3 | [] | [] |
| 8 | JUMPSUB | 10 | [3] | [] |
| 4 | RETURNSUB | 5 | [] | [8] |
| 9 | STOP | 0 | [] | [] |
Consumed gas: `30`
## References
Gavin Wood, [Ethereum: A Secure Decentralized Generalized Transaction Ledger](https://ethereum.github.io/yellowpaper/paper.pdf), 2014-2021
Greg Colvin, Brooklyn Zelenka, Paweł Bylica, Christian Reitwiessner, [EIP-615: Subroutines and Static Jumps for the EVM](https://eips.ethereum.org/EIPS/eip-615), 2016-2019
Martin Lundfall, [EIP-2327: BEGINDATA Opcode](https://github.com/ethereum/EIPs/blob/master/EIPS/eip-2327.md), 2019
Nick Johnson, [EIP-3337: Frame pointer support for memory load and store operations](https://github.com/ethereum/EIPs/blob/master/EIPS/eip-3337.md), 2021
## Copyright
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).