---
eip: 2315
title: Simple Subroutines for the EVM
status: Draft
type: Standards Track
category: Core
author: Greg Colvin (greg@colvin.org), Martin Holst Swende (@holiman)
discussions-to: https://ethereum-magicians.org/t/eip-2315-simple-subroutines-for-the-evm/3941
created: 2019-10-17
---
## Abstract

This proposal introduces three opcodes to support subroutines: `BEGINSUB`, `JUMPSUB` and `RETURNSUB`.

## Motivation

The EVM does not provide subroutines as a primitive. Instead, calls can be synthesized by fetching and pushing the current program counter on the data stack and jumping to the subroutine address; returns can be synthesized by contriving to get the return address back to the top of the stack and jumping back to it. Complex calling conventions are then needed to use the same stack for computation and control flow. Memory allows for simpler conventions but still costs gas. Eschewing subroutines in user code is the least costly option, but also the most failure-prone.

Over the course of 30 years the computer industry struggled with this complexity and cost and settled on opcodes that directly support subroutines. These are provided in some form by almost all physical and virtual machines going back at least 50 years.

Our design is modeled on the original Forth two-stack machine of 1970. The data stack is supplemented with a return stack to provide simple support for subroutines, as specified below.

In the Appendix we show example solc output for a simple program that uses over three times as much gas just calling and returning from subroutines as comparable code using these opcodes. Actual differences in run-time efficiency will of course vary widely.

## Specification

In addition to the existing `data stack`, we introduce one more stack into the EVM, which we call the `return stack`. The `return stack` is limited to `1023` items.

#### `BEGINSUB`

Marks the entry point to a subroutine. Attempted execution of a `BEGINSUB` causes an _`abort`_: terminate execution with an `OOG` (Out Of Gas) exception.

#### `JUMPSUB`

Transfers control to a subroutine.

1. Pop the `location` off the `data stack`.
2. If the opcode at `location` is not a `BEGINSUB`, _`abort`_.
3. If the `return stack` already has `1023` items, _`abort`_.
4. Push the current `pc + 1` to the `return stack`.
5. Set `pc` to `location + 1`.

* _pops one item off the `data stack`_
* _pushes one item on the `return stack`_
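
The five steps can be sketched in Python as follows. This is a non-normative illustration, not a client implementation: the opcode value `0xb2` is the one suggested under Costs and Codes, and the names `Abort`, `opcode_positions` and `jumpsub` are hypothetical. It assumes an empty `data stack` aborts (as for any other EVM stack underflow), and it reads "the opcode at `location`" as excluding bytes that are PUSH immediate data.

```python
BEGINSUB = 0xB2            # suggested opcode value, see "Costs and Codes"
RETURN_STACK_LIMIT = 1023


class Abort(Exception):
    """Hypothetical stand-in for the spec's _abort_ (an OOG exception)."""


def opcode_positions(code: bytes) -> set:
    """Offsets holding opcodes, skipping PUSH1..PUSH32 immediate data."""
    positions, pc = set(), 0
    while pc < len(code):
        positions.add(pc)
        if 0x60 <= code[pc] <= 0x7F:
            pc += code[pc] - 0x5F    # PUSHn carries n immediate bytes
        pc += 1
    return positions


def jumpsub(code: bytes, pc: int, data_stack: list, return_stack: list) -> int:
    """Execute JUMPSUB at `pc`; return the pc of the next instruction."""
    if not data_stack:                                    # assumed: underflow aborts
        raise Abort("stack underflow")
    location = data_stack.pop()                           # 1. pop location
    if location not in opcode_positions(code) or code[location] != BEGINSUB:
        raise Abort("invalid jump destination")           # 2. must be a BEGINSUB
    if len(return_stack) >= RETURN_STACK_LIMIT:           # 3. depth limit
        raise Abort("return stack limit reached")
    return_stack.append(pc + 1)                           # 4. push pc + 1
    return location + 1                                   # 5. continue after BEGINSUB


if __name__ == "__main__":
    # "Simple routine" test vector: JUMPSUB at pc 2 with 4 on the data stack.
    code = bytes.fromhex("6004b300b2b7")
    data_stack, return_stack = [4], []
    print(jumpsub(code, 2, data_stack, return_stack), return_stack)
```

Run as a script, this prints `5 [3]`: execution continues just past the `BEGINSUB` at offset 4, and the return address 3 is saved.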
#### `RETURNSUB`

Returns control to the caller of a subroutine.

1. If the `return stack` is empty, _`abort`_.
2. Pop `pc` off the `return stack`.

* _pops one item off the `return stack`_
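
`RETURNSUB` is simpler still. A minimal Python sketch, assuming the `return stack` is a list whose last element is the top and using a hypothetical `Abort` exception for the spec's _abort_:

```python
class Abort(Exception):
    """Hypothetical stand-in for the spec's _abort_ (an OOG exception)."""


def returnsub(return_stack: list) -> int:
    """Execute RETURNSUB; return the pc at which execution resumes."""
    if not return_stack:              # 1. empty return stack
        raise Abort("invalid retsub")
    # 2. pop pc; no validation is needed, since only JUMPSUB pushes here.
    return return_stack.pop()


if __name__ == "__main__":
    # Two nested calls whose JUMPSUBs sat at pc 10 and pc 15 pushed 11 and 16.
    return_stack = [11, 16]
    print(returnsub(return_stack), returnsub(return_stack))
```

The innermost call returns first, so the script prints `16 11`.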

_Note 1: If a resulting `pc` to be executed is beyond the last instruction, then the opcode is implicitly a `STOP`, which is not an error._

_Note 2: Values popped off the `return stack` do not need to be validated, since they are alterable only by `JUMPSUB` and `RETURNSUB`._

_Note 3: The description above lays out the semantics of this feature in terms of a `return stack`. But the actual state of the `return stack` is not observable by EVM code or consensus-critical to the protocol. (For example, a node implementor may code `JUMPSUB` to unobservably push `pc` on the `return stack` rather than `pc + 1`, which is allowed so long as `RETURNSUB` observably returns control to the `pc + 1` location.)_
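
Note 3 can be illustrated with a small Python sketch. The two classes are hypothetical `return stack` representations: one stores `pc + 1` as the `JUMPSUB` steps describe, the other stores the `pc` of the `JUMPSUB` and adds 1 on return. Replaying the call/return sequence of the "Two levels of subroutines" test case through both shows that the observable resume points agree even though the stored values differ.

```python
class SpecReturnStack:
    """Stores pc + 1, exactly as the JUMPSUB steps describe."""

    def __init__(self):
        self._items = []

    def on_jumpsub(self, pc):
        self._items.append(pc + 1)

    def on_returnsub(self):
        return self._items.pop()


class AltReturnStack:
    """Stores the pc of the JUMPSUB itself and adds 1 on return (Note 3)."""

    def __init__(self):
        self._items = []

    def on_jumpsub(self, pc):
        self._items.append(pc)

    def on_returnsub(self):
        return self._items.pop() + 1


def resume_points(stack, events):
    """Replay ("call", pc) / ("ret", None) events; collect where RETURNSUB resumes."""
    resumed = []
    for kind, pc in events:
        if kind == "call":
            stack.on_jumpsub(pc)
        else:
            resumed.append(stack.on_returnsub())
    return resumed


# JUMPSUBs at pc 10 and 15, then two RETURNSUBs ("Two levels of subroutines").
EVENTS = [("call", 10), ("call", 15), ("ret", None), ("ret", None)]

if __name__ == "__main__":
    print(resume_points(SpecReturnStack(), EVENTS) == resume_points(AltReturnStack(), EVENTS))
```

Both representations resume at 16 and then 11, so the script prints `True`.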
## Rationale

This is a small change that provides native subroutines without breaking backwards compatibility.

## Backwards Compatibility
These changes do not affect the semantics of existing EVM code.
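
## Test Cases

In the tables below, the `Stack` and `RStack` columns show the stacks _before_ each opcode executes. The `RStack` column follows the implementation choice allowed by Note 3: it records the `pc` of the `JUMPSUB` itself, and `RETURNSUB` resumes at the following instruction.
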
### Simple routine

This should jump into a subroutine, back out and stop.

Bytecode: `0x6004b300b2b7`

| Pc | Op | Cost | Stack | RStack |
|-------|-------------|------|-----------|-----------|
| 0 | PUSH1 | 3 | [] | [] |
| 2 | JUMPSUB | 8 | [4] | [] |
| 5 | RETURNSUB | 2 | [] | [2] |
| 3 | STOP | 0 | [] | [] |
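
To make these tables easier to follow, here is a toy Python interpreter that replays them. It is a sketch under stated assumptions, not a client: it supports only the opcodes these vectors use (`PUSH1` to `PUSH32`, `JUMP`, `JUMPDEST`, `STOP` and the three new opcodes), takes per-opcode gas from the `Cost` column of the tables, does not model gas limits or the consumption of all gas on an abort, and stores `pc` on the return stack (as Note 3 permits) so that its trace matches the `RStack` column. All names are hypothetical.

```python
# Toy interpreter for the test vectors in this EIP (not a client).
STOP, JUMP, JUMPDEST = 0x00, 0x56, 0x5B
BEGINSUB, JUMPSUB, RETURNSUB = 0xB2, 0xB3, 0xB7
NAMES = {STOP: "STOP", JUMP: "JUMP", JUMPDEST: "JUMPDEST",
         BEGINSUB: "BEGINSUB", JUMPSUB: "JUMPSUB", RETURNSUB: "RETURNSUB"}
# Costs copied from the "Cost" column; BEGINSUB is absent because it always aborts.
COST = {STOP: 0, JUMP: 8, JUMPDEST: 1, JUMPSUB: 8, RETURNSUB: 2}
PUSH_COST = 3
RETURN_STACK_LIMIT = 1023


class Abort(Exception):
    """Raised as Abort(pc, opname) wherever the spec says _abort_."""


def is_push(op: int) -> bool:
    return 0x60 <= op <= 0x7F            # PUSH1..PUSH32


def opcode_positions(code: bytes) -> set:
    """Offsets holding opcodes rather than PUSH immediate data."""
    positions, pc = set(), 0
    while pc < len(code):
        positions.add(pc)
        pc += 1 + (code[pc] - 0x5F if is_push(code[pc]) else 0)
    return positions


def run(code: bytes):
    """Return (gas_used, trace); each trace row is (pc, op, stack, rstack) before the op."""
    valid = opcode_positions(code)
    pc, gas, stack, rstack, trace = 0, 0, [], [], []
    while True:
        op = code[pc] if pc < len(code) else STOP       # Note 1: implicit STOP
        name = "PUSH%d" % (op - 0x5F) if is_push(op) else NAMES.get(op, hex(op))
        trace.append((pc, name, list(stack), list(rstack)))
        if op == STOP:
            return gas, trace
        if is_push(op):
            n = op - 0x5F
            # Immediate bytes past the end of the code read as zero.
            stack.append(int.from_bytes(code[pc + 1:pc + 1 + n].ljust(n, b"\x00"), "big"))
            gas += PUSH_COST
            pc += 1 + n
            continue
        if op not in COST:                              # BEGINSUB or unsupported op
            raise Abort(pc, name)
        if op in (JUMP, JUMPSUB) and not stack:         # stack underflow
            raise Abort(pc, name)
        gas += COST[op]
        if op == JUMPDEST:
            pc += 1
        elif op == JUMP:
            dest = stack.pop()
            if dest not in valid or code[dest] != JUMPDEST:
                raise Abort(pc, name)
            pc = dest
        elif op == JUMPSUB:
            location = stack.pop()
            if (location not in valid or code[location] != BEGINSUB
                    or len(rstack) >= RETURN_STACK_LIMIT):
                raise Abort(pc, name)
            rstack.append(pc)            # Note 3: store pc, resume at pc + 1
            pc = location + 1
        elif op == RETURNSUB:
            if not rstack:
                raise Abort(pc, name)
            pc = rstack.pop() + 1
```

For example, `run(bytes.fromhex("6004b300b2b7"))` reports 13 gas and a trace visiting pcs 0, 2, 5 and 3, matching the table above; the other vectors in this section can be replayed the same way.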

### Two levels of subroutines

This should execute fine, going two levels deep into subroutines.

Bytecode: `0x6800000000000000000cb300b26011b3b7b2b7`

| Pc | Op | Cost | Stack | RStack |
|-------|-------------|------|-----------|-----------|
| 0 | PUSH9 | 3 | [] | [] |
| 10 | JUMPSUB | 8 | [12] | [] |
| 13 | PUSH1 | 3 | [] | [10] |
| 15 | JUMPSUB | 8 | [17] | [10] |
| 18 | RETURNSUB | 2 | [] | [10,15] |
| 16 | RETURNSUB | 2 | [] | [10] |
| 11 | STOP | 0 | [] | [] |
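
The `Pc` column jumps from 0 to 10 because `PUSH9` carries nine immediate bytes. A short Python disassembler recovers the offsets used in the table; the helper name and the small mnemonic table are hypothetical, covering only the opcodes in these examples, with `0xb2`, `0xb3` and `0xb7` as suggested under Costs and Codes.

```python
NAMES = {0x00: "STOP", 0x56: "JUMP", 0x58: "PC", 0x5B: "JUMPDEST",
         0xB2: "BEGINSUB", 0xB3: "JUMPSUB", 0xB7: "RETURNSUB"}


def disassemble(code: bytes):
    """Yield (pc, mnemonic, immediate) for each instruction; immediate is None if absent."""
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:                  # PUSH1..PUSH32
            n = op - 0x5F
            yield pc, "PUSH%d" % n, int.from_bytes(code[pc + 1:pc + 1 + n], "big")
            pc += 1 + n
        else:
            yield pc, NAMES.get(op, "0x%02x" % op), None
            pc += 1


if __name__ == "__main__":
    for pc, name, imm in disassemble(bytes.fromhex("6800000000000000000cb300b26011b3b7b2b7")):
        print(pc, name if imm is None else "%s %d" % (name, imm))
```

It places `BEGINSUB` at offsets 12 and 17, exactly the two `location` values pushed in this example.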

### Failure 1: invalid jump

This should fail, since the given `location` is outside the code range. The code is the same as in the previous example, except that the pushed `location` is `0x01000000000000000c` instead of `0x0c`.

Bytecode: `0x6801000000000000000cb300b26011b3b7b2b7`

| Pc | Op | Cost | Stack | RStack |
|-------|-------------|------|-----------|-----------|
| 0 | PUSH9 | 3 | [] | [] |
| 10 | JUMPSUB | 8 | [18446744073709551628] | [] |

```
Error: at pc=10, op=JUMPSUB: evm: invalid jump destination
```
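
The decimal value in the `Stack` column is the 9-byte immediate read as a big-endian integer, 2^64 + 12. A quick Python check follows. The observation that this value reduced modulo 2^64 is 12, the valid `BEGINSUB` of the previous example, is our own reading of why it was chosen (a client that truncated `location` to 64 bits would wrongly accept the jump); the test itself does not say so.

```python
# Immediate of the PUSH9 in the Failure 1 bytecode.
location = int.from_bytes(bytes.fromhex("01000000000000000c"), "big")
code = bytes.fromhex("6801000000000000000cb300b26011b3b7b2b7")

print(location)                 # the value shown in the Stack column
print(location >= len(code))    # outside the code range, so JUMPSUB aborts
print(location % 2**64)         # what a hypothetical 64-bit truncation would see
```

The three lines print `18446744073709551628`, `True` and `12`.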

### Failure 2: shallow `return stack`

This should fail at the first opcode, because the `return stack` is empty.

Bytecode: `0xb75858` (`RETURNSUB`, `PC`, `PC`)

| Pc | Op | Cost | Stack | RStack |
|-------|-------------|------|-----------|-----------|
| 0 | RETURNSUB | 2 | [] | [] |

```
Error: at pc=0, op=RETURNSUB: evm: invalid retsub
```

### Subroutine at end of code

In this example, the `JUMPSUB` is on the last byte of code. When the subroutine returns, it should hit the 'virtual stop' _after_ the bytecode and not exit with an error.

Bytecode: `0x600556b2b75b6003b3`

| Pc | Op | Cost | Stack | RStack |
|-------|-------------|------|-----------|-----------|
| 0 | PUSH1 | 3 | [] | [] |
| 2 | JUMP | 8 | [5] | [] |
| 5 | JUMPDEST | 1 | [] | [] |
| 6 | PUSH1 | 3 | [] | [] |
| 8 | JUMPSUB | 8 | [3] | [] |
| 4 | RETURNSUB | 2 | [] | [8] |
| 9 | STOP | 0 | [] | [] |

Consumed gas: `25`
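
The consumed-gas figure is the sum of the `Cost` column, and the final `STOP` sits at `pc` 9, one past the last byte of the 9-byte code. A trivial Python check, with the rows transcribed from the table:

```python
# (pc, op, cost) rows transcribed from the table above.
ROWS = [(0, "PUSH1", 3), (2, "JUMP", 8), (5, "JUMPDEST", 1), (6, "PUSH1", 3),
        (8, "JUMPSUB", 8), (4, "RETURNSUB", 2), (9, "STOP", 0)]
code = bytes.fromhex("600556b2b75b6003b3")

total_gas = sum(cost for _, _, cost in ROWS)
virtual_stop_pc = ROWS[-1][0]

print(total_gas)                      # matches "Consumed gas"
print(virtual_stop_pc == len(code))   # the STOP lies just past the code
```

This prints `25` and `True`.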

### Error on "walk-into-subroutine"

In this example, the code 'walks' into a subroutine, which is not allowed, and causes an error.

| Pc | Op | Cost | Stack | RStack |
|-------|-------------|------|-----------|-----------|
| 0 | BEGINSUB | 1 | [] | [] |

```
Error: at pc=0, op=BEGINSUB: invalid subroutine entry
```

**Note 5**: The content of the error message (`invalid subroutine entry`) is implementation-specific.

## Implementations

Three clients have implemented the previous version of the proposal:

- [geth](https://github.com/ethereum/go-ethereum/pull/20619),
- [besu](https://github.com/hyperledger/besu/pull/717), and
- [openethereum](https://github.com/openethereum/openethereum/pull/11629).

The changes for the current version are trivial.

### Costs and Codes

We suggest that the cost of `BEGINSUB` be _base_, `JUMPSUB` be _mid_, and `RETURNSUB` be _verylow_. Measurement will tell. We suggest the following opcodes:

```
0xb2 BEGINSUB
0xb3 JUMPSUB
0xb7 RETURNSUB
```

**Note 6**: Although specified at _base_, the cost of `BEGINSUB` does not matter in practice, since `BEGINSUB` never executes without error.

## Security Considerations

These changes do introduce new flow-control instructions, so any software that performs static or dynamic analysis of EVM code needs to be modified accordingly. The `JUMPSUB` semantics are similar to `JUMP` (but jumping to a `BEGINSUB`), whereas the `RETURNSUB` instruction is different, since it can 'land' on any opcode (but the possible destinations can be statically inferred).
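
As a deliberately simplified illustration of that last point, here is a Python sketch of how an analyser might infer `RETURNSUB` destinations. It rests on hypothetical assumptions that real analysers cannot rely on: every `JUMPSUB` target is a constant pushed by the instruction immediately before it, a subroutine's body runs from its `BEGINSUB` to the next `BEGINSUB`, and code never uses `JUMP` to cross between subroutines. Under those assumptions, a `RETURNSUB` in subroutine _S_ can only land on the instruction after a `JUMPSUB` that targets _S_.

```python
BEGINSUB, JUMPSUB, RETURNSUB = 0xB2, 0xB3, 0xB7


def instructions(code: bytes):
    """Yield (pc, op, immediate); immediate is None for non-PUSH opcodes."""
    pc = 0
    while pc < len(code):
        op = code[pc]
        n = op - 0x5F if 0x60 <= op <= 0x7F else 0    # PUSH1..PUSH32
        imm = int.from_bytes(code[pc + 1:pc + 1 + n], "big") if n else None
        yield pc, op, imm
        pc += 1 + n


def return_targets(code: bytes) -> dict:
    """Map each RETURNSUB pc to the set of pcs it may resume at (simplified)."""
    insns = list(instructions(code))
    # Assumption: every pc belongs to the nearest preceding BEGINSUB.
    owner, current = {}, None
    for pc, op, _ in insns:
        if op == BEGINSUB:
            current = pc
        owner[pc] = current
    # Assumption: callers are "PUSH <constant>; JUMPSUB" pairs.
    callers = {}
    for (_, _, imm), (pc, op, _) in zip(insns, insns[1:]):
        if op == JUMPSUB and imm is not None:
            callers.setdefault(imm, set()).add(pc + 1)
    return {pc: callers.get(owner[pc], set())
            for pc, op, _ in insns if op == RETURNSUB}


if __name__ == "__main__":
    print(return_targets(bytes.fromhex("6800000000000000000cb300b26011b3b7b2b7")))
```

For the "Two levels of subroutines" bytecode it maps the `RETURNSUB` at 16 to `{11}` and the one at 18 to `{16}`, which agrees with that test's trace.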
## Appendix: Comparative costs.
```
contract fun {
function test(uint x, uint y) public returns (uint) {
return test_mul(2,3);
}
function test_mul(uint x, uint y) public returns (uint) {
return multiply(x,y);
}
function multiply(uint x, uint y) public returns (uint) {
return x * y;
}
}
```
Here is solc 0.6.3 assembly code with labeled destinations.
```
TEST:
jumpdest
0x00
RTN
0x02
0x03
TEST_MUL
jump
TEST_MUL:
jumpdest
0x00
RTN
dup4
dup4
MULTIPLY
jump
RTN:
jumpdest
swap4
swap3
pop
pop
pop
jump
MULTIPLY:
jumpdest
mul
swap1
jump
```

solc does a good job with the `multiply()` function, which is a leaf. Non-leaf functions are more awkward to get out of. Calling `fun.test()` will cost _118 gas_, plus 5 for the `mul`.

This is the same code written using `jumpsub` and `returnsub`. Calling `fun.test()` will cost _32 gas_, plus 5 for the `mul`.

```
TEST:
beginsub
0x02
0x03
TEST_MUL
jumpsub
returnsub
TEST_MUL:
beginsub
MULTIPLY
jumpsub
returnsub
MULTIPLY:
beginsub
mul
returnsub
```
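
The "over three times" claim in the Motivation follows from the two figures quoted above. A Python check of the arithmetic, using only those numbers:

```python
SOLC_CALLS = 118        # gas for fun.test() calls/returns as compiled by solc 0.6.3
SUBROUTINE_CALLS = 32   # the same program using jumpsub/returnsub
MUL = 5                 # gas for the mul itself, common to both versions

calls_only = SOLC_CALLS / SUBROUTINE_CALLS
with_mul = (SOLC_CALLS + MUL) / (SUBROUTINE_CALLS + MUL)
print(f"{calls_only:.2f} {with_mul:.2f}")
```

Calling and returning alone costs about 3.7 times as much with the solc convention; including the `mul`, the whole call is still about 3.3 times as expensive.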

**Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).**