9.1 KiB
eip | title | status | type | category | author | discussions-to | created |
---|---|---|---|---|---|---|---|
2315 | Simple Subroutines for the EVM | Draft | Standards Track | Core | Greg Colvin (greg@colvin.org), Martin Holst Swende (@holiman) | https://ethereum-magicians.org/t/eip-2315-simple-subroutines-for-the-evm/3941 | 2019-10-17 |
Abstract
This proposal introduces three opcodes to support subroutines: BEGINSUB
, JUMPSUB
and RETURNSUB
.
Motivation
The EVM does not provide subroutines as a primitive. Instead, calls can be synthesized by fetching and pushing the current program counter on the data stack and jumping to the subroutine address; returns can be synthesized by contriving to get the return address back to the top of stack and jumping back to it. Complex calling conventions are then needed to use the same stack for computation and control flow. Memory allows for simpler conventions but still costs gas. Eschewing subroutines in user code is the least costly -- but also the most failure-prone.
Over the course of 30 years the computer industry struggled with this complexity and cost and settled in on opcodes to directly support subroutines. These are provided in some form by most all physical and virtual machines going back at least 50 years.
Our design is modeled on the original Forth two-stack machine of 1970. The data stack is supplemented with a return stack to provide simple support for subroutines, as specified below.
In the Appendix we show example solc output for a simple program that uses over three times as much gas just calling and returning from subroutines as comparable code using these opcodes. Actual differences in run-time efficiency will of course vary widely.
Specification
We introduce one more stack into the EVM in addition to the existing data stack
which we call the return stack
. The return stack
is limited to 1023
items.
BEGINSUB
Marks the entry point to a subroutine. Attempted execution of a BEGINSUB
causes an abort
: terminate execution with an OOG
(Out Of Gas) exception.
JUMPSUB
Transfers control to a subroutine.
- Pop the
location
off thedata stack
. - If the opcode at
location
is not aBEGINSUB
abort
. - If the
return stack
already has1023
itemsabort
. - Push the current
pc + 1
to thereturn stack
. - Set
pc
tolocation + 1
.
- pops one item off the
data stack
- pushes one item on the
return stack
RETURNSUB
Returns control to the caller of a subroutine.
- If the
return stack
is emptyabort
. - Pop
pc
off thereturn stack
.
- pops one item off the
return stack
Note 1: If a resulting pc
to be executed is beyond the last instruction then the opcode is implicitly a STOP
, which is not an error.
Note 2: Values popped off the return stack
do not need to be validated, since they are alterable only by JUMPSUB
and RETURNSUB
.
Note 3: The description above lays out the semantics of this feature in terms of a return stack
. But the actual state of the return stack
is not observable by EVM code or consensus-critical to the protocol. (For example, a node implementor may code JUMPSUB
to unobservably push pc
on the return stack
rather than pc + 1
, which is allowed so long as RETURNSUB
observably returns control to the pc + 1
location.)
Rationale
This is the is a small change that provides native subroutines without breaking backwards compatibility.
Backwards Compatibility
These changes do not affect the semantics of existing EVM code.
Test Cases
Simple routine
This should jump into a subroutine, back out and stop.
Bytecode: 0x6004b300b2b7
Pc | Op | Cost | Stack | RStack |
---|---|---|---|---|
0 | PUSH1 | 3 | [] | [] |
2 | JUMPSUB | 8 | [4] | [] |
5 | RETURNSUB | 2 | [] | [ 2] |
3 | STOP | 0 | [] | [] |
Two levels of subroutines
This should execute fine, going into one two depths of subroutines
Bytecode: 0x6800000000000000000cb300b26011b3b7b2b7
Pc | Op | Cost | Stack | RStack |
---|---|---|---|---|
0 | PUSH9 | 3 | [] | [] |
10 | JUMPSUB | 8 | [12] | [] |
13 | PUSH1 | 3 | [] | [10] |
15 | JUMPSUB | 8 | [17] | [10] |
18 | RETURNSUB | 2 | [] | [10,15] |
16 | RETURNSUB | 2 | [] | [10] |
11 | STOP | 0 | [] | [] |
Failure 1: invalid jump
This should fail, since the given location
is outside of the code-range. The code is the same as previous example,
except that the pushed location
is 0x01000000000000000c
instead of 0x0c
.
Bytecode: 0x6801000000000000000cb300b26011b3b7b2b7
Pc | Op | Cost | Stack | RStack |
---|---|---|---|---|
0 | PUSH9 | 3 | [] | [] |
10 | JUMPSUB | 8 | [18446744073709551628] | [] |
Error: at pc=10, op=JUMPSUB: evm: invalid jump destination
Failure 2: shallow return stack
This should fail at first opcode, due to shallow return stack
Bytecode: 0xb75858
(RETURNSUB
, PC
, PC
)
Pc | Op | Cost | Stack | RStack |
---|---|---|---|---|
0 | RETURNSUB | 2 | [] | [] |
Error: at pc=0, op=RETURNSUB: evm: invalid retsub
Subroutine at end of code
In this example. the JUMPSUB is on the last byte of code. When the subroutine returns, it should hit the 'virtual stop' after the bytecode, and not exit with error
Bytecode: 0x600556b2b75b6003b3
Pc | Op | Cost | Stack | RStack |
---|---|---|---|---|
0 | PUSH1 | 3 | [] | [] |
2 | JUMP | 8 | [5] | [] |
5 | JUMPDEST | 1 | [] | [] |
6 | PUSH1 | 3 | [] | [] |
8 | JUMPSUB | 8 | [3] | [] |
4 | RETURNSUB | 2 | [] | [ 8] |
9 | STOP | 0 | [] | [] |
Consumed gas: 25
Error on "walk-into-subroutine"
In this example, the code 'walks' into a subroutine, which is not allowed, and causes an error
Pc | Op | Cost | Stack | RStack |
---|---|---|---|---|
0 | BEGINSUB | 1 | [] | [] |
Error: at pc=0, op=BEGINSUB: invalid subroutine entry
Note 5: The content of the error message, (invalid subroutine entry
) is implementation-specific.
Implementations
Three clients have implemented the previous version of proposal:
- geth .
- besu, and
- openethereum.
The changes for the current version are trivial.
Costs and Codes
We suggest that the cost of BEGINSUB
be base, JUMPSUB
be mid, and RETURNSUB
be verylow.
Measurement will tell. We suggest the following opcodes:
0xb2 BEGINSUB
0xb3 JUMPSUB
0xb7 RETURNSUB
Note 6: Although specified at base, the cost of BEGINSUB
does not matter in practice, since BEGINSUB
never executes without error.
Security Considerations
These changes do introduce new flow control instructions, so any software which does static/dynamic analysis of evm-code
needs to be modified accordingly. The JUMPSUB
semantics are similar to JUMP
(but jumping to a BEGINSUB
), whereas the RETURNSUB
instruction is different, since it can 'land' on any opcode (but the possible destinations can be statically inferred).
Appendix: Comparative costs.
contract fun {
function test(uint x, uint y) public returns (uint) {
return test_mul(2,3);
}
function test_mul(uint x, uint y) public returns (uint) {
return multiply(x,y);
}
function multiply(uint x, uint y) public returns (uint) {
return x * y;
}
}
Here is solc 0.6.3 assembly code with labeled destinations.
TEST:
jumpdest
0x00
RTN
0x02
0x03
TEST_MUL
jump
TEST_MUL:
jumpdest
0x00
RTN
dup4
dup4
MULTIPLY
jump
RTN:
jumpdest
swap4
swap3
pop
pop
pop
jump
MULTIPLY:
jumpdest
mul
swap1
jump
solc does a good job with the multiply() function, which is a leaf. Non-leaf functions are more awkward to get out of. Calling fun.test()
will cost 118 gas, plus 5 for the mul
.
This is the same code written using jumpsub
and returnsub
. Calling fun.test()
will cost 32 gas plus 5 for the mul
.
TEST:
beginsub
0x02
0x03
TEST_MUL
jumpsub
returnsub
TEST_MUL:
beginsub
MULTIPLY
jumpsub
returnsub
MULTIPLY:
beginsub
mul
returnsub
Copyright and related rights waived via CC0.