EIPs/EIPS/eip-2315.md
Greg Colvin 8ee164d9b7
Automatically merged updates to draft EIP(s) 2315 (#2646)
2020-05-15 21:35:05 +12:00

eip: 2315
title: Simple Subroutines for the EVM
status: Draft
type: Standards Track
category: Core
author: Greg Colvin (greg@colvin.org), Martin Holst Swende (@holiman)
discussions-to: https://ethereum-magicians.org/t/eip-2315-simple-subroutines-for-the-evm/3941
created: 2019-10-17

Abstract

This proposal introduces three opcodes to support subroutines: BEGINSUB, JUMPSUB and RETURNSUB.

Motivation

The EVM does not provide subroutines as a primitive. Instead, calls can be synthesized by fetching and pushing the current program counter on the data stack and jumping to the subroutine address; returns can be synthesized by contriving to get the return address back to the top of stack and jumping back to it. Complex calling conventions are then needed to use the same stack for computation and control flow. Code becomes harder to read and write, and tools may need to pattern-match the conventions to identify the use of subroutines. These calling conventions can be avoided by using memory instead, but either way the result costs a lot of gas.

Having opcodes to directly support subroutines can eliminate this complexity and cost, just as for other physical and virtual machines going back at least 50 years.

In the Appendix we show example solc output for a simple program that uses over three times as much gas just calling and returning from subroutines as comparable code using these opcodes.

Specification

We introduce one more stack into the EVM, called return_stack. The return_stack is limited to 1023 items.

BEGINSUB

Marks the entry point to a subroutine. Subroutines can only be entered via JUMPSUB. Execution of BEGINSUB causes an exception (OOG: all gas consumed) and terminates execution.

pops: 0 pushes: 0

JUMPSUB
  1. Pops 1 value from the stack, hereafter referred to as location.
  • 1.1 If the opcode at location is not a BEGINSUB, abort with error.
  2. Pushes the current pc+1 to the return_stack. (See Note 1 below)
  • 2.1 If the return_stack already has 1023 items, abort with error.
  3. Sets the pc to location + 1.

Note 1: If the resulting pc is beyond the last instruction then the opcode is implicitly a STOP, which is not an error.

pops: 1 pushes: 0 (return_stack pushes: 1)

RETURNSUB
  1. Pops 1 value from the return_stack.
  • 1.1 If the return_stack is empty, abort with error.
  2. Sets pc to the popped value. (See Note 1 above)

pops: 0 (return_stack pops: 1) pushes: 0

Note 2: Values popped from return_stack do not need to be validated, since they cannot be set arbitrarily from code, only implicitly by the EVM.

Note 3: A value popped from return_stack may be outside of the code length, if the last JUMPSUB was the last byte of the code. In this case the next opcode is implicitly a STOP, which is not an error.

Note 4: The description above lays out the semantics of this feature in terms of a return_stack. It's up to node implementations to decide the internal representation. For example, a node may decide to place PC on the return_stack at JUMPSUB, as long as the RETURNSUB correctly returns to the PC+1 location. The internals of the return_stack are not one of the "observable"/consensus-critical parts of the EVM.
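
The numbered steps for JUMPSUB and RETURNSUB can be sketched as a tiny interpreter. This is a minimal illustration, not a client implementation: the names `run`, `instruction_starts` and `SubroutineError` are hypothetical, only the handful of opcodes used in this document are handled, gas is not modeled, and treating PUSH immediate data as an invalid `location` is an assumption carried over from how JUMP destinations are validated. The sketch follows the steps literally (jumping to location + 1 and treating an executed BEGINSUB as an error), so it reproduces only the final outcome of the examples under Test Cases, not their per-step traces or costs.

```python
# Opcode values as suggested under "Costs and Codes"; the rest are standard EVM opcodes.
BEGINSUB, JUMPSUB, RETURNSUB = 0xB2, 0xB3, 0xB7
STOP, JUMP, PC, JUMPDEST = 0x00, 0x56, 0x58, 0x5B
PUSH1, PUSH32 = 0x60, 0x7F
RETURN_STACK_LIMIT = 1023


class SubroutineError(Exception):
    """Hypothetical error type standing in for 'abort with error'."""


def instruction_starts(code: bytes) -> set:
    """Offsets that begin an instruction, i.e. are not PUSH immediate data."""
    starts, pc = set(), 0
    while pc < len(code):
        starts.add(pc)
        op = code[pc]
        pc += 1 + (op - PUSH1 + 1 if PUSH1 <= op <= PUSH32 else 0)
    return starts


def run(code: bytes):
    """Execute `code` and return the final (stack, return_stack)."""
    starts = instruction_starts(code)
    stack, return_stack, pc = [], [], 0
    while pc < len(code):  # running past the end is an implicit STOP
        op = code[pc]
        if op == STOP:
            break
        elif PUSH1 <= op <= PUSH32:
            n = op - PUSH1 + 1
            data = code[pc + 1:pc + 1 + n].ljust(n, b"\x00")
            stack.append(int.from_bytes(data, "big"))
            pc += 1 + n
        elif op == JUMP:
            dest = stack.pop()
            if dest not in starts or code[dest] != JUMPDEST:
                raise SubroutineError("invalid jump destination")
            pc = dest
        elif op == JUMPDEST:
            pc += 1
        elif op == PC:
            stack.append(pc)
            pc += 1
        elif op == BEGINSUB:
            # Per the BEGINSUB text: executing it directly is an error.
            raise SubroutineError("BEGINSUB executed")
        elif op == JUMPSUB:
            location = stack.pop()                       # step 1
            if location not in starts or code[location] != BEGINSUB:
                raise SubroutineError("invalid jump destination")  # step 1.1
            if len(return_stack) >= RETURN_STACK_LIMIT:
                raise SubroutineError("return_stack overflow")     # step 2.1
            return_stack.append(pc + 1)                  # step 2
            pc = location + 1                            # step 3
        elif op == RETURNSUB:
            if not return_stack:
                raise SubroutineError("invalid retsub")  # step 1.1
            pc = return_stack.pop()                      # steps 1 and 2
        else:
            raise SubroutineError(f"opcode {op:#x} not handled by this sketch")
    return stack, return_stack
```

For example, `run(bytes.fromhex("6004b300b2b7"))` enters the subroutine, returns, and stops with both stacks empty, while `run(bytes.fromhex("b75858"))` raises `SubroutineError` at the first opcode.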

Rationale

This is the smallest possible change that provides native subroutines without breaking backwards compatibility.

Backwards Compatibility

These changes do not affect the semantics of existing EVM code.

Alternative variants

One possible variant would be to add an extra clause to the BEGINSUB opcode.

  • A BEGINSUB opcode may only be reached via a JUMPSUB.

This would make walking into a subroutine an error. The rationale for this would be to possibly improve static analysis, being able to make stronger assertions about the code flow.

This is not part of the current specification, since code-generators can trivially implement these guarantees by always prepending a STOP opcode before any BEGINSUB operation.
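
The STOP-prepending convention might look like the following in a code generator. This is only a sketch under assumptions: `emit_subroutines` is a hypothetical helper, subroutine bodies are passed as raw bytes without their BEGINSUB, and the opcode value 0xb2 is the one suggested under Costs and Codes.

```python
STOP, BEGINSUB = 0x00, 0xB2


def emit_subroutines(main: bytes, bodies: list) -> tuple:
    """Append each subroutine body after `main`, guarded by a STOP.

    Returns the assembled bytecode and the offset of each BEGINSUB,
    which is the value a caller would push before JUMPSUB.
    """
    code = bytearray(main)
    entries = []
    for body in bodies:
        code.append(STOP)          # guard: execution cannot walk into the subroutine
        entries.append(len(code))  # offset of the BEGINSUB that follows
        code.append(BEGINSUB)
        code += body
    return bytes(code), entries
```

With `main = bytes.fromhex("6004b3")` (PUSH1 4, JUMPSUB) and a single body consisting of RETURNSUB (`0xb7`), this yields the bytecode `6004b300b2b7` with the subroutine entry at offset 4.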

Test Cases

Simple routine

This should jump into a subroutine, back out and stop.

Bytecode: 0x6004b300b2b7

Pc Op Cost Stack RStack
0 PUSH1 3 [] []
2 JUMPSUB 8 [4] []
4 BEGINSUB 1 [] [ 2]
5 RETURNSUB 2 [] [ 2]
3 STOP 0 [] []

Two levels of subroutines

This should execute fine, going two levels deep into subroutines.

Bytecode: 0x6800000000000000000cb300b26011b3b7b2b7

Pc Op Cost Stack RStack
0 PUSH9 3 [] []
10 JUMPSUB 8 [12] []
12 BEGINSUB 1 [] [10]
13 PUSH1 3 [] [10]
15 JUMPSUB 8 [17] [10]
17 BEGINSUB 1 [] [10,15]
18 RETURNSUB 2 [] [10,15]
16 RETURNSUB 2 [] [10]
11 STOP 0 [] []

Failure 1: invalid jump

This should fail, since the given location is outside of the code-range. The code is the same as previous example, except that the pushed location is 0x01000000000000000c instead of 0x0c.

Bytecode: 0x6801000000000000000cb300b26011b3b7b2b7

Pc Op Cost Stack RStack
0 PUSH9 3 [] []
10 JUMPSUB 8 [18446744073709551628] []
Error: at pc=10, op=JUMPSUB: evm: invalid jump destination
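
The location shown in the trace can be checked by decoding the PUSH9 immediate data directly. The snippet below is just that arithmetic; the variable names are illustrative.

```python
code = bytes.fromhex("68 01 00 00 00 00 00 00 00 0c b3 00 b2 60 11 b3 b7 b2 b7")

# Bytes 1..9 are the immediate data of the PUSH9 at pc=0.
location = int.from_bytes(code[1:10], "big")

print(location)               # 18446744073709551628
print(location == 2**64 + 12) # True
print(len(code))              # 19
print(location < len(code))   # False: the destination lies outside the code
```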

Failure 2: shallow return_stack

This should fail at the first opcode, due to a shallow (empty) return_stack.

Bytecode: 0xb75858 (RETURNSUB, PC, PC)

Pc Op Cost Stack RStack
0 RETURNSUB 2 [] []
Error: at pc=0, op=RETURNSUB: evm: invalid retsub

Subroutine at end of code

In this example, the JUMPSUB is on the last byte of code. When the subroutine returns, it should hit the 'virtual stop' after the bytecode, and not exit with an error.

Bytecode: 0x600556b2b75b6003b3

Pc Op Cost Stack RStack
0 PUSH1 3 [] []
2 JUMP 8 [5] []
5 JUMPDEST 1 [] []
6 PUSH1 3 [] []
8 JUMPSUB 8 [3] []
3 BEGINSUB 1 [] [ 8]
4 RETURNSUB 2 [] [ 8]
9 STOP 0 [] []

Consumed gas: 26
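
As a quick cross-check, the consumed gas is the sum of the Cost column in the trace above. The list below simply transcribes that column.

```python
# (opcode, cost) pairs transcribed from the trace, in execution order.
trace = [
    ("PUSH1", 3),
    ("JUMP", 8),
    ("JUMPDEST", 1),
    ("PUSH1", 3),
    ("JUMPSUB", 8),
    ("BEGINSUB", 1),
    ("RETURNSUB", 2),
    ("STOP", 0),
]

total = sum(cost for _, cost in trace)
print(total)  # 26
```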

Implementations

No clients have implemented this proposal as of yet, but there are Draft PRs for

Costs and Codes

We suggest that the cost of BEGINSUB be base, JUMPSUB be low, and RETURNSUB be verylow. Measurement will tell. We suggest the following opcodes:

0xb2 BEGINSUB
0xb3 JUMPSUB
0xb7 RETURNSUB
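
With these values, the bytecode in the test cases can be decoded by a very small disassembler. The sketch below is illustrative only: `disassemble` and `OPCODES` are hypothetical names, and the mnemonic table covers just the opcodes that appear in this document (the three suggested above plus the standard STOP, JUMP, PC, JUMPDEST and PUSH1 through PUSH32).

```python
OPCODES = {
    0x00: "STOP",
    0x56: "JUMP",
    0x58: "PC",
    0x5B: "JUMPDEST",
    0xB2: "BEGINSUB",
    0xB3: "JUMPSUB",
    0xB7: "RETURNSUB",
}


def disassemble(code: bytes) -> list:
    """Return (pc, mnemonic, immediate) tuples; immediate is None unless PUSHn."""
    out, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:  # PUSH1..PUSH32 carry 1..32 bytes of data
            n = op - 0x5F
            value = int.from_bytes(code[pc + 1:pc + 1 + n], "big")
            out.append((pc, f"PUSH{n}", value))
            pc += 1 + n
        else:
            out.append((pc, OPCODES.get(op, f"UNKNOWN_{op:#04x}"), None))
            pc += 1
    return out


for pc, name, imm in disassemble(bytes.fromhex("6004b300b2b7")):
    print(pc, name, "" if imm is None else imm)
```

Applied to the "Simple routine" bytecode, this lists PUSH1 4 at pc 0, JUMPSUB at 2, STOP at 3, BEGINSUB at 4 and RETURNSUB at 5.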

Security Considerations

These changes introduce new flow control instructions, so any software which does static/dynamic analysis of EVM code needs to be modified accordingly. The JUMPSUB semantics are similar to JUMP (but jumping to a BEGINSUB), whereas the RETURNSUB instruction is different, since it can 'land' on any opcode (but the possible destinations can be statically inferred).

Appendix: Comparative costs.

contract fun {
    function test(uint x, uint y) public returns (uint) {
        return test_mul(2,3);
    }
    function test_mul(uint x, uint y) public returns (uint) {
        return multiply(x,y);
    }
    function multiply(uint x, uint y) public returns (uint) {
        return x * y;
    }
}

Here is solc 0.6.3 assembly code with labeled destinations.

TEST:
     jumpdest
     0x00
     RTN
     0x02
     0x03
     TEST_MUL
     jump
TEST_MUL:
     jumpdest
     0x00
     RTN
     dup4
     dup4
     MULTIPLY
     jump
RTN:
     jumpdest
     swap4
     swap3
     pop
     pop
     pop
     jump
MULTIPLY:   
     jumpdest
     mul
     swap1
     jump

solc does a good job with the multiply() function, which is a leaf. Non-leaf functions are more awkward to get out of. Calling fun.test() will cost 118 gas, plus 5 for the mul.

This is the same code written using jumpsub and returnsub. Calling fun.test() will cost 34 gas (plus 5).

TEST:
     beginsub
     0x02
     0x03
     TEST_MUL
     jumpsub
     returnsub
TEST_MUL:
     beginsub
     MULTIPLY
     jumpsub
     returnsub
MULTIPLY:
     beginsub
     mul
     returnsub
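
The "over three times as much gas" comparison follows directly from the two figures quoted in this appendix. The arithmetic below uses only those numbers (118 and 34 gas for call overhead, 5 for the mul); the variable names are illustrative.

```python
solc_overhead = 118       # gas for calling and returning, solc 0.6.3 output
subroutine_overhead = 34  # gas for the same work with jumpsub/returnsub
mul_cost = 5              # gas for the mul itself, identical in both versions

overhead_ratio = solc_overhead / subroutine_overhead
total_ratio = (solc_overhead + mul_cost) / (subroutine_overhead + mul_cost)

print(f"{overhead_ratio:.2f}")  # 3.47
print(f"{total_ratio:.2f}")     # 3.15
```

Either way of counting, with or without the mul, the ratio stays above three.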

Copyright and related rights waived via CC0.