eip: 2315
title: Simple Subroutines for the EVM
status: Draft
type: Standards Track
category: Core
author: Greg Colvin <greg@colvin.org>, Martin Holst Swende (@holiman), Brooklyn Zelenka (@expede)
discussions-to: https://ethereum-magicians.org/t/eip-2315-simple-subroutines-for-the-evm/3941
created: 2019-10-17
requires: 3540, 3670

Abstract

This proposal introduces four opcodes to support simple subroutines and relative jumps: JUMPSUB, RETURNSUB, RJUMP and RJUMPI.

These provide a complete static control-flow facility that substantially reduces the complexity and the gas cost of calling and optimizing simple subroutines, with gas savings ranging from 33% to as much as 52%.

Motivation

The EVM does not provide subroutines as a primitive. Instead, calls can be synthesized by fetching and pushing the current program counter on the data stack and jumping to the subroutine address; returns can be synthesized by getting the return address to the top of the stack and jumping back to it. These conventions create unnecessary cost and complexity that is borne by the humans and programs writing, reading, and analyzing EVM code.

Facilities to directly support subroutines are provided by all but one of the real and virtual machines programmed by the lead author, including the Burroughs 5000, CDC 7600, IBM 360, DEC PDP 11 and VAX, Motorola 68000, a few generations of Intel silicon, Sun SPARC, UCSD p-Machine, Sun JVM, Wasm, and the sole exception -- the EVM. In whatever form, these operations provide for

  • capturing the current context of execution,
  • transferring control to a new context, and
  • returning to the original context
    • after possible further transfers of control
    • to some arbitrary depth.

The concept goes back to Turing, 1946:

We also wish to be able to arrange for the splitting up of operations into subsidiary operations. This should be done in such a way that once we have written down how an operation is done we can use it as a subsidiary to any other operation. ... When we wish to start on a subsidiary operation we need only make a note of where we left off the major operation and then apply the first instruction of the subsidiary. When the subsidiary is over we look up the note and continue with the major operation. Each subsidiary operation can end with instructions for this recovery of the note. How is the burying and disinterring of the note to be done? There are of course many ways. One is to keep a list of these notes in one or more standard size delay lines, (1024) with the most recent last. The position of the most recent of these will be kept in a fixed TS, and this reference will be modified every time a subsidiary is started or finished...

We propose to follow Turing's simple concept in our subroutine design, as specified below.

Gas Cost Analysis

We show here how these opcodes can be used to reduce the gas costs of both ordinary subroutine calls and low-level optimizations. The savings reported here will of course be less relevant to programs that use a few large subroutines rather than being factored into smaller ones. The choices of gas costs for the new opcodes below do not make a large difference in this analysis, as much of the improvement is due to PUSH and SWAP operations that are no longer needed. Even if JUMPSUB cost the same as JUMP (8 gas rather than 5), a simple subroutine call would still be 46% less costly, versus 52%.

Simple Subroutine Call

Consider this example of calling a fairly minimal subroutine using JUMPSUB.

Subroutine call, using JUMPSUB

TEST_SQUARE:
    jumpdest        ; 1 gas
    0x02            ; 3 gas
    jumpsub SQUARE  ; 5 gas
    returnsub       ; 3 gas

SQUARE:
    jumpdest        ; 1 gas
    dup1            ; 3 gas
    mul             ; 5 gas
    returnsub       ; 3 gas

Total: 24 gas

Subroutine call, using JUMP

TEST_SQUARE:
    jumpdest        ; 1 gas
    RTN_SQUARE      ; 3 gas
    0x02            ; 3 gas
    SQUARE          ; 3 gas
    jump            ; 8 gas
RTN_SQUARE:
    jumpdest        ; 1 gas
    swap1           ; 3 gas
    jump            ; 8 gas

SQUARE:
    jumpdest        ; 1 gas
    dup1            ; 3 gas
    mul             ; 5 gas
    swap1           ; 3 gas
    jump            ; 8 gas

Total: 50 gas

Using JUMPSUB saves 50 - 24 = 26 gas versus using JUMP, a 52% performance improvement.

Tail Call Optimization

Of course in cases like this one we can optimize the tail call, so that the return from SQUARE actually returns from TEST_SQUARE.

Tail call optimization, using RJUMP and RETURNSUB.

TEST_SQUARE:
    jumpdest        ; 1 gas
    0x02            ; 3 gas
    rjump SQUARE    ; 3 gas

SQUARE:
    jumpdest        ; 1 gas
    dup1            ; 3 gas
    mul             ; 5 gas
    returnsub       ; 3 gas

Total: 19 gas

Tail call optimization, using JUMP

TEST_SQUARE:
    jumpdest        ; 1 gas
    0x02            ; 3 gas
    SQUARE          ; 3 gas
    jump            ; 8 gas

SQUARE:
    jumpdest        ; 1 gas
    dup1            ; 3 gas
    mul             ; 5 gas
    swap1           ; 3 gas
    jump            ; 8 gas

Total: 35 gas

Using RJUMP and RETURNSUB versus JUMP saves 35 - 19 = 16 gas, a 46% performance improvement.

So we can see that these instructions provide a simpler and more efficient subroutine mechanism than dynamic jumps.

Tail Call Elimination

We can even take advantage of SQUARE just happening to directly follow TEST_SQUARE and just fall through rather than jump at all.

Tail call elimination, using RETURNSUB.

TEST_SQUARE:
    jumpdest        ; 1 gas
    0x02            ; 3 gas
SQUARE:
    jumpdest        ; 1 gas
    dup1            ; 3 gas
    mul             ; 5 gas
    returnsub       ; 3 gas

Total: 16 gas

Tail call elimination, using JUMP.

TEST_SQUARE:
    jumpdest        ; 1 gas
    0x02            ; 3 gas
SQUARE:
    jumpdest        ; 1 gas
    dup1            ; 3 gas
    mul             ; 5 gas
    swap1           ; 3 gas
    jump            ; 8 gas

Total: 24 gas

Using RETURNSUB versus JUMP saves 24 - 16 = 8 gas, a 33% performance improvement.
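
The three comparisons above can be re-checked by summing the per-instruction gas annotations from the listings. The Go sketch below does only that arithmetic; the `scenario` type and the function names are hypothetical helpers introduced for this illustration, and the costs are copied from the listings rather than from any client implementation.

```go
package main

import "fmt"

// scenario pairs the per-instruction gas costs of two listings being compared.
// The type and its field names are illustrative only.
type scenario struct {
	name     string
	withNew  []int // costs of the listing that uses the proposed opcodes
	withJump []int // costs of the listing that uses JUMP only
}

// Costs are copied, in order, from the gas annotations in the listings above.
var scenarios = []scenario{
	{"subroutine call", []int{1, 3, 5, 3, 1, 3, 5, 3}, []int{1, 3, 3, 3, 8, 1, 3, 8, 1, 3, 5, 3, 8}},
	{"tail call optimization", []int{1, 3, 3, 1, 3, 5, 3}, []int{1, 3, 3, 8, 1, 3, 5, 3, 8}},
	{"tail call elimination", []int{1, 3, 1, 3, 5, 3}, []int{1, 3, 1, 3, 5, 3, 8}},
}

// total sums a list of gas costs.
func total(costs []int) int {
	sum := 0
	for _, c := range costs {
		sum += c
	}
	return sum
}

// savingsPercent returns the gas saved relative to before,
// rounded to the nearest whole percent.
func savingsPercent(before, after int) int {
	return (100*(before-after) + before/2) / before
}

func main() {
	for _, s := range scenarios {
		a, b := total(s.withNew), total(s.withJump)
		fmt.Printf("%s: %d vs %d gas, saves %d gas (%d%%)\n",
			s.name, a, b, b-a, savingsPercent(b, a))
	}
}
```

Most of the difference comes from the PUSH and SWAP instructions that the JUMP-based listings need in order to manage the return address on the data stack.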

We can also consider the alternative subroutine call, using a version of JUMPSUB that pushes its return address on the stack.

TEST_SQUARE:
    jumpdest        ; 1 gas
    0x02            ; 3 gas
    jumpsub SQUARE  ; 5 gas
    swap1           ; 3 gas
    returnsub       ; 3 gas

SQUARE:
    jumpdest        ; 1 gas
    dup1            ; 3 gas
    mul             ; 5 gas
    swap1           ; 3 gas
    returnsub       ; 3 gas

Total: 30 gas, compared to 24 gas for the return stack version.

Note that this specification is entirely semantic. It constrains only data usage and control flow and imposes no syntax on code beyond being a sequence of bytes to be executed.

Specification

We introduce one more stack into the EVM in addition to the existing data stack, which we call the return stack. The return stack is limited to 1024 items. This stack supports two new instructions for subroutines.

Instructions

JUMPSUB (0x5e) location

Transfers control to a subroutine.

  1. Decode the location from the immediate data. The data is encoded as three bytes, MSB-first.
  2. Set PC to location.

The cost is low.

  • pops one item off the data stack
  • pushes one item on the return stack

RETURNSUB (0x5f)

Returns control to the caller of a subroutine.

  1. Pop PC off the return stack.

The cost is verylow.

  • pops one item off the return stack
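
As a rough illustration of the two subroutine instructions, the Go sketch below models only the PC and the return stack. The `machine` type, its field names, and the error messages are hypothetical. It assumes the three-byte, MSB-first immediate encoding described above, and it assumes the return address is PC + 4, the byte after the immediate data; neither the gas accounting nor the data stack is modeled.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	opJumpsub        = 0x5e
	opReturnsub      = 0x5f
	returnStackLimit = 1024 // limit stated in the specification above
)

// machine holds only the state this sketch needs (hypothetical type).
type machine struct {
	code   []byte
	pc     int
	rstack []int // return stack of saved PCs
}

// jumpsub decodes the three-byte, MSB-first location that follows the opcode,
// pushes the return address, and sets the PC to the location.
func (m *machine) jumpsub() error {
	if m.pc+3 >= len(m.code) {
		return errors.New("truncated immediate data")
	}
	if len(m.rstack) >= returnStackLimit {
		return errors.New("return stack overflow")
	}
	loc := int(m.code[m.pc+1])<<16 | int(m.code[m.pc+2])<<8 | int(m.code[m.pc+3])
	// Assumption: control returns to the byte after the immediate data.
	m.rstack = append(m.rstack, m.pc+4)
	m.pc = loc
	return nil
}

// returnsub pops the most recent return address into the PC.
func (m *machine) returnsub() error {
	if len(m.rstack) == 0 {
		return errors.New("return stack underflow")
	}
	m.pc = m.rstack[len(m.rstack)-1]
	m.rstack = m.rstack[:len(m.rstack)-1]
	return nil
}

func main() {
	// JUMPSUB 0x000005; STOP; JUMPDEST; RETURNSUB
	m := &machine{code: []byte{opJumpsub, 0x00, 0x00, 0x05, 0x00, 0x5b, opReturnsub}}
	if err := m.jumpsub(); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("after JUMPSUB: pc =", m.pc, "return stack =", m.rstack)
	if err := m.returnsub(); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("after RETURNSUB: pc =", m.pc, "return stack =", m.rstack)
}
```

Because the return address never touches the data stack, the caller's arguments stay where the subroutine expects them without any SWAP instructions.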

To provide a complete set of control structures, and to take full advantage of the performance benefits of simple subroutines, we also provide two static, relative jump instructions that take their arguments as immediate data rather than off the stack.

RJUMP (0x5c) offset

Transfers control to the address PC + offset, where offset is a three-byte, MSB-first, two's-complement integer.

  1. Decode the offset from the immediate data. The data is encoded as three bytes, MSB-first, two's-complement.
  2. Set PC to PC + offset.

The cost is low.

RJUMPI (0x5d) offset

Conditionally transfers control to the address PC + offset, where offset is a three-byte, MSB-first, two's-complement integer.

  1. Decode the offset from the immediate data. The data is encoded as three bytes, MSB-first, two's-complement.
  2. Pop the condition from the stack.
  3. If the condition is true (nonzero), continue to step 4; otherwise execution proceeds with the next instruction.
  4. Set PC to PC + offset.

The cost is mid.
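
A minimal Go sketch of the offset decoding used by RJUMP and RJUMPI follows. The function names are hypothetical. The sketch assumes that PC in "PC + offset" is the address of the jump opcode itself, and that a false condition falls through to PC + 4, past the opcode and its three immediate bytes; both points are assumptions for illustration.

```go
package main

import "fmt"

// decodeOffset interprets three bytes, MSB first, as a 24-bit
// two's-complement integer.
func decodeOffset(b [3]byte) int {
	v := int(b[0])<<16 | int(b[1])<<8 | int(b[2])
	if v&0x800000 != 0 { // sign bit set, so the value is negative
		v -= 1 << 24
	}
	return v
}

// rjumpiTarget returns the next PC for an RJUMPI located at pc.
// Assumption: the offset is relative to the opcode's own address.
func rjumpiTarget(pc int, imm [3]byte, cond bool) int {
	if cond {
		return pc + decodeOffset(imm)
	}
	return pc + 4 // fall through past the opcode and three immediate bytes
}

func main() {
	fmt.Println(decodeOffset([3]byte{0x00, 0x00, 0x10}))             // forward offset
	fmt.Println(decodeOffset([3]byte{0xff, 0xff, 0xfc}))             // backward offset
	fmt.Println(rjumpiTarget(100, [3]byte{0xff, 0xff, 0xfc}, true))  // jump taken
	fmt.Println(rjumpiTarget(100, [3]byte{0xff, 0xff, 0xfc}, false)) // fall through
}
```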

Notes:

  • If a resulting PC to be executed is beyond the last instruction then the opcode is implicitly a STOP, which is not an error.
  • Values popped off the return stack do not need to be validated, since they are alterable only by JUMPSUB and RETURNSUB.
  • The description above lays out the semantics of this feature in terms of a return stack. But the actual state of the return stack is not observable by EVM code or consensus-critical to the protocol. (For example, a node implementer may code JUMPSUB to unobservably push PC on the return stack rather than PC + 1, which is allowed so long as RETURNSUB observably returns control to the PC + 1 location.)
  • The return stack is the functional equivalent of Turing's "delay line".

JUMP and JUMPI are assigned mid and high gas fees -- they require operations on 256-bit stack items and checking for valid destinations. None of the new operations require such checking, and only RJUMPI requires 256-bit arithmetic. So the low cost of JUMPSUB versus JUMP is justified by needing only about six Go operations to push the return address on the return stack and decode the immediate three-byte destination to the PC, and the verylow cost of RETURNSUB is justified by needing only about three Go operations to pop the return stack into the PC. Similarly, the low cost of RJUMP is justified by needing even less work than JUMPSUB, and the cost of RJUMPI is mid because of the extra work to test the conditional. Benchmarking will be needed to tell if the costs are well-balanced.

Validity

Attempts to create contracts that can be proven to be invalid will fail.

Invalid contracts will have one or more

  • invalid instructions,
  • invalid jump destinations,
  • stack underflows, or
  • stack overflows without recursion.

Because of dynamic JUMP and JUMPI instructions the rules below give necessary but not sufficient conditions for validity. So long as we have unrestricted dynamic jumps we cannot prove contracts to be valid, only invalid. (See EIP-3779 for further discussion of validation and a means of restricting dynamic jumps.)

Validation Rules

This section extends the contract creation validation rules (as defined in EIP-3540 and EIP-3670).

  1. Every RJUMP and RJUMPI addresses a valid JUMPDEST.
  2. The stack depth is
    • always positive and
    • the same on every path through an opcode.
  3. The stack pointer is always positive and at most 1024.

The Yellow Paper has the stack pointer (SP) pointing just past the top item on the data stack. We define the stack base (BP) as the element that the SP addressed at the entry to the current basic block, or 0 on program entry, and the stack depth as the number of stack elements between the current SP and the current BP.

Taken together, these rules allow for code to be (in-)validated by traversing the control-flow graph, following each edge only once.

Rationale

We modeled this design on Moore's 1970 Forth virtual machine. It is a simple two-stack design: the data stack is supplemented with a return stack to support jumping to and returning from subroutines, as specified above, and as conceptualized by Turing. The return address (Turing's "note") is pushed onto the return stack (Turing's "delay line") when calling, and the return address is popped into the PC when returning.

The alternative design is to push the return address and the destination address on the data stack before jumping to the subroutine, and to later jump back to the return address on the stack in order to return. This is the current approach. It could be streamlined to some extent by having JUMPSUB push the return address for RETURNSUB to pop.

We prefer the separate return stack because it maintains a clear separation between data and flow of control. This ensures that the return address cannot be overwritten or mislaid. It also reduces costs by using fewer data stack slots and moving less data.

Backwards Compatibility

These changes do not affect the semantics of existing EVM code. These changes are compatible with the restricted forms of JUMP and JUMPI specified by EIP-3779 -- contracts following all of the rules given there and here will be valid.

Test Cases

Simple routine

This should jump into a subroutine, back out and stop.

Bytecode: 0x60045e005b5d (PUSH1 0x04, JUMPSUB, STOP, JUMPDEST, RETURNSUB)

Pc  Op         Cost  Stack  RStack
0   JUMPSUB    5     []     []
3   RETURNSUB  5     []     [0]
4   STOP       0     []     []

Output: 0x
Consumed gas: 10

Two levels of subroutines

This should execute fine, going two levels deep into subroutines.

Bytecode: 0x6800000000000000000c5e005b60115e5d5b5d (PUSH9 0x00000000000000000c, JUMPSUB, STOP, JUMPDEST, PUSH1 0x11, JUMPSUB, RETURNSUB, JUMPDEST, RETURNSUB)

Pc  Op         Cost  Stack  RStack
0   JUMPSUB    5     []     []
3   JUMPSUB    5     []     [0]
4   RETURNSUB  5     []     [0,3]
5   RETURNSUB  5     []     [3]
6   STOP       0     []     []

Consumed gas: 20

Failure 1: invalid jump

This should fail, since the given location is outside of the code-range. The code is the same as previous example, except that the pushed location is 0x01000000000000000c instead of 0x0c.

Bytecode: 0x6801000000000000000c5e005b60115e5d5b5d (PUSH9 0x01000000000000000c, JUMPSUB, STOP, JUMPDEST, PUSH1 0x11, JUMPSUB, RETURNSUB, JUMPDEST, RETURNSUB)

Pc  Op       Cost  Stack                   RStack
0   JUMPSUB  10    [18446744073709551628]  []
Error: at pc=10, op=JUMPSUB: invalid jump destination

Failure 2: shallow return stack

This should fail at the first opcode, due to a shallow return stack.

Bytecode: 0x5d5858 (RETURNSUB, PC, PC)

Pc  Op         Cost  Stack  RStack
0   RETURNSUB  5     []     []
Error: at pc=0, op=RETURNSUB: invalid retsub

Subroutine at end of code

In this example, the JUMPSUB is on the last byte of code. When the subroutine returns, it should hit the 'virtual stop' after the bytecode, and not exit with error.

Bytecode: 0x6005565b5d5b60035e (PUSH1 0x05, JUMP, JUMPDEST, RETURNSUB, JUMPDEST, PUSH1 0x03, JUMPSUB)

Pc  Op         Cost  Stack  RStack
0   PUSH1      3     []     []
2   JUMP       8     [5]    []
5   JUMPDEST   1     []     []
6   JUMPSUB    5     []     []
2   RETURNSUB  5     []     [2]
7   STOP       0     []     []

Consumed gas: 30

Validation Algorithm

This section specifies an algorithm for checking the above rules. Equivalent code must be run at creation time. We assume that the validation defined in EIP-3540 and EIP-3670 has already run, although in practice the algorithms can be merged.

The following is a pseudo-Go implementation of an algorithm for enforcing adherence to the above rules. This algorithm is a symbolic execution of the program that recursively traverses the bytecode, following its control flow and stack use and checking for violations of the rules above. It uses a stack to track the slots that hold PUSHed constants, from which it pops the destinations to validate during the analysis.

This algorithm runs in O(vertices + edges) time in the program's control-flow graph, where edges represent control flow and vertices represent basic blocks; thus the algorithm takes time proportional to the size of the bytecode.

For simplicity's sake we assume a few helper functions.

  • advance_pc() advances the PC, skipping any immediate data.
  • imm_data() returns immediate data for an instruction.
  • valid_jumpdest() checks that a jump destination is not in immediate data.
  • remove_items() returns the number of items removed from the stack by an instruction.
  • add_items() returns the number of items added to the stack. Items are added as 0xFFFFFFFF. The PC, PUSH…, SWAP…, DUP…, JUMP, and JUMPI instructions are handled separately.
var code  [code_len]byte
var depth [code_len]unsigned
var sp := 1023            
var bp := 1023
    
func validate(pc := 0) boolean {

   for ; pc < code_len; pc = advance_pc(pc) {

      // opcode at the current pc
      instruction := code[pc]

      // successful termination
      switch instruction {
      case STOP:    return true
      case RETURN:  return true
      case SUICIDE: return true
      }

      // check for stack underflow and overflow
      // (named stack_depth so it does not shadow the depth array)
      stack_depth := bp - sp
      if stack_depth < 0 || sp < 0 || 1024 < sp {
         return false
      }

      // if stack depth for `pc` is non-zero we have been here before
      // so return to break cycle in control flow graph
      if depth[pc] != 0 {
          return true
      }
      depth[pc] = stack_depth

      if (instruction == RJUMP) {

         // check for valid destination
         jumpdest = pc + imm_data(pc)
         if !valid_jumpdest(jumpdest) {
            return false
         }

         // will enter basic block at destination
         bp = sp

         // reset pc to destination of jump 
         pc = jumpdest
         continue
      }
      if (instruction == RJUMPI) {

         // check for valid destination
         jumpdest = pc + imm_data(pc)
         if !valid_jumpdest(jumpdest) {
            return false
         }

         // will enter basic block at destination or next instruction
         bp = sp

         // recurse to jump to code to validate
         if !validate(jumpdest) {
            return false
         }

         // false side of conditional -- continue to next instruction;
         // the loop's advance_pc() steps past the immediate data
         continue
      }

      // apply other instructions to stack
      sp += remove_items(pc)
      sp -= add_items(pc)
   }
   
   // successful termination
   return true
}
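
The helper functions assumed above are left unspecified. One possible shape for three of them is sketched below in Go, under stated assumptions: the camelCase names and the `immSize` helper are hypothetical, the immediate-data sizes follow the three-byte encodings given in this proposal plus the existing PUSH1 to PUSH32 instructions, and `validJumpdest` also requires the destination to be a JUMPDEST, as validation rule 1 asks. Bounds checks on truncated code are omitted for brevity.

```go
package main

import "fmt"

// Opcode values follow this proposal and the existing EVM instruction set.
const (
	opJumpdest = 0x5b
	opRjump    = 0x5c
	opRjumpi   = 0x5d
	opJumpsub  = 0x5e
	opPush1    = 0x60
	opPush32   = 0x7f
)

// immSize returns how many immediate-data bytes follow the opcode op.
func immSize(op byte) int {
	switch {
	case op >= opPush1 && op <= opPush32:
		return int(op-opPush1) + 1
	case op == opRjump || op == opRjumpi || op == opJumpsub:
		return 3
	}
	return 0
}

// advancePC returns the address of the instruction after the one at pc,
// skipping any immediate data.
func advancePC(code []byte, pc int) int {
	return pc + 1 + immSize(code[pc])
}

// immData decodes the three-byte, MSB-first, two's-complement immediate
// of the instruction at pc. It assumes the code is not truncated.
func immData(code []byte, pc int) int {
	v := int(code[pc+1])<<16 | int(code[pc+2])<<8 | int(code[pc+3])
	if v&0x800000 != 0 {
		v -= 1 << 24
	}
	return v
}

// validJumpdest reports whether dest is the start of an instruction
// (not inside immediate data) and that instruction is a JUMPDEST.
func validJumpdest(code []byte, dest int) bool {
	for pc := 0; pc < len(code) && pc <= dest; pc = advancePC(code, pc) {
		if pc == dest {
			return code[pc] == opJumpdest
		}
	}
	return false
}

func main() {
	// PUSH1 0x5b; RJUMP +4; JUMPDEST; STOP
	code := []byte{0x60, 0x5b, 0x5c, 0x00, 0x00, 0x04, 0x5b, 0x00}
	dest := 2 + immData(code, 2)
	fmt.Println("RJUMP destination:", dest, "valid:", validJumpdest(code, dest))
	fmt.Println("byte 1 (PUSH1 data) valid:", validJumpdest(code, 1))
}
```

Scanning from the start of the code on every call keeps the sketch short. A real validator would more likely compute the set of instruction boundaries once and reuse it.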

Security Considerations

These changes do introduce new flow control instructions, so any software which does static/dynamic analysis of EVM code needs to be modified accordingly. The JUMPSUB semantics are similar to JUMP whereas the RETURNSUB instruction is different, since it can 'land' on any opcode (but the possible destinations can be statically inferred).

Copyright and related rights waived via CC0.