JUMPSUB improvements. (#3743)

* JUMPSUB improvements.
* typo

Co-authored-by: Greg Colvin <gcolvin@Tinman.local>
2025-02-23 04:08:09 +00:00 · 2021-08-19 16:34:08 -04:00 · 2021-08-19 16:34:08 -04:00 · eeec6440e5
commit eeec6440e5
parent b7d7eaa999
1 changed file with 82 additions and 51 deletions
--- a/EIPS/eip-2315.md
+++ b/EIPS/eip-2315.md
@@ -11,7 +11,7 @@ created: 2019-10-17

 ## Simple Summary

-(Almost) the smallest possible change that provides native subroutines without breaking backwards compatibility.
+(Almost) the smallest possible change that provides native subroutines.

 ## Abstract

@@ -19,21 +19,22 @@ This proposal introduces three opcodes to support subroutines: `BEGINSUB`, `JUMP

 Substantial gains in efficiency are achieved.

-Safety properties equivalent to  [EIP-615](https://eips.ethereum.org/EIPS/eip-615) can be ensured by enforcing a few simple rules, which can be validated with the provided algorithm, and without imposing syntactic constraints.
+Safety properties equivalent to [EIP-615](https://eips.ethereum.org/EIPS/eip-615) can be ensured by enforcing a few simple rules, which are validated with the provided algorithm without imposing syntactic constraints.

 ## Motivation

-The EVM does not provide subroutines as a primitive.  Instead, calls can be synthesized by fetching and pushing the current program counter on the data stack and jumping to the subroutine address; returns can be synthesized by getting the return address to the top of the stack and jumping back to it.  In the EVM the return 
+The EVM does not provide subroutines as a primitive.  Instead, calls can be synthesized by fetching and pushing the current program counter on the data stack and jumping to the subroutine address; returns can be synthesized by getting the return address to the top of the stack and jumping back to it.  These conventions are more costly than necessary, and impede static analysis of EVM code, especially rapid validation of some important safety properties.

-Facilities to directly support subroutines are provided by all but one of the machines programmed by the lead author, including the B5000, CDC7600, IBM360, PDP8, PDP11, VAX, M68000, 80x86, SPARC, p-Machine, JVM and EVM.  In whatever form, these operations provide for capturing the current context of execution, transferring control to a new context, and returning to the original context.  The concept goes back to Turing:
+Facilities to directly support subroutines are provided by all but one of the machines programmed by the lead author, including the Burroughs 5000, CDC 7600, IBM 360, DEC PDP 11 and VAX, Motorola 68000, Intel 80x86, Sun SPARC, UCSD p-Machine, Sun JVM, Wasm, and the EVM.  In whatever form, these operations provide for capturing the current context of execution, transferring control to a new context, and returning to the original context.  The concept goes back to Turing (1946):

 > We also wish to be able to arrange for the splitting up of operations into subsidiary operations.  This should be done in such a way that once we have written down how an operation is done we can use it as a subsidiary to any other operation.
+> 
 > ...
-> When we wish to start on a subsidiary operation we need only make a note of where we left off the major operation and then apply the first instruction of the subsidiary.  When the subsidiary is over we look up the note and continue with the major operation. Each subsidiary operation can end with instructions for this recovery of the note.  How is the burying and disinterring of the note to be done?  There are of course many ways.  One is to keep a list of these notes in one or more standard size delay lines, with the most recent last.  The position of the most recent of these will be kept in a fixed TS, and this reference will be modified every time a subsidiary is started or finished.  The burying and disinterring processes are fairly elaborate, but there is fortunately no need to repeat the instructions involved, each time, the burying being done through a standard instruction table BURY, and the disinterring by the table UNBURY.
+> When we wish to start on a subsidiary operation we need only make a note of where we left off the major operation and then apply the first instruction of the subsidiary.  When the subsidiary is over we look up the note and continue with the major operation. Each subsidiary operation can end with instructions for this recovery of the note.  How is the burying and disinterring of the note to be done?  There are of course many ways.  One is to keep a list of these notes in one or more standard size delay lines, (1024) with the most recent last.  The position of the most recent of these will be kept in a fixed TS, and this reference will be modified every time a subsidiary is started or finished...
 >
-> Notes: TS 1 contains the address of the currently executing instruction. "minor cycle" = word.
+> Notes: TS 1 contains the address of the currently executing instruction. 

-We propose to use Turing's simple _return-stack_ mechanism, long known to work well for virtual stack machines, which we specify here.  Note that this specification is entirely semantic.  It constrains only stack usage and control flow and imposes no syntax on code beyond being a sequence of bytes to be executed.
+We propose to use Turing's simple mechanism, long known to work well for virtual stack machines, which we specify here.  Note that this specification is entirely semantic.  It constrains only stack usage and control flow and imposes no syntax on code beyond being a sequence of bytes to be executed.

 ## Specification

@@ -41,13 +42,13 @@ We introduce one more stack into the EVM in addition to the existing `data stack

 ### `BEGINSUB`

-Marks the entry point to a subroutine.  Execution of a `BEGINSUB` is  a no-op.
+Marks an entry point to a subroutine.  Execution of a `BEGINSUB` is a no-op.

-#### `JUMPSUB <immediate data>`
+#### `JUMPSUB <location>`

 Transfers control to a subroutine.

-1. Decode the `location` from the `immediate data`.  The data is encoded as two bytes MSB-first bytes.
+1. Decode the `location` from the immediate data.  The data is encoded as two bytes, MSB-first.
 2. If the opcode at `location` is not a `BEGINSUB` _`abort`_.
 3. If the `return stack` already has `1024` items _`abort`_.
 4. Push the current `pc + 1` to the `return stack`.
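
Steps 1 to 4 can be sketched in Go as follows. This is an illustration under stated assumptions, not a reference implementation: the helper names, the `errAbort` value, and the use of `0x5e` for `JUMPSUB` (taken from the test bytecodes) are hypothetical. Because the remaining steps fall outside this hunk, the sketch stops after pushing onto the return stack and simply returns the decoded `location`.

```go
package main

import (
	"errors"
	"fmt"
)

// BEGINSUB opcode value from this EIP's suggested opcode list.
const BEGINSUB byte = 0x5c

// Maximum number of items on the return stack (step 3).
const returnStackLimit = 1024

// errAbort is a hypothetical stand-in for the spec's _abort_.
var errAbort = errors.New("abort")

// decodeLocation reads the two immediate bytes after the opcode at pc,
// most significant byte first (step 1).
func decodeLocation(code []byte, pc int) (int, error) {
	if pc+2 >= len(code) {
		return 0, errAbort // immediate data runs past the end of the code
	}
	return int(code[pc+1])<<8 | int(code[pc+2]), nil
}

// jumpsub applies steps 1-4 and returns the decoded location together with
// the updated return stack. Transferring control is left to the later steps.
func jumpsub(code []byte, pc int, rstack []int) (int, []int, error) {
	location, err := decodeLocation(code, pc)
	if err != nil {
		return 0, rstack, err
	}
	// Step 2: the destination must exist and hold a BEGINSUB.
	if location >= len(code) || code[location] != BEGINSUB {
		return 0, rstack, errAbort
	}
	// Step 3: refuse to grow the return stack past its limit.
	if len(rstack) >= returnStackLimit {
		return 0, rstack, errAbort
	}
	// Step 4: push pc + 1, exactly as the step is worded here.
	return location, append(rstack, pc+1), nil
}

func main() {
	// JUMPSUB 0x0004, STOP, BEGINSUB
	code := []byte{0x5e, 0x00, 0x04, 0x00, 0x5c}
	loc, rstack, err := jumpsub(code, 0, nil)
	fmt.Println(loc, rstack, err) // 4 [1] <nil>
}
```

Reading the operand as `code[pc+1]<<8 | code[pc+2]` is the two-byte, MSB-first decoding that step 1 describes.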
@@ -77,15 +78,14 @@ _Note 3: The description above lays out the semantics of this feature in terms o

 [EOF static relative jumps](https://github.com/ethereum/evmone/pull/351)

-
 ## Rationale

 We modeled this design on the simple, proven, archetypal Forth virtual machine of 1970.  It is a two-stack design -- the data stack is supplemented with Turing's return stack to support jumping into and returning from subroutines, as specified above.

-The separate return stack ensures that the return address cannot be overwritten or mislaid, and obviates any need to swap the return address past the arguments on the stack.  Importantly, a dynamic jump is not needed to implement subroutine returns, allowing for deprecation of dynamic uses of `JUMP` and `JUMPI`.
+The separate return stack ensures that the return address cannot be overwritten or mislaid, and obviates any need to swap the return address past the arguments on the stack.  Importantly, a dynamic jump is not needed to implement subroutine returns, allowing for deprecation of `JUMP` and `JUMPI`.
+
+(`JUMPSUB` and `RETURNSUB` are also defined in terms of a `return stack` in [EIP-615](https://eips.ethereum.org/EIPS/eip-615)).

-(`JUMPSUB` and `RETURNSUB` are also defined in terms of a `return stack` in [EIP-615](https://eips.ethereum.org/EIPS/eip-615))
-.
 ## Backwards and Forwards Compatibility

 These changes affect the semantics of existing EVM code.  The EVM Object Format is required to allow for immediate data.
@@ -103,17 +103,15 @@ Three clients have implemented this (or an earlier version of this) proposal:
 ### Costs and Codes

 We suggest that the cost of 
-
 - `BEGINSUB` be _jumpdest_ (`1`)
 - `JUMPSUB` be _low_ (`5`)
- `RETURNSUB` be _low_ (`5`).
+- `RETURNSUB` be _low_ (`5`)

-The _low_ costs are justified by these instructions requiring only a few operations on the return stack and `pc`, with no 256-bit arithmetic.
+The _low_ costs are justified by these instructions requiring only a few operations on the return stack and `pc`, with no 256-bit arithmetic and no checking for valid destinations.

 Benchmarking might be needed to tell if the costs are well-balanced. 

 We suggest the following opcodes:
-
 ```
 0x5c BEGINSUB
 0x5d RETURNSUB
@@ -139,7 +137,6 @@ DIVIDE:
 Total 28 gas.

 The same code, using JUMP.
-
 ```
 TEST_DIV:
   jumpdest         ; 1 gas
@@ -161,9 +158,9 @@ DIVIDE:
 ```
 50 gas total.

-Both approaches need to push two arguments and divide = 11 gas, so control flow gas is 39 using `JUMP` versus 17 using `JUMPSUB`.
+Both approaches spend 11 gas to push two arguments and divide, so control flow costs 39 gas using `JUMP` versus 17 using `JUMPSUB`.

-That’s a savings of 22 gas.
+That’s a savings of 22 gas (56%) just to jump to and return from a subroutine.
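
The arithmetic behind these figures can be checked with a few lines of Go. The `savings` helper is hypothetical; the totals of 50 and 28 gas and the shared 11 gas are the figures quoted above.

```go
package main

import "fmt"

// savings compares the control-flow gas of the JUMP and JUMPSUB versions.
// shared is the gas both versions spend pushing the arguments and dividing.
func savings(jumpTotal, jumpsubTotal, shared int) (jumpFlow, jumpsubFlow, saved int, percent float64) {
	jumpFlow = jumpTotal - shared
	jumpsubFlow = jumpsubTotal - shared
	saved = jumpFlow - jumpsubFlow
	percent = 100 * float64(saved) / float64(jumpFlow)
	return
}

func main() {
	// Totals from the two listings: 50 gas with JUMP, 28 gas with JUMPSUB.
	j, s, saved, pct := savings(50, 28, 11)
	fmt.Printf("JUMP %d, JUMPSUB %d, saved %d gas (%.0f%%)\n", j, s, saved, pct)
	// prints: JUMP 39, JUMPSUB 17, saved 22 gas (56%)
}
```

The percentage is taken relative to the `JUMP` version's control-flow gas (22 of 39), which is where the 56% comes from.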

 In the general case of one routine calling another I don’t think the `JUMP` version can do better. Of course in this case we can optimize the tail call, so that the final jump in `DIVIDE` actually returns from TEST_DIV.
 ```
@@ -197,6 +194,8 @@ DIVIDE:
 Total 24 gas, better than `JUMPSUB`.

 However, `JUMPSUB` can do even better with the same optimizations.
+
+We can optimize the tail call.
 ```
 TEST_DIV:
    beginsub        ; 1 gas
@@ -210,7 +209,9 @@ DIVIDE:
    div             ; 5 gas
   returnsub       ; 5 gas
 ```
-Total 30 gas.  5 better than with `JUMP`,
+Total 30 gas.  5 better than with `JUMP`.
+
+And we can fall directly into the "called" routine.

 ```
 TEST_DIV:
@@ -224,6 +225,8 @@ DIVIDE:
 ```
 Total 18 gas.  6 better than with `JUMP`.

+So these opcodes both make simple subroutine calls cheaper and make the available optimizations more effective.
+
 ## Security Considerations

 These changes do introduce new flow control instructions, so any software which does static/dynamic analysis of evm-code needs to be modified accordingly. The `JUMPSUB` semantics are similar to `JUMP` (but jumping to a `BEGINSUB`), whereas the `RETURNSUB` instruction is different, since it can 'land' on any opcode (but the possible destinations can be statically inferred).
@ -255,17 +258,17 @@ We would like to consider EVM code valid iff no execution of the program can lea

 Rule 0, deprecating `JUMP` and `JUMPI`, would forbid dynamic jumps.  Absent dynamic jumps, another mechanism is needed for subroutine returns, as provided here.

-Jump destinations are currently checked at runtime.  Static jumps allow them to be validated at creation time, per rule 1.  _Note: Valid instructions are not part of PUSH data._
+Jump destinations are currently checked at runtime.  Static jumps allow them to be validated at creation time, per rule 1.  _Note: Valid instructions are not part of immediate data._

 For rules 3 and 4 we need to define `stack depth`.  The Yellow Paper has the `stack pointer` or `SP` pointing just past the top item on the `data stack`.   We define the `stack base` as where the `SP` pointed before the most recent `JUMPSUB`, or `0` on program entry.  So we can define the `stack depth` as the number of stack elements between the current `SP` and the current `stack base`.  
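
A minimal Go sketch of these definitions, assuming a hypothetical `depthTracker` that only counts items: `sp` moves on pushes and pops, each `JUMPSUB` saves the caller's stack base and sets a new one at the current `sp`, and `RETURNSUB` restores the saved base. Keeping the saved bases on a separate slice is an assumption for illustration, not something this EIP specifies.

```go
package main

import "fmt"

// depthTracker is a hypothetical model of the definitions above. sp points
// just past the top data-stack item; base is where sp pointed before the most
// recent JUMPSUB, or 0 on program entry.
type depthTracker struct {
	sp, base int
	callers  []int // stack bases saved by pending JUMPSUBs (sketch assumption)
}

func (t *depthTracker) push(n int) { t.sp += n }
func (t *depthTracker) pop(n int)  { t.sp -= n }

// jumpsub saves the caller's base and makes the current sp the new base.
func (t *depthTracker) jumpsub() {
	t.callers = append(t.callers, t.base)
	t.base = t.sp
}

// returnsub restores the caller's base; it reports false if no JUMPSUB is pending.
func (t *depthTracker) returnsub() bool {
	if len(t.callers) == 0 {
		return false
	}
	t.base = t.callers[len(t.callers)-1]
	t.callers = t.callers[:len(t.callers)-1]
	return true
}

// depth is the number of items between sp and the current stack base.
func (t *depthTracker) depth() int { return t.sp - t.base }

func main() {
	t := &depthTracker{}
	t.push(1)   // caller pushes one item: depth 1
	t.jumpsub() // new base at sp: depth 0
	t.push(2)
	t.pop(1)               // subroutine nets one item
	fmt.Println(t.depth()) // 1
	t.returnsub()          // caller's base 0 restored
	fmt.Println(t.depth()) // 2
}
```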

 Given our definition of `stack depth`, Rule 3 ensures that control flows which return to the same place with a different `stack depth` are invalid.  These can be caused by irreducible paths like jumping into loops and subroutines, and calling subroutines with different numbers of arguments.  Taken together, these rules allow for code to be validated by following the control-flow graph, traversing each edge only once.

-Finally, Rule 4 precludes all stack underflows (and some stack overflows.)
+Finally, Rule 4 catches all stack underflows.  It also catches stack overflows in programs that overflow without (or before) recursing.
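
The two properties attributed to Rules 3 and 4 above can be illustrated with a simplified Go sketch. It is not the EIP's validation algorithm: it works on a hypothetical `instr` type with fixed pop and push counts and explicit successors instead of bytecode, and it leaves out `JUMPSUB` and `RETURNSUB`, so the stack base is always 0. It records the depth at which each instruction is first reached, rejects a later arrival at a different depth, and rejects an instruction that would pop more items than the depth provides. Since a visited instruction is never expanded again, each edge is followed once.

```go
package main

import "fmt"

// instr is a hypothetical, simplified instruction: it pops and pushes a fixed
// number of items, and next lists the indices of instructions that may follow it.
type instr struct {
	pops, pushes int
	next         []int
}

// validate follows every control-flow edge from instruction 0, recording the
// stack depth at which each instruction is first reached.
func validate(prog []instr) bool {
	depthAt := make([]int, len(prog))
	for i := range depthAt {
		depthAt[i] = -1 // -1 marks an instruction not yet reached
	}
	var visit func(pc, depth int) bool
	visit = func(pc, depth int) bool {
		if pc < 0 || pc >= len(prog) {
			return false // edge leaves the program
		}
		if depthAt[pc] >= 0 {
			// Already expanded: this path must agree on depth (Rule 3).
			return depthAt[pc] == depth
		}
		depthAt[pc] = depth
		in := prog[pc]
		if depth < in.pops {
			return false // stack underflow (Rule 4)
		}
		after := depth - in.pops + in.pushes
		for _, succ := range in.next {
			if !visit(succ, after) {
				return false
			}
		}
		return true
	}
	return visit(0, 0)
}

func main() {
	balanced := []instr{{0, 1, []int{1}}, {1, 0, []int{2}}, {0, 0, nil}}
	growing := []instr{{0, 1, []int{0}}} // loop that pushes on every iteration
	underflow := []instr{{1, 0, nil}}
	fmt.Println(validate(balanced), validate(growing), validate(underflow)) // true false false
}
```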

 ### Validation

-The following is a pseudo-Go specification of an algorithm for enforcing program validity.  It recursively traverses the bytecode, following its control flow and stack use and checking for violations of the rules above.  (For simplicity we ignore the issue of JUMPDEST or BEGINSUB bytes in PUSH data, assume an `advance_pc()` routine, and don't specify JUMPTABLE, which is just a loop over RJUMP.)  It runs in time == O(vertices + edges) in the program's control-flow graph, where vertices represent control-flow instructions and the edges represent basic blocks.
+The following is a pseudo-Go specification of an algorithm for enforcing program validity.  It recursively traverses the bytecode, following its control flow and stack use and checking for violations of the rules above.  (For simplicity we ignore the issue of JUMPDEST or BEGINSUB bytes in immediate data, assume an `advance_pc()` routine, and don't specify JUMPTABLE, which amounts to a loop over `RJUMP`.)  It runs in O(vertices + edges) time in the program's control-flow graph, where vertices represent control-flow instructions and the edges represent basic blocks.
 ```
   var bytecode []byte
   var stack_depth []int
@@ -305,7 +308,7 @@ The following is a pseudo-Go specification of an algorithm for enforcing program
         if instruction == RJUMP {

             // check for valid destination
-             jumpdest = *PC, PC++, jumpdest << 8
+            jumpdest = int(*PC) << 8, PC++, jumpdest |= int(*PC)
             if bytecode[jumpdest] != JUMPDEST {
                 return false
             }
@@ -317,7 +320,7 @@ The following is a pseudo-Go specification of an algorithm for enforcing program
         if instruction == RJUMPI {

             // check for valid destination
-             jumpdest = *PC, PC++, jumpdest << 8
+            jumpdest = int(*PC) << 8, PC++, jumpdest |= int(*PC)
             if bytecode[jumpdest] != JUMPDEST {
                 return false
             }
@@ -373,13 +376,12 @@ Bytecode: `0x60045e005c5d` (`PUSH1 0x04, JUMPSUB, STOP, BEGINSUB, RETURNSUB`)

 |  Pc   |      Op     | Cost |   Stack   |   RStack  |
 |-------|-------------|------|-----------|-----------|
-|    0  |      PUSH1  |    3 |        [] |        [] |
-|    2  |    JUMPSUB  |   10 |       [4] |        [] |
-|    5  |  RETURNSUB  |    5 |        [] |      [ 2] |
-|    3  |       STOP  |    0 |        [] |        [] |
+|    0  |    JUMPSUB  |    5 |        [] |        [] |
+|    3  |  RETURNSUB  |    5 |        [] |       [0] |
+|    4  |       STOP  |    0 |        [] |        [] |

 Output: 0x
-Consumed gas: `18`
+Consumed gas: `10`

 ### Two levels of subroutines

@@ -389,27 +391,24 @@ Bytecode: `0x6800000000000000000c5e005c60115e5d5c5d` (`PUSH9 0x00000000000000000

 |  Pc   |      Op     | Cost |   Stack   |   RStack  |
 |-------|-------------|------|-----------|-----------|
-|    0  |      PUSH9  |    3 |        [] |        [] |
-|   10  |    JUMPSUB  |   10 |      [12] |        [] |
-|   13  |      PUSH1  |    3 |        [] |      [10] |
-|   15  |    JUMPSUB  |   10 |      [17] |      [10] |
-|   18  |  RETURNSUB  |    5 |        [] |   [10,15] |
-|   16  |  RETURNSUB  |    5 |        [] |      [10] |
-|   11  |       STOP  |    0 |        [] |        [] |
+|    0  |    JUMPSUB  |    5 |        [] |        [] |
+|    3  |    JUMPSUB  |    5 |        [] |       [0] |
+|    4  |  RETURNSUB  |    5 |        [] |     [0,3] |
+|    5  |  RETURNSUB  |    5 |        [] |       [3] |
+|    6  |       STOP  |    0 |        [] |        [] |

-Consumed gas: `36`
+Consumed gas: `20`

 ### Failure 1: invalid jump

 This should fail, since the given location is outside of the code-range. The code is the same as previous example, 
 except that the pushed location is `0x01000000000000000c` instead of `0x0c`.

-Bytecode: `0x6801000000000000000c5e005c60115e5d5c5d` (`PUSH9 0x01000000000000000c, JUMPSUB, STOP, BEGINSUB, PUSH1 0x11, JUMPSUB, RETURNSUB, BEGINSUB, RETURNSUB`)
+Bytecode: `0x6801000000000000000c5e005c60115e5d5c5d` (`PUSH9 0x01000000000000000c, JUMPSUB, STOP, BEGINSUB, PUSH1 0x11, JUMPSUB, RETURNSUB, BEGINSUB, RETURNSUB`)

 |  Pc   |      Op     | Cost |   Stack   |   RStack  |
 |-------|-------------|------|-----------|-----------|
-|    0  |      PUSH9  |    3 |        [] |        [] |
-|   10  |    JUMPSUB  |   10 |[18446744073709551628] |        [] |
+|    0  |    JUMPSUB  |   10 |[18446744073709551628] |        [] |

 ```
 Error: at pc=10, op=JUMPSUB: invalid jump destination
@@ -440,20 +439,52 @@ Bytecode: `0x6005565c5d5b60035e` (`PUSH1 0x05, JUMP, BEGINSUB, RETURNSUB, JUMPDE
 |    0  |      PUSH1  |    3 |        [] |        [] |
 |    2  |       JUMP  |    8 |       [5] |        [] |
 |    5  |   JUMPDEST  |    1 |        [] |        [] |
-|    6  |      PUSH1  |    3 |        [] |        [] |
-|    8  |    JUMPSUB  |   10 |       [3] |        [] |
-|    4  |  RETURNSUB  |    5 |        [] |      [ 8] |
-|    9  |       STOP  |    0 |        [] |        [] |
+|    6  |    JUMPSUB  |    5 |        [] |        [] |
+|    2  |  RETURNSUB  |    5 |        [] |       [2] |
+|    7  |       STOP  |    0 |        [] |        [] |

 Consumed gas: `30`

+## Appendix: Stack Frames
+
+Given [EIP-3337](https://eips.ethereum.org/EIPS/eip-3337), we can easily create an auxiliary stack in memory for local variables and temporaries.  Two conventions are typical -- either create the frame before calling the subroutine, or create the frame on entry to the subroutine.  Adding these two instructions would ease the way.  They are modelled on the corresponding Intel opcodes.
+
+#### `ENTER <frame_size>`
+
+Write the current `FP` to memory as 4 bytes, MSB-first, at `FP - frame_size`, then set `FP` to `FP - frame_size`.
+
+This should be placed either before a `JUMPSUB` operation, or after a `BEGINSUB` -- depending on the calling convention.  Repeated calls to `ENTER` create a stack of frames, linked by their `previous FP` field. The stack grows towards lower addresses in memory, as is the common, efficient practice.
+
+#### `LEAVE`
+
+Restores `FP` to the value which was stored at offset `FP` in memory by `ENTER`.  
+
+This should be placed after the `JUMPSUB` operation or before the `RETURNSUB` -- depending on the calling convention -- to pop the most recent frame from the call stack.
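
The `ENTER` and `LEAVE` descriptions above can be sketched in Go, assuming a hypothetical `frameMachine` in which memory is a fixed-size byte slice and `FP` starts at the top of memory. Requiring a frame to hold at least the 4-byte saved `FP`, and returning an error otherwise, are assumptions of this sketch rather than part of the proposal.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// frameMachine is a hypothetical model: mem is a fixed-size byte slice and fp
// starts at len(mem), so frames grow toward lower addresses.
type frameMachine struct {
	mem []byte
	fp  uint32
}

var errFrame = errors.New("invalid frame operation")

// enter writes the current FP as 4 bytes, MSB-first, at FP - frameSize,
// then sets FP to FP - frameSize.
func (m *frameMachine) enter(frameSize uint32) error {
	if frameSize < 4 || frameSize > m.fp {
		return errFrame // assumed: a frame must fit and hold the saved FP
	}
	newFP := m.fp - frameSize
	binary.BigEndian.PutUint32(m.mem[newFP:newFP+4], m.fp)
	m.fp = newFP
	return nil
}

// leave restores FP from the 4 bytes that enter stored at offset FP.
func (m *frameMachine) leave() error {
	if int(m.fp)+4 > len(m.mem) {
		return errFrame // no frame to pop at the top of memory
	}
	m.fp = binary.BigEndian.Uint32(m.mem[m.fp : m.fp+4])
	return nil
}

func main() {
	m := &frameMachine{mem: make([]byte, 256), fp: 256}
	_ = m.enter(32) // FP 224, saved FP 256 at mem[224:228]
	_ = m.enter(16) // FP 208, saved FP 224 at mem[208:212]
	_ = m.leave()
	fmt.Println(m.fp) // 224
	_ = m.leave()
	fmt.Println(m.fp) // 256
}
```

Each saved `FP` sits at the lowest address of its frame, so following the saved values walks the linked stack of frames back toward the top of memory.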
+
+These operations would be especially useful if the gas formula for memory was reinterpreted so that writes from the top of memory down (equivalently, from -1 down, twos-complement) are charged the same as writes from the bottom of memory up.
+
+The total cost to expand the memory to size _a_ words is
+> `Cmem(a) = 3 * a + floor(a ** 2 / 512)`
+
+If the memory is already _b_ words long, the incremental cost is
+> `Cmem(a) - Cmem(b)`
+
+If _a_, _b_ and memory offsets in general are allowed to be negative the formula gives the desired results -- stack memory can grow from the top down, and heap memory can be allocated from the bottom up, without address conflicts or excessive gas charges.
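
For reference, the total and incremental cost formulas translate directly into Go. The `cmem` and `expansionCost` names are hypothetical, and the sketch only evaluates non-negative word counts; it does not model the reinterpretation for negative offsets.

```go
package main

import "fmt"

// cmem is the total gas to expand memory to a words:
// Cmem(a) = 3*a + floor(a*a/512). Integer division floors for a >= 0.
func cmem(a int64) int64 {
	return 3*a + a*a/512
}

// expansionCost is the incremental charge when memory already holds b words
// and grows to a words.
func expansionCost(a, b int64) int64 {
	return cmem(a) - cmem(b)
}

func main() {
	fmt.Println(cmem(32))              // 98
	fmt.Println(expansionCost(64, 32)) // 102
}
```

For example, growing memory from 32 to 64 words costs `Cmem(64) - Cmem(32) = 200 - 98 = 102` gas.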
+
+## Appendix: Code Sections
+
+Given [EIP-3540](https://eips.ethereum.org/EIPS/eip-3540) we can divide code into sections, with subroutines contained within sections.  These sections could provide syntactic boundaries against cross-routine control flow.  Calls from within code sections to subroutines in other sections would be allowed, but jumps into or out of code sections would not.
+
+
 ## References

-A.M. Turing, Proposals for the development in the Mathematics Division of an Automatic Computing Engine (ACE). Report E882, Executive Committee, NPL  1946
-Gavin Wood, [Ethereum:  A  Secure  Decentralized Generalized  Transaction  Ledger](https://ethereum.github.io/yellowpaper/paper.pdf), 2014-2021
-Greg Colvin, Brooklyn Zelenka, Paweł Bylica, Christian Reitwiessner, [EIP-615: Subroutines and Static Jumps for the EVM](https://eips.ethereum.org/EIPS/eip-615),  2016-2019
-Martin Lundfall, [EIP-2327: BEGINDATA Opcode](https://eips.ethereum.org/EIPS/eip-2327), 2019
+A.M. Turing, [Proposals for the development in the Mathematics Division of an Automatic Computing Engine (ACE)](http://www.alanturing.net/turing_archive/archive/p/p01/P01-001.html), Report E882, Executive Committee, NPL, 1946
+Alex Beregszaszi, Paweł Bylica, Andrei Maiboroda, [EVM Object Format (EOF) v1](https://eips.ethereum.org/EIPS/eip-3540), 2021
+Andrei Maiboroda, [EOF static relative jumps](https://github.com/ethereum/evmone/pull/351)
+Gavin Wood, [Ethereum: A Secure Decentralized Generalized Transaction Ledger](https://ethereum.github.io/yellowpaper/paper.pdf), 2014-2021
+Greg Colvin, Brooklyn Zelenka, Paweł Bylica, Christian Reitwiessner, [EIP-615: Subroutines and Static Jumps for the EVM](https://eips.ethereum.org/EIPS/eip-615), 2016-2019
 Nick Johnson, [EIP-3337: Frame pointer support for memory load and store operations](https://eips.ethereum.org/EIPS/eip-3337), 2021

 ## Copyright
 Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
+