Update eip-2315.md (#4206)

* Update eip-2315.md * Update eip-2315.md
2025-02-23 12:18:16 +00:00 · 2021-09-24 01:52:33 -04:00 · 2021-09-24 01:52:33 -04:00 · b1b3aff7ec
commit b1b3aff7ec
parent a3928d2584
1 changed files with 321 additions and 169 deletions
--- a/EIPS/eip-2315.md
+++ b/EIPS/eip-2315.md
@ -4,22 +4,22 @@ title: Simple Subroutines for the EVM
 status: Draft
 type: Standards Track
 category: Core
-author: Greg Colvin <greg@colvin.org>, Martin Holst Swende (@holiman)
+author: Greg Colvin <greg@colvin.org>, Martin Holst Swende (@holiman), Brooklyn Zelenka (@expede)
 discussions-to: https://ethereum-magicians.org/t/eip-2315-simple-subroutines-for-the-evm/3941
 created: 2019-10-17
 ---

 ## Abstract

-This proposal introduces five opcodes to better support simple subroutines and relative jumps: `BEGINSUB`, `JUMPSUB` `RETURNSUB`, `JUMPR` and `JUMPRI`.
+This proposal introduces four opcodes to support simple subroutines and relative jumps: `JUMPSUB`, `RETURNSUB`, `RJUMP` and `RJUMPI`.

-This change supports substantial reductions in the gas costs of calling and optimizing simple subroutines – from %33 to as much as 54%.
+This change supports substantial reductions in the complexity and the gas costs of calling and optimizing simple subroutines – from 33% to as much as 52% savings in gas.

 ## Motivation

-The EVM does not provide subroutines as a primitive.  Instead, calls can be synthesized by fetching and pushing the current program counter on the data stack and jumping to the subroutine address; returns can be synthesized by getting the return address to the top of the stack and jumping back to it.  These conventions are more costly than necessary.
+The EVM does not provide subroutines as a primitive.  Instead, calls can be synthesized by fetching and pushing the current program counter on the data stack and jumping to the subroutine address; returns can be synthesized by getting the return address to the top of the stack and jumping back to it.  These conventions are more costly and complex than necessary.  This cost and complexity are borne by the humans and programs writing, reading, and analyzing EVM code.

-Facilities to directly support subroutines are provided by all but one of the real and virtual machines programmed by the lead author, including the Burroughs 5000, CDC 7600, IBM 360, DEC PDP 11 and VAX, Motorola 68000, a few generations of Intel silicon, Sun SPARC, UCSD p-Machine, Sun JVM, Wasm, and the sole exception  the EVM.  In whatever form, these operations provide for
+Facilities to directly support subroutines are provided by all but one of the real and virtual machines programmed by the lead author, including the Burroughs 5000, CDC 7600, IBM 360, DEC PDP 11 and VAX, Motorola 68000, a few generations of Intel silicon, Sun SPARC, UCSD p-Machine, Sun JVM, Wasm, and the sole exception -- the EVM.  In whatever form, these operations provide for
 * capturing the current context of execution,
 * transferring control to a new context, and 
 * returning to the original context
@ -31,7 +31,9 @@ The concept goes back to [Turing, 1946](http://www.alanturing.net/turing_archive
 > ...
 > When we wish to start on a subsidiary operation we need only make a note of where we left off the major operation and then apply the first instruction of the subsidiary.  When the subsidiary is over we look up the note and continue with the major operation. Each subsidiary operation can end with instructions for this recovery of the note.  How is the burying and disinterring of the note to be done?  There are of course many ways.  One is to keep a list of these notes in one or more standard size delay lines, (1024) with the most recent last.  The position of the most recent of these will be kept in a fixed TS, and this reference will be modified every time a subsidiary is started or finished...

-We propose to follow Turing's simple concept in our subroutine design, as specified below.  Note that this specification is entirely semantic.  It constrains only stack usage and control flow and imposes no syntax on code beyond being a sequence of bytes to be executed.
+We propose to follow Turing's simple concept in our subroutine design, as specified below.
+
+_Note that this specification is entirely semantic.  It constrains only data usage and control flow and imposes no syntax on code beyond being a sequence of bytes to be executed._

 ## Specification

@ -39,16 +41,12 @@ We introduce one more stack into the EVM in addition to the existing `data stack

 ### Instructions

-####  `BEGINSUB( 0x5c)`
-
-> Marks an entry point to a subroutine.  Execution of a `BEGINSUB` is a no-op.  The cost is _jumpdest_.
-
 #### `JUMPSUB (0x5d) location`

 > Transfers control to a subroutine.
 >
 > 1. Decode the `location` from the immediate data.  The data is encoded as three bytes, MSB-first.
-> 2. If the opcode at `location` is not a `BEGINSUB` _`abort`_.
+> 2. If the opcode at `location` is not a `JUMPDEST` _`abort`_.
 > 3. If the `return stack` already has `1024` items _`abort`_.
 > 4. Push the current `PC + 1` to the `return stack`.
 > 5. Set `PC` to `location`.
@ -69,9 +67,9 @@ We introduce one more stack into the EVM in addition to the existing `data stack
 >
 > * _pops one item off the `return stack`_

-To take full advantage of the performance benefits of simple subroutines we also provide two new static, relative jump functions that take their arguments as immediate data rather then off the stack.
+To provide a complete set of control structures, and to take full advantage of the performance benefits of simple subroutines, we also provide two static, relative jump functions that take their arguments as immediate data rather than off the stack.

-#### `JUMPR (0x??) offset`
+#### `RJUMP (0x??) offset`

 > Transfers control to the address `PC + offset`, where offset is a three-byte, MSB first, twos-complement integer.
 >
@ -81,7 +79,7 @@ To take full advantage of the performance benefits of simple subroutines we also
 >
 >  The cost is _low_.

-#### `JUMPRI (0x??) offset`
+#### `RJUMPI (0x??) offset`

 > Conditionally transfers control to the address `PC + offset`, where offset is a three-byte, MSB first, twos-complement integer.
 > 1. Decode the `offset` from the immediate data.  The data is encoded as three bytes, MSB first, twos-complement.
@ -95,158 +93,51 @@ To take full advantage of the performance benefits of simple subroutines we also
 _Notes:_
 * _If a resulting `PC` to be executed is beyond the last instruction then the opcode is implicitly a `STOP`, which is not an error._
 * _Values popped off the `return stack` do not need to be validated, since they are alterable only by `JUMPSUB` and `RETURNSUB`._
-* _The description above lays out the semantics of this feature in terms of a `return stack`.  But the actual state of the `return stack` is not observable by EVM code or consensus-critical to the protocol.  (For example, a node implementor may code `JUMPSUB` to unobservably push `PC` on the `return stack` rather than `PC + 1`, which is allowed so long as `RETURNSUB` observably returns control to the `PC + 1` location.)_
+* _The description above lays out the semantics of this feature in terms of a `return stack`.  But the actual state of the `return stack` is not observable by EVM code or consensus-critical to the protocol.  (For example, a node implementer may code `JUMPSUB` to unobservably push `PC` on the `return stack` rather than `PC + 1`, which is allowed so long as `RETURNSUB` observably returns control to the `PC + 1` location.)_
 * _The `return stack` is the functional equivalent of Turing's "delay line"._

+The _low_ cost of `JUMPSUB` is justified by needing only about six Go operations to push the return address on the return stack and decode the immediate three-byte destination to the `PC`.  The _verylow_ cost of `RETURNSUB` is justified by needing only about three Go operations to pop the return stack into the `PC`.  No 256-bit arithmetic or checking for valid destinations is needed.  Also, `JUMP` is assigned _mid_, and `JUMPSUB` and `RJUMP` should be more efficient, as decoding immediate bytes should be cheaper than converting 32-byte stack items, and the destination address will not need to be checked for either `JUMPSUB` or `RETURNSUB`.  Benchmarking will be needed to tell if the costs are well-balanced.
+
+### Validity
+
+This EIP specifies validity rules for some important safety properties, including
+
+* valid instructions,
+* valid jump destinations,
+* no stack underflows, and
+* no stack overflows without recursion.
+
+Valid contracts will not halt with an exception unless they either run out of gas or overflow stack during a recursive subroutine call.
+
+Because of the dynamic `JUMP` and `JUMPI` instructions, these rules are necessary but not sufficient conditions for validity.  So we cannot prove contracts to be valid, but only ensure that attempts to create contracts that can be proven to be invalid will fail.
+
+#### Exceptional Halting States
+
+_Execution_ is, as defined in the [Yellow Paper](https://ethereum.github.io/yellowpaper/paper.pdf), a sequence of changes to the EVM state.  The conditions on valid _code_ are preserved by state changes.  At runtime, if execution of an instruction would violate a condition, the execution is in an exceptional halting state.  The Yellow Paper defines five such states.
+1. Insufficient gas
+2. More than 1024 stack items
+3. Insufficient stack items
+4. Invalid jump destination
+5. Invalid instruction
+
+We would like to consider EVM _code_ valid iff no execution of the program can lead to an exceptional halting state, but we must be able to validate _code_ in linear time to avoid denial of service attacks.  So in practice, we can only partially meet these requirements.  Our validation rules do not consider the _code's_ data and computations, only its control flow and stack use.  This means we will reject programs with any invalid _code_ paths, even if those paths are not reachable at runtime.
+
+### Validation Rules
+
+> This section extends the contract creation validation rules (as defined in EIP-3540 and EIP-3670).
+1. Every `RJUMP` and `RJUMPI` addresses a valid `JUMPDEST`.
+2. The stack depth is
+   * always positive and
+   * the same on every path through an opcode.
+3. The `stack pointer` is always positive and at most 1024.
+
+We need to define `stack depth`.  The Yellow Paper has the `stack pointer` (`SP`) pointing just past the top item on the `data stack`.  We define the `stack base` (`BP`) as the element that the `SP` addressed at the entry to the current _basic block_, or `0` on program entry.  So we can define the `stack depth` as the number of stack elements between the current `SP` and the current `BP`.
+
+Taken together, these rules allow for code to be validated by traversing the control-flow graph, following each edge only once.  
+
 ### Dependencies

-We need [EIP-3540: EVM Object Format (EOF)](./eip-3540.md) to allow for immediate arguments without special encoding.
-
-## Rationale
-
-We modeled this design on Moore's 1970 [Forth virtual machine](http://www.ultratechnology.com/4th_1970.pdf). It is a two-stack design – the data stack is supplemented with a return stack to support jumping into and returning from subroutines, as specified above.  The return address (Turing's "note") is pushed onto the return stack (Turing's "delay line") when calling, and the stack is popped into the `PC` when returning.
-
-The alternative design is to push the return address and the destination address on the data stack before jumping, and to pop the data stack and jump back to the popped `PC` to return.  We prefer the separate return stack because it ensures that the return address cannot be overwritten or mislaid, uses fewer data stack slots, and obviates any need to swap the return address past the arguments or return values on the stack.  Crucially, a dynamic jump is not needed to implement subroutine returns`.
-
-The _low_ cost of `JUMPSUB` is justified by needing only about six Go operations to push the return address on the return stack, and decode the immediate two byte destination to the `PC`.   The _verylow_ cost of `RETURNSUB` is justified by needing only about three Go operations to pop the return stack into the `PC`.  No 256-bit arithmetic or checking for valid destinations is needed.  Also, `JUMP` is assigned _mid_, and `JUMPSUB` and JUMPR should be more efficient, as decoding immediate bytes should be cheaper than than converting 32-byte stack items, and the destination address will not need to be checked for either `JUMPSUB` or `RETURNSUB`.  Benchmarking will be needed to tell if the costs are well-balanced. 
-
-### Gas Cost Analysis
-
-These opcodes reduce the gas costs of both ordinary subroutine calls and low-level optimizations.  The savings reported here will of course be less relevant to programs that use a few large subroutines rather than being a factored than in to smaller ones.   The choice of gas costs for the new opcodes above does not make a large difference in this analysis, as much of the improvement is due to PUSH and SWAP operations that are no longer needed.  Even if `JUMPSUB` cost the same as `JUMP` – 8 gas rather than 5 - a simple subroutine call would still be 52% less costly versus 54%.
-
-**_Note**: the **JUMP** versions of the examples below are all **valid code**._
-
-#### **Simple Subroutine Call**
-
-Consider this example of calling a minimal subroutine
-using  `JUMPSUB`
-```
-ADD:
-    beginsub          ; 1 gas
-    0x02              ; 3 gas
-    0x03              ; 3 gas
-    jumpsub ADDITION  ; 5 gas
-    returnsub         ; 3 gas
-
-ADDITION:
-    beginsub          ; 1 gas
-    add               ; 3 gas
-    returnsub         ; 3 gas
-
-Total 22 gas.
-```
-The same code, using `JUMP`.
-```
-TEST_ADD:
-   jumpdest           ; 1 gas
-   RTN_ADD            ; 3 gas
-   0x02               ; 3 gas
-   0x03               ; 3 gas
-   ADDITION    ; 3 gas
-   jump               ; 8 gas
-RTN_ADD:
-   jumpdest           ; 1 gas
-   swap1              ; 3 gas
-   jump               ; 8 gas
-
-ADDITION:
-   jumpdest           ; 1 gas
-   add                ; 3 gas
-   swap1              ; 3 gas
-   jump               ; 8 gas
-
-Total: 48 gas
-```
-Using `JUMPSUB` saves **_48 - 22 = 26_** gas versus using `JUMP` – a 54% performance improvement.
-
-The advantages of JUMPR can be seen in, e.g., the tail simple subroutine call.
-
-#### **Tail Call Optimization**
-
-Of course in cases like this one we can optimize the tail call, so that the final `RETURNSUB` in `ADDITION` actually returns from TEST_ADD.
-```
-TEST_ADD:
-    beginsub          ; 1 gas
-    0x02              ; 3 gas
-    0x03              ; 3 gas
-    jumpsub ADDITION ;  3 gas
-
-ADDITION:
-    beginsub          ; 1 gas
-    add               ; 3 gas
-    returnsub         ; 3 gas
-
-Total: 20 gas
-```
-Or the same code, using `JUMP`
-```
-TEST_ADD:
-   jumpdest           ; 1 gas
-   0x02               ; 3 gas
-   0x03               ; 3 gas
-   ADDITION           ; 3 gas
-   jump               ; 8 gas
-
-ADDITION:
-   jumpdest           ; 1 gas
-   add                ; 3 gas
-   swap1              ; 3 gas
-   jump               ; 8 gas
-
-Total: 33 gas
-```
-Using `JUMPSUB` saves **_33 - 20 = 13_** gas versus using `JUMP` – a 39% performance improvement.
-
-####  **Tail Call Elimination**
-
-We can even take advantage of `ADDITION` just happening to directly follow `TEST_ADD` and just fall through rather than jump at all.
-```
-TEST_ADD:
-    beginsub          ; 1 gas
-    0x02              ; 3 gas
-    0x03              ; 3 gas
-ADDITION:
-    beginsub          ; 1 gas
-    add               ; 3 gas
-    returnsub         ; 3 gas
-
-Total 16 gas.
-```
-The same code, using JUMP.
-```
-TEST_ADD:
-   jumpdest           ; 1 gas
-   0x02               ; 3 gas
-   0x03               ; 3 gas
-ADDITION:
-   jumpdest           ; 1 gas
-   add                ; 3 gas
-   swap1              ; 3 gas
-   jump               ; 8 gas
-
-Total: 24 gas
-```
-Using `JUMPSUB` saves **_22 - 14 = 8_** gas versus using `JUMP` – a 36% performance improvement.
-
-Finally, we can take a look at using `JUMPR` instead of `JUMP`
-
-#### **Tail Calls with JUMPR**
-```
-TEST_ADD:
-   jumpdest           ; 1 gas
-   0x02               ; 3 gas
-   jumpr ADDITION
-                          ; 3 gas
-
-ADDITION:
-   jumpdest           ; 1 gas
-   add                ; 3 gas
-   swap1              ; 3 gas
-   jump               ; 8 gas
-
-Total: 22 gas.
-```
-Using `JUMPR` saves **_33 - 22_ = 11_** gas  – a 33% performance improvement.
+We need [EIP-3540: EVM Object Format (EOF)](./eip-3540.md) to support immediate arguments and [EIP-3670: EOF - Code Validation](./eip-3670.md) to support validation of instructions.

 ## Backwards and Forwards Compatibility

@ -254,11 +145,264 @@ These changes do not affect the semantics of existing EVM code.

 These changes are compatible with using [EIP-3337](https://eips.ethereum.org/EIPS/eip-3337) to provide stack frames, by associating a frame with each subroutine.

-## Security Considerations
+## Rationale

-These changes do introduce new flow control instructions, so any software which does static/dynamic analysis of EVM code needs to be modified accordingly. The `JUMPSUB` semantics are similar to `JUMP` (but jumping to a `BEGINSUB`), whereas the `RETURNSUB` instruction is different, since it can 'land' on any opcode (but the possible destinations can be statically inferred).
+We modeled this design on Moore's 1970 [Forth virtual machine](http://www.ultratechnology.com/4th_1970.pdf). It is a simple two-stack design – the data stack is supplemented with a return stack to support jumping to and returning from subroutines, as specified above, and as conceptualized by Turing.  The return address (Turing's "note") is pushed onto the return stack (Turing's "delay line") when calling, and the return address is popped into the `PC` when returning.

-If [`EIP-`3779](./eip-3779.md) – Safe Control Flow for the EVM – advances then the requirement on `JUMPSUB` to `abort` if the opcode at `location` is not a `BEGINSUB` will need to be enforced at creation time rather than runtime.
+The alternative design is to push the return address and the destination address on the data stack before jumping to the subroutine, and to later jump back to the return address on the stack in order to return.  This is the current approach.  It could be streamlined to some extent by having JUMPSUB push the return address for RETURNSUB to pop.
+
+We prefer the separate return stack because it maintains a clear separation between data and flow of control.  This ensures that the return address cannot be overwritten or mislaid.  It also reduces costs by using fewer data stack slots and moving less data.
+
+### Gas Cost Analysis
+
+We show here how these opcodes can be used to reduce the gas costs of both ordinary subroutine calls and low-level optimizations.  The savings reported here will of course be less relevant to programs that use a few large subroutines rather than being factored into smaller ones.  The choice of gas costs for the new opcodes above does not make a large difference in this analysis, as much of the improvement is due to `PUSH` and `SWAP` operations that are no longer needed.  Even if `JUMPSUB` cost the same as `JUMP` – 8 gas rather than 5 – a simple subroutine call would still be 48% less costly versus 52%.
+
+**_Note**: the **JUMP** versions of the examples below are all **valid code**._
+
+#### **Simple Subroutine Call**
+
+Consider this example of calling a fairly minimal subroutine.
+
+Subroutine call, using `JUMPSUB`
+```
+TEST_SQUARE:
+    jumpdest        ; 1 gas
+    0x02            ; 3 gas
+    jumpsub SQUARE  ; 5 gas
+    returnsub       ; 3 gas
+
+SQUARE:
+    jumpdest        ; 1 gas
+    dup1            ; 3 gas
+    mul             ; 5 gas
+    returnsub       ; 3 gas
+
+Total 24 gas.
+```
+Subroutine call, using `JUMP`
+```
+TEST_SQUARE:
+    jumpdest        ; 1 gas
+    RTN_SQUARE      ; 3 gas
+    0x02            ; 3 gas
+    SQUARE          ; 3 gas
+    jump            ; 8 gas
+RTN_SQUARE:
+    jumpdest        ; 1 gas
+    swap1           ; 3 gas
+    jump            ; 8 gas
+
+SQUARE:
+    jumpdest        ; 1 gas
+    dup1            ; 3 gas
+    mul             ; 5 gas
+    swap1           ; 3 gas
+    jump            ; 8 gas
+
+Total: 50 gas
+```
+Using `JUMPSUB` saves **_50 - 24 = 26_** gas versus using `JUMP` – a 52% performance improvement.
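The arithmetic behind this comparison can be checked with a short Go sketch that sums the per-instruction costs shown in the two listings above. The variable and function names are assumptions made for this example.

```go
package main

import "fmt"

// Per-instruction gas costs copied from the listings above.
var jumpsubCosts = []int{1, 3, 5, 3, 1, 3, 5, 3}
var jumpCosts = []int{1, 3, 3, 3, 8, 1, 3, 8, 1, 3, 5, 3, 8}

// total sums a list of gas costs.
func total(costs []int) int {
	sum := 0
	for _, c := range costs {
		sum += c
	}
	return sum
}

// savingsPercent returns the relative saving of after versus before.
func savingsPercent(before, after int) float64 {
	return 100 * float64(before-after) / float64(before)
}

func main() {
	before, after := total(jumpCosts), total(jumpsubCosts)
	fmt.Printf("JUMP: %d gas, JUMPSUB: %d gas, saving: %.0f%%\n",
		before, after, savingsPercent(before, after))
}
```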
+
+#### **Tail Call Optimization**
+
+Of course in cases like this one we can optimize the tail call, so that the return from `SQUARE` actually returns from TEST_SQUARE.
+
+Tail call optimization, using `RJUMP` and `RETURNSUB`.
+```
+TEST_SQUARE:
+    jumpdest        ; 1 gas
+    0x02            ; 3 gas
+    rjump SQUARE    ; 3 gas
+
+SQUARE:
+    jumpdest        ; 1 gas
+    dup1            ; 3 gas
+    mul             ; 5 gas
+    returnsub       ; 3 gas
+
+Total: 19 gas
+```
+Tail call optimization, using `JUMP`
+```
+TEST_SQUARE:
+    jumpdest        ; 1 gas
+    0x02            ; 3 gas
+    SQUARE          ; 3 gas
+    jump            ; 8 gas
+
+SQUARE:
+    jumpdest        ; 1 gas
+    dup1            ; 3 gas
+    mul             ; 5 gas
+    swap1           ; 3 gas
+    jump            ; 8 gas
+
+Total: 35 gas
+```
+Using `RJUMP` and `RETURNSUB` versus `JUMP` saves **_35 - 19 = 16_** gas – a 46% performance improvement.
+
+So we can see that these instructions provide a simpler and more efficient subroutine mechanism than dynamic jumps.
+
+####  **Tail Call Elimination**
+
+We can even take advantage of `SQUARE` just happening to directly follow `TEST_SQUARE` and just fall through rather than jump at all.
+
+Tail call elimination, using `RETURNSUB`.
+```
+TEST_SQUARE:
+    jumpdest        ; 1 gas
+    0x02            ; 3 gas
+SQUARE:
+    jumpdest        ; 1 gas
+    dup1            ; 3 gas
+    mul             ; 5 gas
+    returnsub       ; 3 gas
+
+Total 16 gas.
+```
+Tail call elimination, using JUMP.
+```
+TEST_SQUARE:
+    jumpdest        ; 1 gas
+    0x02            ; 3 gas
+SQUARE:
+    jumpdest        ; 1 gas
+    dup1            ; 3 gas
+    mul             ; 5 gas
+    swap1           ; 3 gas
+    jump            ; 8 gas
+
+Total: 24 gas
+```
+Using `RETURNSUB` versus `JUMP` saves **_24 - 16 = 8_** gas – a 33% performance improvement.
+
+We can also consider the alternative subroutine call, using a version of `JUMPSUB` that pushes its return address on the stack.
+```
+TEST_SQUARE:
+    jumpdest        ; 1 gas
+    0x02            ; 3 gas
+    jumpsub SQUARE  ; 5 gas
+    swap1           ; 3 gas
+    returnsub       ; 3 gas
+
+SQUARE:
+    jumpdest        ; 1 gas
+    dup1            ; 3 gas
+    mul             ; 5 gas
+    swap1           ; 3 gas
+    returnsub       ; 3 gas
+```
+Total 30 gas, compared to 24 gas for the return stack version.
+
+## Validation Algorithm
+
+> This section specifies an algorithm for checking the above rules.  Equivalent code must be run at creation time.  We assume that the validation defined in EIP-3540 and EIP-3670 has already run, although in practice the algorithms can be merged.
+
+The following is a pseudo-Go implementation of an algorithm for enforcing adherence to the above rules.  This algorithm is a symbolic execution of the program that recursively traverses the bytecode, following its control flow and stack use and checking for violations of the rules above.   It uses a stack to track the slots that hold `PUSHed` constants, from which it pops the destinations to validate during the analysis.
+
+This algorithm runs in time equal to `O(vertices + edges)` in the program's control-flow graph, where edges represent control flow and the vertices represent _basic blocks_ – thus the algorithm takes time proportional to the size of the bytecode.
+
+For simplicity's sake we assume a few helper functions.
+* `advance_pc()` advances the `PC`,  skipping any immediate data.
+* `imm_data()` returns immediate data for an instruction.
+* `valid_jumpdest()` checks that a jump destination is not in immediate data.
+* `remove_items()` returns the number of items removed from the `stack` by an instruction.
+* `add_items()` returns the number of items added to the `stack`.  Items are added as `0xFFFFFFFF`.
+   The `PC`, `PUSH…`, `SWAP…`,  `DUP…`, `JUMP`, and `JUMPI` instructions are handled separately.
+```
+var code  [code_len]byte
+var depth [code_len]unsigned
+var stack [1024]int256 = { -1 } // stack grows down
+var sp := 1023            
+var bp := 1023
+    
+func validate(pc := 0, depth := 0) boolean {
+
+   for ; pc < code_len; pc = advance_pc(pc) {
+      
+      // successful termination
+      switch instruction {
+      case STOP:    return true
+      case RETURN:  return true
+      case SUICIDE: return true
+      }
+
+      // check for stack underflow and overflow
+      depth := bp - sp
+      if depth < 0 || sp < 0 || 1024 < sp {
+         return false
+      }
+
+      // if stack depth for `pc` is non-zero we have been here before 
+      // so return to break cycle in control flow graph
+      if depth[pc] != 0 {
+          return true
+      }
+      depth[pc] = depth
+
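+      if (PUSH1 <= instruction && instruction <= PUSH32) {
+         // push the immediate constant; the stack grows down
+         stack[sp--] = imm_data(pc)
+         continue
+      }
+      if (DUP1 <= instruction && instruction <= DUP16) {
+         // push a copy of the n-th item from the top (the top item is at sp + 1)
+         n := instruction - DUP1 + 1
+         stack[sp] = stack[sp + n]
+         sp--
+         continue
+      }
+      if (SWAP1 <= instruction && instruction <= SWAP16) {
+         // exchange the top item with the item n slots below it
+         n := instruction - SWAP1 + 1
+         swap := stack[sp + n + 1]
+         stack[sp + n + 1] = stack[sp + 1]
+         stack[sp + 1] = swap
+         continue
+      }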
+
+      if (instruction == RJUMP) {
+
+         // check for valid destination
+         jumpdest = pc + imm_data(pc)
+         if !valid_jumpdest(jumpdest) {
+            return false
+         }
+
+         // will enter basic block at destination
+         bp = sp
+
+         // reset pc to destination of jump 
+         pc = jumpdest
+         continue
+      }
+      if (instruction == RJUMPI) {
+
+         // check for valid destination
+         jumpdest = pc + imm_data(pc)
+         if !valid_jumpdest(jumpdest) {
+            return false
+         }
+
+         // will enter basic block at destination or next instruction
+         bp = sp
+
+         // recurse to jump to code to validate
+         if !validate(jumpdest) {
+            return false
+         }
+       
+         // false side of conditional -- the loop's advance_pc() moves to the next instruction
+         continue
+      }
+
+      // apply other instructions to stack
+      sp += remove_items(pc)
+      sp -= add_items(pc)
+   }
+   
+   // successful termination
+   return true
+}
+```

 ## Test Cases

@ -266,7 +410,7 @@ If [`EIP-`3779](./eip-3779.md) – Safe Control Flow for the EVM – advances th

 This should jump into a subroutine, back out and stop.

-Bytecode: `0x60045e005c5d` (`PUSH1 0x04, JUMPSUB, STOP, BEGINSUB, RETURNSUB`)
+Bytecode: `0x60045e005b5d` (`PUSH1 0x04, JUMPSUB, STOP, JUMPDEST, RETURNSUB`)

 |  Pc   |      Op     | Cost |   Stack   |   RStack  |
 |-------|-------------|------|-----------|-----------|
@ -281,7 +425,7 @@ Consumed gas: `10`

 This should execute fine, going two levels deep into subroutines.

-Bytecode: `0x6800000000000000000c5e005c60115e5d5c5d` (`PUSH9 0x00000000000000000c, JUMPSUB, STOP, BEGINSUB, PUSH1 0x11, JUMPSUB, RETURNSUB, BEGINSUB, RETURNSUB`)
+Bytecode: `0x6800000000000000000c5e005b60115e5d5b5d` (`PUSH9 0x00000000000000000c, JUMPSUB, STOP, JUMPDEST, PUSH1 0x11, JUMPSUB, RETURNSUB, JUMPDEST, RETURNSUB`)

 |  Pc   |      Op     | Cost |   Stack   |   RStack  |
 |-------|-------------|------|-----------|-----------|
@ -298,7 +442,7 @@ Consumed gas: `20`
 This should fail, since the given location is outside of the code-range. The code is the same as previous example, 
 except that the pushed location is `0x01000000000000000c` instead of `0x0c`.

-Bytecode: (`PUSH9 0x01000000000000000c, JUMPSUB, `0x6801000000000000000c5e005c60115e5d5c5d`, STOP, BEGINSUB, PUSH1 0x11, JUMPSUB, RETURNSUB, BEGINSUB, RETURNSUB`)
+Bytecode: `0x6801000000000000000c5e005b60115e5d5b5d` (`PUSH9 0x01000000000000000c, JUMPSUB, STOP, JUMPDEST, PUSH1 0x11, JUMPSUB, RETURNSUB, JUMPDEST, RETURNSUB`)

 |  Pc   |      Op     | Cost |   Stack   |   RStack  |
 |-------|-------------|------|-----------|-----------|
@ -326,7 +470,7 @@ Error: at pc=0, op=RETURNSUB: invalid retsub

 In this example, the `JUMPSUB` is on the last byte of code. When the subroutine returns, it should hit the 'virtual stop' _after_ the bytecode, and not exit with an error.

-Bytecode: `0x6005565c5d5b60035e` (`PUSH1 0x05, JUMP, BEGINSUB, RETURNSUB, JUMPDEST, PUSH1 0x03, JUMPSUB`)
+Bytecode: `0x6005565b5d5b60035e` (`PUSH1 0x05, JUMP, JUMPDEST, RETURNSUB, JUMPDEST, PUSH1 0x03, JUMPSUB`)

 |  Pc   |      Op     | Cost |   Stack   |   RStack  |
 |-------|-------------|------|-----------|-----------|
@ -338,3 +482,11 @@ Bytecode: `0x6005565c5d5b60035e` (`PUSH1 0x05, JUMP, BEGINSUB, RETURNSUB, JUMPDE
 |    7  |       STOP  |    0 |        [] |        [] |

 Consumed gas: `30`
+
+
+## Security Considerations
+
+These changes introduce new flow control instructions, so any software which does static/dynamic analysis of EVM code needs to be modified accordingly. The `JUMPSUB` semantics are similar to `JUMP`, whereas the `RETURNSUB` instruction is different, since it can 'land' on any opcode (but the possible destinations can be statically inferred).
+
+## Copyright
+Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).