This proposal introduces three opcodes to support subroutines: `BEGINSUB`, `JUMPSUB` and `RETURNSUB`. (The smallest possible change would do without `BEGINSUB`.)
Safety properties equivalent to [EIP-615](https://eips.ethereum.org/EIPS/eip-615) can be ensured by enforcing a few simple rules, without imposing syntactic constraints; conforming code can be checked with the provided validation algorithm.
The EVM does not provide subroutines as a primitive. Instead, calls can be synthesized by fetching and pushing the current program counter on the data stack and jumping to the subroutine address; returns can be synthesized by getting the return address to the top of the stack and jumping back to it. In the EVM the return
Facilities to directly support subroutines are provided by all but one of the machines programmed by the lead author, including the B5000, CDC7600, IBM360, PDP8, PDP11, VAX, M68000, 80x86, SPARC, p-Machine and JVM; the sole exception is the EVM. In whatever form, these operations capture the current context of execution, transfer control to a new context, and return to the original context. The concept goes back to Turing:
> We also wish to be able to arrange for the splitting up of operations into subsidiary operations. This should be done in such a way that once we have written down how an operation is done we can use it as a subsidiary to any other operation.
> ...
> When we wish to start on a subsidiary operation we need only make a note of where we left off the major operation and then apply the first instruction of the subsidiary. When the subsidiary is over we look up the note and continue with the major operation. Each subsidiary operation can end with instructions for this recovery of the note. How is the burying and disinterring of the note to be done? There are of course many ways. One is to keep a list of these notes in one or more standard size delay lines, with the most recent last. The position of the most recent of these will be kept in a fixed TS, and this reference will be modified every time a subsidiary is started or finished. The burying and disinterring processes are fairly elaborate, but there is fortunately no need to repeat the instructions involved, each time, the burying being done through a standard instruction table BURY, and the disinterring by the table UNBURY.
>
> Notes: TS 1 contains the address of the currently executing instruction. "minor cycle" = word.
We propose to use Turing's simple _return-stack_ mechanism, long known to work well for virtual stack machines, and specify it here. Note that this specification is entirely semantic: it constrains only stack usage and control flow, and imposes no syntax on code beyond its being a sequence of bytes to be executed.
In addition to the existing `data stack`, we introduce a second stack into the EVM, the `return stack`, which is limited to `1024` items. This stack supports the three new subroutine instructions.
_Note 3: The description above lays out the semantics of this feature in terms of a `return stack`. But the actual state of the `return stack` is not observable by EVM code or consensus-critical to the protocol. (For example, a node implementor may code `JUMPSUB` to unobservably push `pc` on the `return stack` rather than `pc + 1`, which is allowed so long as `RETURNSUB` observably returns control to the `pc + 1` location.)_
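To make the return-stack semantics concrete, here is a minimal Go sketch of an interpreter for a tiny, hypothetical instruction set. It works on a pre-decoded instruction list (the `Op` and `Instr` types are illustrative, not real EVM encoding) and assumes that `JUMPSUB` carries its destination as an immediate, as the `jumpsub DIVIDE` listing later suggests, that `BEGINSUB` is a no-op when reached, and that `RETURNSUB` on an empty return stack halts with an error. "pc + 1" becomes "index of the next instruction" in this model.

```go
package main

import (
	"errors"
	"fmt"
)

// Op is a hypothetical decoded opcode; real bytecode encoding is not modeled.
type Op int

const (
	PUSH Op = iota
	DIV
	BEGINSUB
	JUMPSUB
	RETURNSUB
	STOP
)

// Instr is one decoded instruction. Arg is the pushed value for PUSH and the
// destination index for JUMPSUB (assumed to be an immediate).
type Instr struct {
	Op  Op
	Arg uint64
}

const returnStackLimit = 1024 // limit stated in the proposal

// run executes prog from index 0 and returns the final data stack.
func run(prog []Instr) ([]uint64, error) {
	var data []uint64
	var ret []int // the return stack: indices to resume at
	pc := 0
	for pc < len(prog) { // running past the end acts as a normal stop
		in := prog[pc]
		switch in.Op {
		case PUSH:
			data = append(data, in.Arg)
		case DIV:
			if len(data) < 2 {
				return nil, errors.New("stack underflow")
			}
			a, b := data[len(data)-1], data[len(data)-2]
			data = data[:len(data)-2]
			q := uint64(0) // EVM DIV yields 0 on division by zero
			if b != 0 {
				q = a / b
			}
			data = append(data, q)
		case BEGINSUB:
			// Assumed to be a no-op marker when reached.
		case JUMPSUB:
			dest := int(in.Arg)
			if dest < 0 || dest >= len(prog) || prog[dest].Op != BEGINSUB {
				return nil, errors.New("JUMPSUB target is not a BEGINSUB")
			}
			if len(ret) == returnStackLimit {
				return nil, errors.New("return stack overflow")
			}
			ret = append(ret, pc+1) // resume at the next instruction
			pc = dest
			continue
		case RETURNSUB:
			if len(ret) == 0 {
				return nil, errors.New("return stack underflow")
			}
			pc = ret[len(ret)-1]
			ret = ret[:len(ret)-1]
			continue
		case STOP:
			return data, nil
		}
		pc++
	}
	return data, nil
}

func main() {
	prog := []Instr{
		{JUMPSUB, 2},   // 0: call TEST_DIV
		{STOP, 0},      // 1: halt with the result on the data stack
		{BEGINSUB, 0},  // 2: TEST_DIV
		{PUSH, 2},      // 3
		{PUSH, 3},      // 4
		{JUMPSUB, 7},   // 5: call DIVIDE
		{RETURNSUB, 0}, // 6
		{BEGINSUB, 0},  // 7: DIVIDE
		{DIV, 0},       // 8: 3 / 2
		{RETURNSUB, 0}, // 9
	}
	out, err := run(prog)
	fmt.Println(out, err) // [1] <nil>
}
```

Note that the data stack never holds a return address here, which is the point of keeping the two stacks separate.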
We modeled this design on the simple, proven, archetypal Forth virtual machine of 1970. It is a two-stack design: the data stack is supplemented with Turing's return stack to support jumping into and returning from subroutines, as specified above.
The separate return stack ensures that the return address cannot be overwritten or mislaid, and obviates any need to swap the return address past the arguments on the stack. Importantly, a dynamic jump is not needed to implement subroutine returns, allowing for deprecation of dynamic uses of `JUMP` and `JUMPI`.
These changes are compatible with using [EIP-3337](https://eips.ethereum.org/EIPS/eip-3337) to provide stack frames, by associating a frame with each subroutine.
Consider this example of calling a minimal subroutine.
```
TEST_DIV:
beginsub ; 1 gas
0x02 ; 3 gas
0x03 ; 3 gas
jumpsub DIVIDE ; 5 gas
returnsub ; 5 gas
DIVIDE:
beginsub ; 1 gas
div ; 5 gas
returnsub ; 5 gas
```
Total 28 gas.
The same code, using JUMP.
```
TEST_DIV:
jumpdest ; 1 gas
RTN_DIV ; 3 gas
0x02 ; 3 gas
0x03 ; 3 gas
DIVIDE ; 3 gas
jump ; 8 gas
RTN_DIV:
jumpdest ; 1 gas
swap1 ; 3 gas
jump ; 8 gas
DIVIDE:
jumpdest ; 1 gas
div ; 5 gas
swap1 ; 3 gas
jump ; 8 gas
```
50 gas total.
Both approaches spend 11 gas pushing two arguments and dividing, so control flow costs 39 gas using `JUMP` versus 17 gas using `JUMPSUB`.
That’s a savings of 22 gas.
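The arithmetic can be checked with a few lines of Go. The per-instruction costs below are copied in order from the annotations in the two listings; `sum` is just a helper for this sketch.

```go
package main

import "fmt"

// sum adds up the per-instruction gas costs of a listing.
func sum(costs []int) int {
	total := 0
	for _, c := range costs {
		total += c
	}
	return total
}

func main() {
	jumpsub := []int{1, 3, 3, 5, 5, 1, 5, 5}              // JUMPSUB listing
	jump := []int{1, 3, 3, 3, 3, 8, 1, 3, 8, 1, 5, 3, 8} // JUMP listing
	common := 3 + 3 + 5                                   // two pushes and one DIV

	js, jp := sum(jumpsub), sum(jump)
	fmt.Println("JUMPSUB total:", js, "control flow:", js-common) // 28 and 17
	fmt.Println("JUMP total:", jp, "control flow:", jp-common)    // 50 and 39
	fmt.Println("savings:", (jp-common)-(js-common))               // 22
}
```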
In the general case of one routine calling another, I don’t think the `JUMP` version can do better. Of course, in this case we can optimize the tail call, so that the final jump in `DIVIDE` actually returns from `TEST_DIV`.
```
TEST_DIV:
jumpdest ; 1 gas
0x02 ; 3 gas
0x03 ; 3 gas
DIVIDE ; 3 gas
jump ; 8 gas
DIVIDE:
jumpdest ; 1 gas
div ; 5 gas
swap1 ; 3 gas
jump ; 8 gas
```
Total 35 gas, which is still worse than with `JUMPSUB`.
We could even take advantage of `DIVIDE` happening to directly follow `TEST_DIV`, and simply fall through:
```
TEST_DIV:
jumpdest ; 1 gas
0x02 ; 3 gas
0x03 ; 3 gas
DIVIDE:
jumpdest ; 1 gas
div ; 5 gas
swap1 ; 3 gas
jump ; 8 gas
```
Total 24 gas, better than `JUMPSUB`.
However, `JUMPSUB` can do even better with the same optimizations.
These changes do introduce new flow-control instructions, so any software that does static or dynamic analysis of EVM code needs to be modified accordingly. `JUMPSUB` semantics are similar to those of `JUMP` (except that the destination is a `BEGINSUB`), whereas `RETURNSUB` is different: it can 'land' on any opcode, though its possible destinations can be statically inferred.
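A rough Go sketch of why those destinations are inferable, assuming (hypothetically) that every `JUMPSUB` names a fixed `BEGINSUB` as its destination: a `RETURNSUB` in a subroutine can only resume at an instruction immediately following some `JUMPSUB` that entered that subroutine. The `Op` and `Instr` types are illustrative, not part of the proposal.

```go
package main

import "fmt"

// Op is a hypothetical decoded opcode.
type Op int

const (
	OTHER Op = iota // any instruction that falls through to the next one
	STOP
	BEGINSUB
	JUMPSUB
	RETURNSUB
)

// Instr is one decoded instruction; Dest is used only by JUMPSUB and is
// assumed to be a static destination index.
type Instr struct {
	Op   Op
	Dest int
}

// returnSites maps each subroutine entry (the index of its BEGINSUB) to the
// indices where a RETURNSUB from that subroutine can resume: the instruction
// right after each JUMPSUB that targets the entry.
func returnSites(prog []Instr) map[int][]int {
	sites := make(map[int][]int)
	for i, in := range prog {
		if in.Op == JUMPSUB {
			sites[in.Dest] = append(sites[in.Dest], i+1)
		}
	}
	return sites
}

func main() {
	prog := []Instr{
		{JUMPSUB, 4},   // 0: call SUB
		{OTHER, 0},     // 1
		{JUMPSUB, 4},   // 2: call SUB again
		{STOP, 0},      // 3
		{BEGINSUB, 0},  // 4: SUB
		{RETURNSUB, 0}, // 5
	}
	fmt.Println(returnSites(prog)[4]) // [1 3]
}
```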
Safety and amenability to static analysis of valid programs can be made comparable to [EIP-615](https://eips.ethereum.org/EIPS/eip-615), but without imposing syntactic constraints, and thus with minimal impact on low-level optimizations (as shown above). Validity can be ensured by following the rules given in the next section, and programs can be validated with the provided algorithm. The validation algorithm is simple and bounded by the size of the code, allowing for validation at creation time. And compilers can easily follow the rules.
_Execution_ is as defined in the [Yellow Paper](https://ethereum.github.io/yellowpaper/paper.pdf): a sequence of changes in the EVM state. The conditions on valid code are preserved by state changes. At runtime, if execution of an instruction would violate a condition, the execution is in an exceptional halting state. The Yellow Paper defines five such states.
We would like to consider EVM code valid iff no execution of the program can lead to an exceptional halting state, but we must be able to validate code in linear time to avoid denial-of-service attacks. So in practice we can only partially meet these requirements. Our validation algorithm does not consider the code’s data and computations, only its control flow and stack use. This means we will reject programs with any invalid code paths, even if those paths are not reachable at runtime. Further, conditions 1 and 2 (insufficient gas and stack overflow) must in general be checked at runtime. Conditions 3, 4, and 5 cannot occur if the code conforms to the following rules.
Rule 0, deprecating `JUMP` and `JUMPI`, would forbid dynamic jumps. Absent dynamic jumps, another mechanism is needed for subroutine returns, as provided here.
Jump destinations are currently checked at runtime. Static jumps allow them to be validated at creation time, per rule 1. _Note: Valid instructions are not part of PUSH data._
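For illustration, here is the usual jump-destination analysis in Go, using the real EVM encodings `JUMPDEST` = `0x5b` and `PUSH1`..`PUSH32` = `0x60`..`0x7f`. It marks which offsets begin an instruction, so a `0x5b` byte inside PUSH data is not a valid destination. A creation-time validator could apply a check like `validJumpDest` to each static jump target; the function names are hypothetical.

```go
package main

import "fmt"

const (
	opJUMPDEST = 0x5b
	opPUSH1    = 0x60
	opPUSH32   = 0x7f
)

// instructionStarts marks the offsets that begin an instruction, skipping
// the immediate bytes that follow each PUSH1..PUSH32.
func instructionStarts(code []byte) []bool {
	starts := make([]bool, len(code))
	for pc := 0; pc < len(code); pc++ {
		starts[pc] = true
		op := code[pc]
		if op >= opPUSH1 && op <= opPUSH32 {
			pc += int(op-opPUSH1) + 1 // skip 1..32 bytes of PUSH data
		}
	}
	return starts
}

// validJumpDest reports whether dest is a JUMPDEST that is not PUSH data.
func validJumpDest(code []byte, starts []bool, dest int) bool {
	return dest >= 0 && dest < len(code) && starts[dest] && code[dest] == opJUMPDEST
}

func main() {
	// PUSH1 0x5b; JUMPDEST: the first 0x5b byte is data, the second is code.
	code := []byte{0x60, 0x5b, 0x5b}
	starts := instructionStarts(code)
	fmt.Println(validJumpDest(code, starts, 1), validJumpDest(code, starts, 2)) // false true
}
```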
For rules 3 and 4 we need to define `stack depth`. The Yellow Paper has the `stack pointer` or `SP` pointing just past the top item on the `data stack`. We define the `stack base` as where the `SP` pointed before the most recent `JUMPSUB`, or `0` on program entry. The `stack depth` is then the number of stack elements between the current `SP` and the current `stack base`.
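A small Go sketch of this definition. It assumes, since the text does not say, that `RETURNSUB` restores the caller's stack base and that `JUMPSUB` itself leaves the data stack unchanged; `frames` and its methods are hypothetical names. The usage replays the `DIVIDE` call from the earlier example.

```go
package main

import "fmt"

// frames tracks the data stack pointer and the stack base of each active
// subroutine call.
type frames struct {
	sp    int   // number of items on the data stack
	bases []int // SP recorded at each active JUMPSUB; empty on program entry
}

// base returns the current stack base (0 on program entry).
func (f *frames) base() int {
	if len(f.bases) == 0 {
		return 0
	}
	return f.bases[len(f.bases)-1]
}

// depth is the number of items between the current SP and the stack base.
func (f *frames) depth() int { return f.sp - f.base() }

// jumpsub records where SP pointed just before entering the subroutine.
func (f *frames) jumpsub() { f.bases = append(f.bases, f.sp) }

// returnsub restores the caller's base; it assumes a matching jumpsub.
func (f *frames) returnsub() { f.bases = f.bases[:len(f.bases)-1] }

func main() {
	var f frames
	f.sp += 2              // caller pushes 0x02 and 0x03
	fmt.Println(f.depth()) // 2
	f.jumpsub()            // enter DIVIDE
	fmt.Println(f.depth()) // 0
	f.sp--                 // DIV pops two items and pushes one
	fmt.Println(f.depth()) // -1: one net argument consumed from below the base
	f.returnsub()
	fmt.Println(f.depth()) // 1
}
```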
Given our definition of `stack depth`, Rule 3 ensures that control flows which return to the same place with a different `stack depth` are invalid. These can be caused by irreducible paths like jumping into loops and subroutines, and by calling subroutines with different numbers of arguments. Taken together, these rules allow for code to be validated by following the control-flow graph, traversing each edge only once.
The following is a pseudo-Go specification of an algorithm for enforcing program validity. It recursively traverses the bytecode, following its control flow and stack use and checking for violations of the rules above. (For simplicity we ignore the issue of `JUMPDEST` or `BEGINSUB` bytes in PUSH data, assume an `advance_pc()` routine, and don't specify `JUMPTABLE`, which is just a loop over `RJUMP`.) It runs in O(vertices + edges) time in the program's control-flow graph, where vertices represent control-flow instructions and edges represent basic blocks.
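As a simplified illustration of this kind of traversal (not the proposal's own pseudo-Go), the Go sketch below validates a hypothetical pre-decoded instruction list containing only stack effects and static `Jump`/`JumpI` instructions. It checks that jump targets are `JumpDest`s, that no instruction pops more items than are present, and that every instruction is reached with a single stack depth (the Rule 3 condition). Subroutine calls and returns are deliberately omitted. Because an instruction is never walked again once its depth is recorded, the cost is linear in instructions plus edges.

```go
package main

import "fmt"

// Kind is the control-flow behavior of a hypothetical decoded instruction.
type Kind int

const (
	Next     Kind = iota // falls through to the following instruction
	JumpDest             // valid jump target; falls through
	Jump                 // static unconditional jump to Target
	JumpI                // static conditional jump to Target, else fall through
	Stop                 // ends this path
)

type Instr struct {
	Kind   Kind
	Pops   int // items consumed (JumpI consumes its condition)
	Pushes int // items produced
	Target int // destination index for Jump and JumpI
}

// validate walks every path from instruction 0.
func validate(prog []Instr) error {
	depthAt := make([]int, len(prog))
	for i := range depthAt {
		depthAt[i] = -1 // -1 means not yet visited
	}
	return walk(prog, depthAt, 0, 0)
}

func walk(prog []Instr, depthAt []int, pc, depth int) error {
	for pc < len(prog) {
		if seen := depthAt[pc]; seen >= 0 {
			if seen != depth {
				return fmt.Errorf("pc %d reached with depths %d and %d", pc, seen, depth)
			}
			return nil // already validated from here with the same depth
		}
		depthAt[pc] = depth
		in := prog[pc]
		if depth < in.Pops {
			return fmt.Errorf("pc %d: stack underflow", pc)
		}
		depth += in.Pushes - in.Pops
		switch in.Kind {
		case Stop:
			return nil
		case Jump, JumpI:
			t := in.Target
			if t < 0 || t >= len(prog) || prog[t].Kind != JumpDest {
				return fmt.Errorf("pc %d: invalid jump destination %d", pc, t)
			}
			if err := walk(prog, depthAt, t, depth); err != nil {
				return err
			}
			if in.Kind == Jump {
				return nil // no fall-through after an unconditional jump
			}
		}
		pc++
	}
	return nil // running off the end is treated as a stop
}

func main() {
	ok := []Instr{
		{Kind: Next, Pushes: 1},           // 0: push condition
		{Kind: JumpI, Pops: 1, Target: 4}, // 1: branch to 4 at depth 0
		{Kind: Next, Pushes: 1},           // 2: push
		{Kind: Next, Pops: 1},             // 3: pop, back to depth 0
		{Kind: JumpDest},                  // 4: both paths arrive at depth 0
		{Kind: Stop},                      // 5
	}
	bad := []Instr{
		{Kind: Next, Pushes: 1},           // 0
		{Kind: JumpI, Pops: 1, Target: 3}, // 1: branch arrives at depth 0
		{Kind: Next, Pushes: 1},           // 2: fall-through arrives at depth 1
		{Kind: JumpDest},                  // 3
		{Kind: Stop},                      // 4
	}
	fmt.Println(validate(ok))  // <nil>
	fmt.Println(validate(bad)) // pc 3 reached with depths 0 and 1
}
```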
In this example, the `JUMPSUB` is on the last byte of code. When the subroutine returns, it should hit the 'virtual stop' _after_ the bytecode and not exit with an error.
A.M. Turing, Proposals for the Development in the Mathematics Division of an Automatic Computing Engine (ACE), Report E882, Executive Committee, NPL, 1946
Gavin Wood, [Ethereum: A Secure Decentralized Generalized Transaction Ledger](https://ethereum.github.io/yellowpaper/paper.pdf), 2014-2021
Greg Colvin, Brooklyn Zelenka, Paweł Bylica, Christian Reitwiessner, [EIP-615: Subroutines and Static Jumps for the EVM](https://eips.ethereum.org/EIPS/eip-615), 2016-2019