r4bbit 9d5eea2b41 test(twap): cover RecordTick end-to-end and add zkVM cycle benchmark
Add the first end-to-end coverage of the oracle's RecordTick path, which
previously existed only as native unit tests:

- amm_twap_observations_accumulate_across_swaps_and_yield_time_weighted_average:
  drives swaps + RecordTick across simulated time, then checks the cumulative
  accumulator and the consulted time-weighted average.
- amm_twap_record_tick_sampling_guard_skips_calls_below_min_interval: exercises
  the min-interval sampling guard through the real instruction path.

Running RecordTick through the zkVM surfaced that committing the oracle-owned
~100 KiB observations account costs ~50.9M cycles — over the 2^25 (~33.5M)
public-execution limit — so the instruction aborted on chain. Reduce
OBSERVATIONS_CAPACITY 6396 -> 2048 (~16.8M cycles, ~half the limit); window
coverage is unchanged, only sampling resolution.

Add programs/benchmark, a standalone crate (excluded from the workspace so CI
and the Makefile skip it) that runs the guest ELF through the RISC Zero
executor and reports the per-instruction cycle split, reproducing the on-chain
pass/fail at the limit. Its cost-vs-capacity sweep still spans to 6396, guarding
against bumping capacity back into the over-budget range.
2026-06-23 16:30:18 +02:00


# TWAP Oracle — `RecordTick` and the zkVM Cycle Budget
> **Status: fixed.** `OBSERVATIONS_CAPACITY` was reduced from 6396 to **2048**, bringing `RecordTick`
> to ~16.8M cycles (~half the limit). The rest of this document is the diagnosis that led there; the
> numbers labelled "cap 6396" are the pre-fix measurements that motivated the change.

At capacity 6396, `RecordTick` could not run on chain: a single call over a full-size observations
buffer cost **~50.9M zkVM cycles**, over the **~33.5M (2²⁵)** public-execution limit, and the runtime
aborted it. This document explains what a cycle is, why the limit exists, what the measurements
actually show (the cause is **not** what it first looks like), and the fix.
It is backed by a runnable benchmark — `programs/benchmark/tests/twap_cycle_bench.rs` — and
by two end-to-end tests in `tests/amm.rs`
(`amm_twap_observations_accumulate_*`, `amm_twap_record_tick_sampling_guard_*`).

---
## 1. What a "cycle" is
LEZ programs run inside the RISC Zero zkVM — a **proven RISC-V (rv32im) virtual machine**. A *cycle*
is one step of that virtual CPU: essentially one executed RISC-V instruction (most cost 1 cycle, a
few cost more, plus some fixed overhead). "50M cycles" ≈ "~50 million instruction-steps," **not** a
unit of wall-clock time.
The prover turns execution into an arithmetic trace with **one row per cycle**, and proving cost —
time and memory — scales roughly **linearly with cycle count**. So the cap is fundamentally an
**economic/latency bound on how large a computation the network will prove for one public call.**
The executor reports cycles in a few buckets (visible in the benchmark output):
- **user cycles** — the RISC-V instructions the guest logic actually runs.
- **paging cycles** — the zkVM's memory is a Merkle-committed image of ~1 KiB pages; touching a
page hashes it in/out. Large buffers touch many pages. (In practice paging is ~34% of the cycles here.)
- **reserved cycles** — padding up to the proof system's power-of-two boundaries.
## 2. The limit
`nssa` sets the executor's `session_limit` to
`MAX_NUM_CYCLES_PUBLIC_EXECUTION = 1024 * 1024 * 32 = 33_554_432 = 2²⁵` cycles
(`nssa/src/program.rs`). Every program invocation — public or chained — runs under this cap via the
single `Program::execute` path; when a run reaches it, the executor aborts with
`Session limit exceeded: 33554432 >= 33554432`, which `nssa` surfaces as `ProgramExecutionFailed`.
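
The comparison behind that error message can be sketched as follows. This is a minimal illustration, not `nssa` code: `fits_public_execution` is a hypothetical helper, and the strict `<` is inferred from the `>=` in the quoted error text.

```rust
/// Value quoted above from `nssa/src/program.rs`.
const MAX_NUM_CYCLES_PUBLIC_EXECUTION: u64 = 1024 * 1024 * 32;

/// Hypothetical helper, not an `nssa` API: a run fits only if it stays
/// strictly below the limit, since the quoted error fires on `>=`.
fn fits_public_execution(total_cycles: u64) -> bool {
    total_cycles < MAX_NUM_CYCLES_PUBLIC_EXECUTION
}

fn main() {
    println!("limit = {} cycles", MAX_NUM_CYCLES_PUBLIC_EXECUTION);
    println!(
        "one below the limit fits: {}",
        fits_public_execution(MAX_NUM_CYCLES_PUBLIC_EXECUTION - 1)
    );
    println!(
        "exactly at the limit fits: {}",
        fits_public_execution(MAX_NUM_CYCLES_PUBLIC_EXECUTION)
    );
}
```

A run that lands exactly on the limit is treated as over budget under this reading, which is why headroom matters more than merely "fitting".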
## 3. What the measurements show
All numbers below are total cycles from the RISC Zero executor running the **real** `twap_oracle`
guest ELF, reproducing `Program::execute`'s exact input encoding (see the benchmark). They were
cross-checked against accounts extracted from a live `V03State`, which match to the cycle.
```
total cycles vs 2²⁵ limit
CreatePriceObservations (cap 6396) 17_301_504 ok
RecordTick, owned account (cap 6396) 50_855_936 OVER ← aborts on chain
RecordTick, UNOWNED account (cap 6396) 17_825_792 ok
```
The first surprise: **`RecordTick`'s Borsh work is not the problem.** With the observations account
left *uninitialized* (`program_owner = 0`) the same instruction over the same 102,388-byte buffer
costs only **17.8M** — comfortably under budget. The full deserialize + mutate + reserialize round
trip is the cheap part.
The actual driver is **account ownership**. The benchmark holds everything else fixed and flips only
the observations account's `program_owner`:
```
obs_owner = 0 → 17_825_792
obs_owner = oracle → 50_855_936 (+33.0M, 2.9×)
```
(The `current_tick` account's owner makes no difference — it's only 12 bytes.)
### Why ownership costs ~33M cycles
When a program touches an **initialized (owned)** account, the runtime cryptographically binds that
account's state into the proof — on both sides: the **pre-state it read** and the **post-state it
wrote**. That work is proportional to the account's serialized size. For a max-size ~100 KiB account
it is **~16 to 17M cycles per side**.

That single fact explains the whole picture:

| instruction | owned ~100 KiB account is… | owned commits | total |
|---|---|---|---|
| `CreatePriceObservations` | **written** only (input is uninitialized) | 1× (write) | 17.3M ✅ |
| `RecordTick` | **read and written** | 2× (read + write) | 50.9M ❌ |

`CreatePriceObservations` pays the commit once and fits; `RecordTick` pays it twice and blows the
budget. Everything else (Borsh, paging, the tick arithmetic) is secondary.
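
The arithmetic behind the table can be sketched as a small cost model. It assumes the read side and the write side cost the same (the measurements only give their sum), and the function names are illustrative rather than taken from the benchmark.

```rust
/// Measured totals at capacity 6396, copied from the measurements above.
const RECORD_TICK_OWNED: u64 = 50_855_936;
const RECORD_TICK_UNOWNED: u64 = 17_825_792;
const CREATE_PRICE_OBSERVATIONS: u64 = 17_301_504;

/// Commit cost per side, ASSUMING read and write sides cost the same.
fn per_side_commit_cycles() -> u64 {
    (RECORD_TICK_OWNED - RECORD_TICK_UNOWNED) / 2
}

/// Illustrative model: total = non-commit floor + commits * per-side cost.
fn modelled_total(floor_cycles: u64, owned_commits: u64) -> u64 {
    floor_cycles + owned_commits * per_side_commit_cycles()
}

fn main() {
    let side = per_side_commit_cycles();
    println!("per-side commit ~ {side} cycles");
    println!(
        "RecordTick with 2 commits ~ {} cycles",
        modelled_total(RECORD_TICK_UNOWNED, 2)
    );
    // Subtracting one commit from the Create total gives its implied floor.
    println!(
        "CreatePriceObservations implied floor ~ {} cycles",
        CREATE_PRICE_OBSERVATIONS - side
    );
}
```

Under this model the per-side figure lands inside the ~16 to 17M range quoted above, and two commits on top of the unowned floor reproduce the owned `RecordTick` total.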
### Cost scales linearly with capacity
`OBSERVATIONS_CAPACITY = 6396` exists to fill the 100 KiB account ceiling. Because both the commit
cost and the Borsh floor scale with account size, `RecordTick`'s cost scales ~linearly with
capacity:
```
RecordTick (owned) total cycles vs 2²⁵
cap 512 4_259_840 ok
cap 1024 8_388_608 ok
cap 2048 16_777_216 ok
cap 4096 32_538_624 ok (only ~3% headroom)
cap 6396 50_855_936 OVER
```
The largest power-of-two capacity that fits is **4096**, but only barely. **2048 (≈16.8M, ~2×
headroom)** or smaller is a safe target.
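
The headroom figures can be recomputed from the sweep. The sketch below hard-codes the measured totals from the table; `headroom_percent` and `largest_fitting_capacity` are illustrative names, not benchmark functions.

```rust
/// Public-execution limit, 2^25 cycles.
const LIMIT: u64 = 1 << 25;

/// (capacity, total cycles) for owned `RecordTick`, copied from the sweep above.
const SWEEP: [(u32, u64); 5] = [
    (512, 4_259_840),
    (1024, 8_388_608),
    (2048, 16_777_216),
    (4096, 32_538_624),
    (6396, 50_855_936),
];

/// Share of the budget left unused, in percent; negative means over budget.
fn headroom_percent(total_cycles: u64) -> f64 {
    (LIMIT as f64 - total_cycles as f64) / LIMIT as f64 * 100.0
}

/// Largest swept capacity whose total stays strictly below the limit.
fn largest_fitting_capacity() -> Option<u32> {
    SWEEP
        .iter()
        .filter(|&&(_, total)| total < LIMIT)
        .map(|&(capacity, _)| capacity)
        .max()
}

fn main() {
    for &(capacity, total) in SWEEP.iter() {
        println!("cap {capacity:>4}: {:>6.1}% headroom", headroom_percent(total));
    }
    println!("largest fitting capacity: {:?}", largest_fitting_capacity());
}
```

Only the swept capacities are considered, so this says nothing about values between 4096 and 6396.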
### Empty buffers cost exactly the same
Fill level is irrelevant. An effectively-empty buffer (just created, `write_index = 1`, one used
entry) and an all-non-zero buffer of the same capacity cost **identical** cycles:
```
empty buffer total = 50_855_936
filled buffer total = 50_855_936 (cap 6396, both 102,388 bytes)
```
The account is allocated at full size the moment `CreatePriceObservations` runs (it writes all
`OBSERVATIONS_CAPACITY` entries up front), so a brand-new, "empty" feed already pays the full cost.
The cost tracks **allocated size**, not how much meaningful data is stored. Reducing capacity is
therefore the only lever — you can't dodge it by keeping the buffer sparse.
### The overhead is per-byte, not a flat per-account tax
Measuring owned vs. uninitialized across account sizes, the owned-account overhead is **linear in
size at ~320 cycles per byte** (for the read + write the account gets in `RecordTick`; ~160/byte per
side):
```
cap bytes | owned unowned delta cyc/byte
32 564 | 524_288 262_144 262_144 465*
256 4_148 | 2_359_296 1_048_576 1_310_720 316
1024 16_436 | 8_388_608 3_145_728 5_242_880 319
2048 32_820 | 16_777_216 5_767_168 11_010_048 335
4096 65_588 | 32_538_624 11_534_336 21_004_288 320
6396 102_388 | 50_855_936 17_825_792 33_030_144 323
(* tiny accounts round up to the executor's 2^18-cycle segment quantum)
```
So you do **not** lose a fixed slice of the 2²⁵ budget for every owned account — you lose
~320 cycles per byte of owned account you **read-modify-write** (~160/byte if you only read, or only
write). For ordinary accounts this is noise: a ~50-byte token holding costs ~16 K cycles
(<0.05% of budget). It only becomes significant in the tens-of-KB range.
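
The cyc/byte column can be re-derived from the other columns. In the sketch below, the size formula (a 52-byte header plus 16 bytes per entry) is inferred from the table's bytes column and is an assumption, not something taken from the program source.

```rust
/// (capacity, bytes, owned cycles, unowned cycles), copied from the table above.
const ROWS: [(u64, u64, u64, u64); 6] = [
    (32, 564, 524_288, 262_144),
    (256, 4_148, 2_359_296, 1_048_576),
    (1024, 16_436, 8_388_608, 3_145_728),
    (2048, 32_820, 16_777_216, 5_767_168),
    (4096, 65_588, 32_538_624, 11_534_336),
    (6396, 102_388, 50_855_936, 17_825_792),
];

/// ASSUMPTION inferred from the bytes column: 52-byte header + 16 bytes/entry.
fn inferred_account_bytes(capacity: u64) -> u64 {
    52 + 16 * capacity
}

/// Owned-account overhead per byte for one read-modify-write.
fn overhead_cycles_per_byte(owned: u64, unowned: u64, bytes: u64) -> f64 {
    (owned - unowned) as f64 / bytes as f64
}

/// Overheads for every row, rounded to whole cycles per byte.
fn rounded_overheads() -> Vec<u64> {
    ROWS.iter()
        .map(|&(_, bytes, owned, unowned)| {
            overhead_cycles_per_byte(owned, unowned, bytes).round() as u64
        })
        .collect()
}

fn main() {
    for &(capacity, bytes, owned, unowned) in ROWS.iter() {
        // The inferred size formula matches every row of the table.
        assert_eq!(inferred_account_bytes(capacity), bytes);
        println!(
            "cap {capacity:>4}: {:.0} cycles/byte",
            overhead_cycles_per_byte(owned, unowned, bytes)
        );
    }
    println!("rounded: {:?}", rounded_overheads());
}
```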
### This is a general size/cycle tension, not a TWAP quirk
The practical consequence: with ~320 cyc/byte for a read-modify-write plus the Borsh/IO floor, the
**largest owned account a single instruction can read-modify-write within budget is ~65 KB**, and
at that size there is essentially no budget left for real work. Any program attempting to
read-modify-write a near-max-size (100 KiB) owned account hits the same wall TWAP did, needing
~50 M cycles against a ~33.5 M cap.
In other words, `DATA_MAX_LENGTH = 100 KiB` and `MAX_NUM_CYCLES_PUBLIC_EXECUTION = 2²⁵` are **not
jointly satisfiable for full-size read-modify-write**. The runtime permits 100 KiB accounts, but the
cycle budget can't commit one on both the read and write side. The implicit design rule is: large
accounts must be **read-only, write-only, or paged** per instruction, never fully rewritten in
place. That's a normal ZK-rollup constraint, but it's currently unstated.
This is worth raising with the LEZ runtime team as a protocol-parameter question. Three levers, none
free:
- **Lower `DATA_MAX_LENGTH`** to a size that is committable *and* leaves room to compute (e.g. so a
full read-modify-write fits well under budget). Safest guarantee, but caps every program's account
size, and read-only consumers of large accounts don't need it lowered.
- **Raise `MAX_NUM_CYCLES_PUBLIC_EXECUTION`** so a max-size account is affordable. Directly inflates
proving time/cost for *every* program, including ones that never touch big accounts.
- **Leave both, document the rule** that large accounts are not full-rewrite-able in one call, and
provide a paging pattern. Lowest blast radius; pushes complexity to programs that need big state.

The TWAP fix below (a smaller buffer) sidesteps the tension for this program regardless of which
lever the protocol ultimately picks.
## 4. The fix
The budget-breaking cost is **committing the owned ~100 KiB account**, which is intrinsic to the
account's *size*. The fix must shrink what gets committed.
### ✅ Reduce `OBSERVATIONS_CAPACITY` — the simple, effective fix
Cutting capacity reduces the committed account size, and cost falls ~linearly. Critically, **window
coverage is unaffected**: the sampling guard derives `min_interval = window_duration / capacity`, so
coverage = `capacity × min_interval = window_duration` regardless of capacity; only *resolution*
(samples per window) drops. At capacity 2048 a 24 h window still samples every ~42 s; a 7 d window
every ~5 min. That is ample for a TWAP.
**Applied:** `OBSERVATIONS_CAPACITY = 2048` (≈16.8M cycles, ~2× headroom). 4096 fits but leaves no
margin for runtime variation; 2048 keeps `RecordTick` at roughly half the limit.
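
The sampling intervals quoted above follow directly from the guard's formula. The sketch below assumes whole-second timestamps and integer division, neither of which is confirmed by this document; the function names are illustrative.

```rust
/// Sampling interval in whole seconds. Integer division is an ASSUMPTION here.
fn min_interval_secs(window_duration_secs: u64, capacity: u64) -> u64 {
    window_duration_secs / capacity
}

/// Time span covered by a full buffer sampled once per `min_interval_secs`.
fn coverage_secs(window_duration_secs: u64, capacity: u64) -> u64 {
    capacity * min_interval_secs(window_duration_secs, capacity)
}

fn main() {
    const DAY: u64 = 24 * 60 * 60;
    for capacity in [6396u64, 2048] {
        for window in [DAY, 7 * DAY] {
            println!(
                "cap {capacity}, window {window} s: sample every {} s, covers {} s",
                min_interval_secs(window, capacity),
                coverage_secs(window, capacity)
            );
        }
    }
}
```

With integer division the covered span falls slightly short of the window (by less than `capacity` seconds), so coverage equals the window only up to rounding.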
### ✅ Linked observation pages — if full resolution is required
Already sketched in `twap-oracle-observation-capacity.md`: keep a small fixed-size "head" account
plus older page accounts. `RecordTick` then commits only the small head, so per-call cost is bounded
regardless of total history. More moving parts (page PDAs, chain-walking readers); reserve it for
when a single reduced-capacity account genuinely can't hold enough resolution.
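
As a rough illustration of the head-plus-pages idea, the sketch below models a small head that seals its contents into a page when it fills. Every type, field, and function name here is hypothetical; the actual design in `twap-oracle-observation-capacity.md` may differ, and real pages would be separate accounts rather than in-memory values.

```rust
/// One sample. Field names and widths are illustrative assumptions.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Observation {
    timestamp: u64,
    tick_cumulative: i64,
}

/// Older samples, written once when the head fills and never modified again.
#[derive(Debug)]
struct Page {
    index: u32,
    entries: Vec<Observation>,
}

/// Small fixed-capacity head: the only state a record call would rewrite.
struct Head {
    capacity: usize,
    entries: Vec<Observation>,
    sealed_pages: u32,
}

impl Head {
    fn new(capacity: usize) -> Self {
        Head { capacity, entries: Vec::with_capacity(capacity), sealed_pages: 0 }
    }

    /// Appends a sample. If the head was already full, its contents are
    /// moved into a new page, which is returned to the caller.
    fn record(&mut self, obs: Observation) -> Option<Page> {
        let sealed = if self.entries.len() == self.capacity {
            let page = Page {
                index: self.sealed_pages,
                entries: std::mem::take(&mut self.entries),
            };
            self.sealed_pages += 1;
            Some(page)
        } else {
            None
        };
        self.entries.push(obs);
        sealed
    }
}

/// Records `samples` observations; returns (pages sealed, entries left in head).
fn simulate(capacity: usize, samples: u64) -> (usize, usize) {
    let mut head = Head::new(capacity);
    let mut pages = Vec::new();
    for t in 0..samples {
        let obs = Observation { timestamp: t, tick_cumulative: t as i64 };
        if let Some(page) = head.record(obs) {
            pages.push(page);
        }
    }
    (pages.len(), head.entries.len())
}

fn main() {
    let (pages, in_head) = simulate(4, 10);
    println!("sealed pages: {pages}, entries in head: {in_head}");
}
```

The point of the shape is that the rewritten state stays bounded by the head's capacity no matter how many pages accumulate; readers that need older history would walk the pages.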
### ❌ Byte-patching `RecordTick` to skip the Borsh round-trip — does NOT fix this
An earlier hypothesis was that the cost was the full-buffer Borsh deserialize/reserialize, and that
patching the serialized bytes in place would make it O(1). **The measurements refute this.** The
Borsh round trip is only part of the 17.8M *unowned* floor; the ~33M that breaks the budget is the
owned-account commitment, which the runtime performs regardless of how the guest computes the new
bytes. Byte-patching would shave a little off the floor and leave `RecordTick` at ~33M+, still at
or over the edge. Avoid this as the primary fix; it addresses the wrong cost.
### ❌ Raising `MAX_NUM_CYCLES_PUBLIC_EXECUTION` — not a real fix
It's a platform-wide `nssa` constant; raising it inflates proving cost/time for every program and
only defers the wall, since the cost still scales with account size.
## 5. Reproducing
```sh
# Faithful cycle benchmark (synthetic inputs; reproduces the on-chain pass/fail at 2²⁵).
# `programs/benchmark` is a standalone crate, excluded from the workspace, so run it by manifest:
cargo test --manifest-path programs/benchmark/Cargo.toml -- --ignored --nocapture
# The end-to-end TWAP tests through the real zkVM path:
RISC0_DEV_MODE=1 cargo test -p integration_tests --test amm twap
```
The benchmark uses `risc0-zkvm` with the `prove` feature to run the guest with the session limit
lifted and read the `user/paging/reserved/total` cycle split. It only *executes* the guest; it
never proves. It lives in the workspace-excluded `programs/benchmark` crate so normal
builds/tests/CI never compile it.
## 6. Acceptance — met
With `OBSERVATIONS_CAPACITY = 2048`:
- `twap_cycle_bench` reports `RecordTick (owned, cap 2048)` at ~16.8M, under the 2²⁵ limit (and its
sweep still shows cap 6396 aborting, guarding against bumping capacity back up).
- The two `tests/amm.rs` TWAP tests pass through the real zkVM path (no longer `#[ignore]`d).