mirror of
https://github.com/logos-blockchain/lssa.git
synced 2026-05-22 09:39:44 +00:00
99 lines
4.5 KiB
Markdown
99 lines
4.5 KiB
Markdown
# cycle_bench
|
||
|
||
Per-program Risc0 cycle counts, prover wall time, PPE composition cost, and verifier wall time for the built-in LEZ programs. Inputs for the fee model's `G_executor`, `G_prove`, `G_verify`, and `S_agg` parameters.
|
||
|
||
## Machine
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Chip | Apple M2 Pro (8P+4E) |
|
||
| RAM | 16 GB |
|
||
| OS | macOS 15.5 |
|
||
| Rust | 1.94.0 |
|
||
| Risc0 zkVM | 3.0.5 |
|
||
| Profile | release |
|
||
| GPU acceleration | none |
|
||
|
||
## Executor cycles
|
||
|
||
`SessionInfo::cycles()` per instruction. Deterministic across runs. Wall time is `best / mean ± stdev` over 5 timed iterations (1 warmup discarded).
|
||
|
||
| Program | Instruction | user_cycles | segments | exec_ms (best / mean ± stdev) |
|
||
|---|---|---:|---:|---|
|
||
| authenticated_transfer | Initialize | 43,642 | 1 | 18.86 / 19.41 ± 0.48 |
|
||
| authenticated_transfer | Transfer | 77,095 | 1 | 19.67 / 20.84 ± 1.16 |
|
||
| token | Burn | 116,546 | 1 | 24.86 / 25.46 ± 0.63 |
|
||
| token | Mint | 116,862 | 1 | 24.47 / 25.08 ± 0.42 |
|
||
| token | Transfer | 127,726 | 1 | 25.00 / 25.40 ± 0.29 |
|
||
| clock | Tick (no rollups) | 137,022 | 1 | 21.18 / 21.57 ± 0.41 |
|
||
| ata | Create | 175,056 | 1 | 23.64 / 24.94 ± 1.09 |
|
||
| amm | SwapExactInput | 508,634 | 1 | 34.21 / 34.77 ± 0.55 |
|
||
| amm | AddLiquidity | 642,774 | 1 | 37.59 / 37.87 ± 0.28 |
|
||
|
||
## Real proving (`--prove`)
|
||
|
||
`prover.prove(env, elf)` wall time per program on CPU. `total_cycles` is `user_cycles` rounded up to the next power of two (Risc0 padding).
|
||
|
||
| Program | Instruction | total_cycles | prove_ms | prove_s |
|
||
|---|---|---:|---:|---:|
|
||
| authenticated_transfer | Initialize | 131,072 | 11,881 | 11.9 |
|
||
| authenticated_transfer | Transfer | 131,072 | 13,705 | 13.7 |
|
||
| token | Burn | 262,144 | 22,893 | 22.9 |
|
||
| token | Mint | 262,144 | 23,927 | 23.9 |
|
||
| token | Transfer | 262,144 | 27,178 | 27.2 |
|
||
| clock | Tick | 262,144 | 23,486 | 23.5 |
|
||
| ata | Create | 262,144 | 21,093 | 21.1 |
|
||
| amm | AddLiquidity | 1,048,576 | 111,654 | 111.7 |
|
||
| amm | SwapExactInput | 1,048,576 | 126,400 | 126.4 |
|
||
|
||
Linear fit across po2 buckets: ≈ 100 µs per total cycle (≈ 10k cycles/s throughput on this CPU).
|
||
|
||
## PPE composition + chain-call sweep (`--ppe`)
|
||
|
||
Same `auth_transfer Transfer` instruction, standalone vs wrapped in the privacy circuit; plus the `chain_caller` test program with N chained `authenticated_transfer` calls. `proof_bytes` is the borsh-serialized. InnerReceipt (S_agg in the fee model).
|
||
|
||
| Case | prove_ms | prove_s | proof_bytes |
|
||
|---|---:|---:|---:|
|
||
| auth_transfer Transfer standalone | 13,705 | 13.7 | n/a |
|
||
| auth_transfer Transfer in PPE | 61,486 | 61.5 | 223,551 |
|
||
| chain_caller depth=1 | 122,590 | 122.6 | 223,551 |
|
||
| chain_caller depth=3 | 231,974 | 232.0 | 223,551 |
|
||
| chain_caller depth=5 | 372,123 | 372.1 | 223,551 |
|
||
| chain_caller depth=9 | 544,280 | 544.3 | 223,551 |
|
||
|
||
Linear fit depth=1..9: ≈ 53 s per additional chained call, intercept ≈ 73 s. Composition tax (single program PPE − standalone): ≈ 48 s. `proof_bytes` is constant: the outer succinct proof has fixed size; the journal carried alongside it scales with public state and is reported separately by `--verify`.
|
||
|
||
## Verifier (`--verify`)
|
||
|
||
One PPE receipt generated once (auth_transfer Transfer in PPE), then `Receipt::verify(PRIVACY_PRESERVING_CIRCUIT_ID)` measured over 1000 iterations.
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| case | auth_transfer Transfer in PPE |
|
||
| proof_bytes (S_agg) | 223,551 |
|
||
| journal_bytes | 412 |
|
||
| verify_ms (best / mean ± stdev, n=1000) | 11.71 / 12.06 ± 1.99 |
|
||
|
||
## Findings
|
||
|
||
- Proving cost scales with po2-bucketed `total_cycles`, not raw `user_cycles`. Trimming user_cycles only helps if it crosses a 2^N boundary.
|
||
- Single-program PPE composition tax on M2 Pro CPU: ≈ 48 s (61.5 − 13.7).
|
||
- Chained-call cost is linear at ≈ 53 s per call. A max-depth chain (10) would take ≈ 600 s standalone on this CPU.
|
||
- `G_verify` is ≈ 12 ms and roughly constant per outer receipt (1000-iter stdev ≈ 2 ms). The succinct outer proof is fixed at 223,551 bytes (S_agg); verify is not on the latency critical path.
|
||
|
||
## Reproduce
|
||
|
||
```sh
|
||
cargo run --release -p cycle_bench
|
||
cargo run --release -p cycle_bench --features prove -- --prove
|
||
cargo run --release -p cycle_bench --features ppe -- --prove --ppe
|
||
cargo run --release -p cycle_bench --features ppe -- --verify --verify-iters 1000
|
||
```
|
||
|
||
JSON output: `target/cycle_bench.json`.
|
||
|
||
## Caveats
|
||
|
||
- CPU-only proving on a dev laptop. Production prover hardware (GPU, specialised CPU pipelines) will produce much smaller numbers; relative ordering should be preserved.
|
||
- Single-segment cases only; multi-segment programs would pay continuation overhead not measured here.
|