diff --git a/docs/benchmarks/cycle_bench.md b/docs/benchmarks/cycle_bench.md
index 62db9b1d..fca9f12c 100644
--- a/docs/benchmarks/cycle_bench.md
+++ b/docs/benchmarks/cycle_bench.md
@@ -1,8 +1,6 @@
 # cycle_bench
 
-Per-program Risc0 cycle counts, prover wall time, PPE composition cost,
-and verifier wall time for the built-in LEZ programs. Inputs for the
-fee model's `G_executor`, `G_prove`, `G_verify`, and `S_agg` parameters.
+Per-program Risc0 cycle counts, prover wall time, PPE composition cost, and verifier wall time for the built-in LEZ programs. Inputs for the fee model's `G_executor`, `G_prove`, `G_verify`, and `S_agg` parameters.
 
 ## Machine
 
@@ -18,8 +16,7 @@ fee model's `G_executor`, `G_prove`, `G_verify`, and `S_agg` parameters.
 
 ## Executor cycles
 
-`SessionInfo::cycles()` per instruction. Deterministic across runs. Wall time
-is `best / mean ± stdev` over 5 timed iterations (1 warmup discarded).
+`SessionInfo::cycles()` per instruction. Deterministic across runs. Wall time is `best / mean ± stdev` over 5 timed iterations (1 warmup discarded).
 
 | Program | Instruction | user_cycles | segments | exec_ms (best / mean ± stdev) |
 |---|---|---:|---:|---|
@@ -35,8 +32,7 @@ is `best / mean ± stdev` over 5 timed iterations (1 warmup discarded).
 
 ## Real proving (`--prove`)
 
-`prover.prove(env, elf)` wall time per program on CPU. `total_cycles` is
-`user_cycles` rounded up to the next power of two (Risc0 padding).
+`prover.prove(env, elf)` wall time per program on CPU. `total_cycles` is `user_cycles` rounded up to the next power of two (Risc0 padding).
 
 | Program | Instruction | total_cycles | prove_ms | prove_s |
 |---|---|---:|---:|---:|
@@ -50,15 +46,11 @@ is `best / mean ± stdev` over 5 timed iterations (1 warmup discarded).
 | amm | AddLiquidity | 1,048,576 | 111,654 | 111.7 |
 | amm | SwapExactInput | 1,048,576 | 126,400 | 126.4 |
 
-Linear fit across po2 buckets: ≈ 100 µs per total cycle (≈ 10k cycles/s
-throughput on this CPU).
+Linear fit across po2 buckets: ≈ 100 µs per total cycle (≈ 10k cycles/s throughput on this CPU).
 
 ## PPE composition + chain-call sweep (`--ppe`)
 
-Same `auth_transfer Transfer` instruction, standalone vs wrapped in the
-privacy circuit; plus the `chain_caller` test program with N chained
-`authenticated_transfer` calls. `proof_bytes` is the borsh-serialized
-InnerReceipt (S_agg in the fee model).
+Same `auth_transfer Transfer` instruction, standalone vs wrapped in the privacy circuit; plus the `chain_caller` test program with N chained `authenticated_transfer` calls. `proof_bytes` is the borsh-serialized InnerReceipt (S_agg in the fee model).
 
 | Case | prove_ms | prove_s | proof_bytes |
 |---|---:|---:|---:|
@@ -69,15 +61,11 @@ InnerReceipt (S_agg in the fee model).
 | chain_caller depth=5 | 372,123 | 372.1 | 223,551 |
 | chain_caller depth=9 | 544,280 | 544.3 | 223,551 |
 
-Linear fit depth=1..9: ≈ 53 s per additional chained call, intercept ≈ 73 s.
-Composition tax (single program PPE − standalone): ≈ 48 s. `proof_bytes` is
-constant: the outer succinct proof has fixed size; the journal carried
-alongside it scales with public state and is reported separately by `--verify`.
+Linear fit depth=1..9: ≈ 53 s per additional chained call, intercept ≈ 73 s. Composition tax (single program PPE − standalone): ≈ 48 s. `proof_bytes` is constant: the outer succinct proof has fixed size; the journal carried alongside it scales with public state and is reported separately by `--verify`.
 
 ## Verifier (`--verify`)
 
-One PPE receipt generated once (auth_transfer Transfer in PPE), then
-`Receipt::verify(PRIVACY_PRESERVING_CIRCUIT_ID)` measured over 1000 iterations.
+One PPE receipt generated once (auth_transfer Transfer in PPE), then `Receipt::verify(PRIVACY_PRESERVING_CIRCUIT_ID)` measured over 1000 iterations.
 
 | Field | Value |
 |---|---|
@@ -88,14 +76,10 @@ One PPE receipt generated once (auth_transfer Transfer in PPE), then
 
 ## Findings
 
-- Proving cost scales with po2-bucketed `total_cycles`, not raw `user_cycles`.
-  Trimming user_cycles only helps if it crosses a 2^N boundary.
+- Proving cost scales with po2-bucketed `total_cycles`, not raw `user_cycles`. Trimming user_cycles only helps if it crosses a 2^N boundary.
 - Single-program PPE composition tax on M2 Pro CPU: ≈ 48 s (61.5 − 13.7).
-- Chained-call cost is linear at ≈ 53 s per call. A max-depth chain (10) would
-  take ≈ 600 s standalone on this CPU.
-- `G_verify` is ≈ 12 ms and roughly constant per outer receipt (1000-iter
-  stdev ≈ 2 ms). The succinct outer proof is fixed at 223,551 bytes (S_agg);
-  verify is not on the latency critical path.
+- Chained-call cost is linear at ≈ 53 s per call. A max-depth chain (10) would take ≈ 600 s standalone on this CPU.
+- `G_verify` is ≈ 12 ms and roughly constant per outer receipt (1000-iter stdev ≈ 2 ms). The succinct outer proof is fixed at 223,551 bytes (S_agg); verify is not on the latency critical path.
 
 ## Reproduce
 
@@ -110,8 +94,5 @@ JSON output: `target/cycle_bench.json`.
 
 ## Caveats
 
-- CPU-only proving on a dev laptop. Production prover hardware (GPU,
-  specialised CPU pipelines) will produce much smaller numbers; relative
-  ordering should be preserved.
-- Single-segment cases only; multi-segment programs would pay continuation
-  overhead not measured here.
+- CPU-only proving on a dev laptop. Production prover hardware (GPU, specialised CPU pipelines) will produce much smaller numbers; relative ordering should be preserved.
+- Single-segment cases only; multi-segment programs would pay continuation overhead not measured here.
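
Reviewer note: the po2 padding rule and the two linear fits quoted in the doc can be sanity-checked with a short sketch. This is not part of `cycle_bench`; the function and constant names below are hypothetical, the coefficients are the rounded values from the text (≈ 100 µs per total cycle, ≈ 53 s per chained call, intercept ≈ 73 s), and the rounding ignores any minimum segment size the prover may enforce.

```python
# Illustrative only: rounded coefficients copied from the benchmark doc,
# measured on one CPU. All names here are hypothetical helpers.
US_PER_TOTAL_CYCLE = 100.0   # approx. microseconds of proving per padded cycle
CHAIN_INTERCEPT_S = 73.0     # approx. intercept of the chain_caller fit, seconds
CHAIN_SLOPE_S = 53.0         # approx. seconds per additional chained call


def next_po2(user_cycles: int) -> int:
    """Round a cycle count up to the next power of two (the padding rule)."""
    if user_cycles < 1:
        raise ValueError("user_cycles must be positive")
    return 1 << (user_cycles - 1).bit_length()


def estimate_prove_s(user_cycles: int) -> float:
    """Rough CPU proving time in seconds, driven by the padded cycle count."""
    return next_po2(user_cycles) * US_PER_TOTAL_CYCLE / 1e6


def estimate_chain_s(depth: int) -> float:
    """Rough proving time in seconds for `depth` chained calls."""
    return CHAIN_INTERCEPT_S + CHAIN_SLOPE_S * depth


if __name__ == "__main__":
    # Both counts land in the 2^20 bucket, so the estimate is identical:
    # trimming cycles only pays off when it crosses a 2^N boundary.
    print(estimate_prove_s(1_000_000), estimate_prove_s(600_000))
    # Max-depth chain (10), matching the "about 600 s" finding.
    print(estimate_chain_s(10))
```

The estimates are only as good as the fits: the measured depth=5 row (372.1 s) sits noticeably above the line, so treat the output as an order-of-magnitude guide for fee-model inputs rather than a prediction.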