From 5ddd33d37a8e6ebfd32bea0901b77096af60cb16 Mon Sep 17 00:00:00 2001 From: Balazs Komuves Date: Fri, 17 May 2024 18:39:13 +0200 Subject: [PATCH] add a high-level overview of all the steps and files occuring when doing proofs --- workflow/PROOFS.md | 164 +++++++++++++++++++++++++++++++++++++++++++++ workflow/README.md | 2 + 2 files changed, 166 insertions(+) create mode 100644 workflow/PROOFS.md diff --git a/workflow/PROOFS.md b/workflow/PROOFS.md new file mode 100644 index 0000000..4ac5b5f --- /dev/null +++ b/workflow/PROOFS.md @@ -0,0 +1,164 @@ + +The cogs of a ZK proof +====================== + +The goal of this document is to describe all the different moving parts +(circuit, witness, public input, etc) of a ZK proof through a simple example, +hopefully giving some intuition about how these cogs fit together. + + +The example: Fibonacci sequence +------------------------------- + +ZK proofs (or more precisely: "succint arguments") are all about convicing +another party (the "verifier"), about the truth of some statement, without +telling them _everything_. + +In our toy example, the statement will be the 1000th element of a Fibonacci +sequence. + +A Fibonacci sequence is generated deterministically from its first two elements, +let's call them `U` and `V`. These are numbers. The sequence is then defined as + + a[0] = U + a[1] = V + a[n] = a[n-1] + a[n-2] + +In the standard Fibonacci sequence, we have `U=0` and `V=1` (or sometimes `U=V=1`, +which results in the same sequence but shifted by one). But we can consider +this generalized setting with `U` and `V` being arbitrary numbers. + +The statement I want to convince you, is that for a given `U` (which I tell you, +say `U=53`), I know a value for `V` (which I don't tell you), such that `a[1000]` +is some given value (of course, I tell you this value too, otherwise there is +nothing to be convinced about). + +Pretty simple so far. + + +The statement +------------- + +A proof always start with a statement. Usually, the statement is descibed by +a computer program (in our case, the program is computing the Fibonacci sequence). +A program normally has some inputs and some ouputs. Some of the inputs are +public (I tell you what they are), some others can be secret (I don't tell you what +they are). And the statement is _usually_ that if you run this program with the given +inputs, then the program runs normally (it doesn't throw an error) and you get the +given output. + +Note: you can freely move things from outputs to inputs. In our case, the output +would be `a[1000]`, but equivalently, I can make another program where there is +a third input `W`, and the program just throws an error at the end if `a[1000] != W`. +Because of this, we often just say IO for input/output, and "public IO" for the +part of it I tell you, and "private IO" or "secret IO" for the part I don't tell +you. + +So far we have: + +- the statement (a computer program) +- the public IO +- and the private IO + +The computer program is often called a "circuit", because for technical reasons +it often needs to be described as an "arithmetic circuit", which is something +similar to digital electronics circuits (but in math). This gets a bit more +complicated with zkVM-s, where the circuit itself is something like a CPU, and your +statement is an actual program running on that (virtual) CPU. + + +The witness +----------- + +How essentially all proof system works is to translate the statement you want +the prove into some math equations, and then prove that I know a solution for +those equations. Because of this, the prover needs to know not only the input +and the output, but all the intermediate results too - essentially a very detailed +log of all individual operations happening in the computer program describing +the statement. This huge logfile is called "the witness", and in practice it's +usually just a large array of numbers (more precisely, finite field elements). + +Unfortunately, the word "witness" is a bit overloaded: Sometimes it refers only +to the private IO, and sometimes to the whole log. While these are kind of the same +from a holistic point of view, it still makes sense to distinguish them, and +I will call the whole execution log the witness. + +Creating a proof always start by running the program which descibes statement, +and creating this very detailed log, the witness (sometimes also called a +"transcript"). This step is usually called the "witness generation". + + +Trusted setup +------------- + +Some, but not all, proof systems requires something called a "trusted setup". +This is some structured piece of information produced by an external party, +which has some secret encoded in it. It's important that the secret is +deleted (hence the word "trusted") after generating this setup, because anybody +in possesion of the secret can cheat and create fake proofs. + +For example in the popular scheme called "KZG commitments", the secret is +a number (a field element) usually called `tau`, and the result of the +trusted setup is a sequence + + [ g , tau*g , tau^2*g , ... , tau^N*g ] + +where `g` is a fixed group element. Because of this, the procedure generating +this sequence is called a "powers-of-tau ceremony". Assuming the discrete +logarithm problem is hard in the group, it's not feasible to compute `tau` +itself given that sequence, but it's easy to compute `f(tau)*g` for a polynomial +`f` with degree at most `N`. + +In practice such a trusted setup can be accomplished using multiparty computation +(MPC), so that if there is at least _one_ honest participant, then the resulting +sequence is safe to use. These events are called "ceremonies" (in early days +people actually had to do this in person, and at the end they ceremonially destroyed +the computer which contained the random secret `tau`). + +There are two kinds of trusted setups: universal and circuit-specific. An universal +one can be used to prove many different statements, while a circuit-specific one +can be used to prove a single fixed statement, or circuit (but with different inputs). + +For example Groth16 needs a circuit-specific trusted setup, Plonk+KZG only needs +a universal trusted setup, while Plonk+FRI does not need any such setup at all. +However usually systems with trusted setups are more efficient (for example Groth16 has +the smallest proofs among all known proof system). In case of Groth16, the circuit-specific +setup is generated from an already existing universal setup and the circuit. + + +The files +--------- + +Using the popular [`circom`](https://docs.circom.io/) + [`snarkjs`](https://github.com/iden3/snarkjs) +ecosystem, all the above parts are in different files, which are produced by +different commands. + +These are: + +- `.circom` files contain the source code of your circuit (both the equations and + the actual program, interleaved together) +- `input.json` contains the circuit inputs (both public and private inputs) +- `.r1cs` file contains the circuit, that is, the statement you want to prove + (R1CS is short for Rank-1 Constraint System, a specific form of a "circuit"). + This is one output of the `circom` compiler, which reads the above source code and produces `.r1cs` files +- `.wtns` files contain the witness. This is generated by the so-called "witness generator", which is + another output of the `circom` compiler. Essentially the compiler separates the equations (goes into `.r1cs`) + and running the program (goes into the witness generator, which in this case can be either a WASM or a C++ program). + When you run the witness generator, it takes the `input.json` and produces the `.wtns` file +- `.ptau` files are containing universal trusted setups +- `.zkey` files are called a "prover key" (it contains everything the prover needs), and + in case of Groth16 it also corresponds to a circuit-specific trusted setup. +- the prover takes the `.zkey` and the `.wtns` files, and produces two outputs: a `proof.json` + containing the proof itself, and a `public.json` containing only the public input (so this + is copied from `input.json`, but the latter also contains the private inputs) +- from the `.zkey` file, a "verification key" can be extracted (this is again `.json` file), + which contains everything the verifier needs (in particular, it contains something like a + hash of the circuit. At least once this has to be checked against a source code, trusted setup + and resulting `.zkey`, so that you actually know which statement you verify! + However you don't want to do this agin and again, because the statement - the circuit - can be very big). + In practice the verifier key is often hardcoded in the on-chain verifier contract. +- then finally the verifier takes this verification key, the proof file, and the public input file + (all `.json` files here), and outputs either "valid" or "invalid" + +The command line workflow of doing all this is described in the file +[README.md](README.md). diff --git a/workflow/README.md b/workflow/README.md index 179e920..db56d06 100644 --- a/workflow/README.md +++ b/workflow/README.md @@ -14,6 +14,8 @@ To run the full workflow: NOTE: the examples below assume `bash`. In particular, it won't work with `zsh` (which is the dafault on newer macOS)! Because, you know, reasons... +To have an overview of what all the different steps and files are, see [PROOFS.md]. + ### Some measurements Approximate time to run this on an M2 (8+4 cores), with 10 samples: