Add specs for the CPU table (#1375)

* Add specs for the CPU table

* Add general columns

* Apply comments

* Apply comments
This commit is contained in:
Hamy Ratoanina 2023-11-22 18:27:07 -05:00 committed by GitHub
parent bec1073cf9
commit 2d5a84a138
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 75 additions and 1 deletions

View File

@ -1,4 +1,78 @@
\subsection{CPU}
\label{cpu}
TODO
The CPU is the central component of the zkEVM. Like any CPU, it reads instructions, executes them and modifies the state (registers and the memory)
accordingly. The constraining of some complex instructions (e.g. Keccak hashing) is delegated to other tables.
This section will only briefly present the CPU and its columns. Details about the CPU logic will be provided later.
\subsubsection{CPU flow}
An execution run can be decomposed into three distinct parts:
\begin{itemize}
\item \textbf{Bootstrapping:} The CPU starts by writing all the kernel code to memory and then hashes it. The hash is then compared to a public value shared with
the verifier to ensure that the kernel code is correct.
\item \textbf{CPU cycles:} The bulk of the execution. In each row, the CPU reads the current code at the program counter (PC) address, and executes it. The current code can be the kernel code,
or whichever code is being executed in the current context (transaction code or contract code). Executing an instruction consists in modifying the registers, possibly
performing some memory operations, and updating the PC.
\item \textbf{Padding:} At the end of the execution, we need to pad the length of the CPU trace to the next power of two. When the program counter reaches the special halting label
in the kernel, execution halts. Constraints ensure that every subsequent row is a padding row and that execution cannot resume.
\end{itemize}
In the CPU cycles phase, the CPU can switch between different contexts, which correspond to the different environments of the possible calls. Context 0 is the kernel itself, which
handles initialization (input processing, transaction parsing, transaction trie updating...) and termination (receipt creation, final trie checks...) before and after executing the transaction. Subsequent contexts are created when
executing user code (transaction or contract code). In a non-zero user context, syscalls may be executed, which are specific instructions written in the kernel. They don't change the context
but change the code context, which is where the instructions are read from.
\subsubsection{CPU columns}
\paragraph*{Registers:} \begin{itemize}
\item \texttt{is\_bootstrap\_kernel}: Boolean indicating whether this is a bootstrapping row or not. It must be 1 at the first row, then switch to 0 until the end.
\item \texttt{context}: Indicates which context we are in. 0 for the kernel, and a positive integer for every user context. Incremented by 1 at every call.
\item \texttt{code\_context}: Indicates in which context the code to execute resides. It's equal to \texttt{context} in user mode, but is always 0 in kernel mode.
\item \texttt{program\_counter}: The address of the instruction to be read and executed.
\item \texttt{stack\_len}: The current length of the stack.
\item \texttt{stack\_len\_bounds\_aux}: Helper column used to check that the stack doesn't overflow in user mode.
\item \texttt{is\_kernel\_mode}: Boolean indicating whether we are in kernel (i.e. privileged) mode. This means we are executing kernel code, and we have access to
privileged instructions.
\item \texttt{gas}: The current amount of gas used in the current context. It is eventually checked to be below the current gas limit. Must fit in 32 bits.
\item \texttt{is\_keccak\_sponge}: Boolean indicating whether we are executing a Keccak hash. This happens whenever a \texttt{KECCAK\_GENERAL} instruction is executed, or at the last
cycle of bootstrapping to hash the kernel code.
\item \texttt{clock}: Monotonic counter which starts at 0 and is incremented by 1 at each row. Used to enforce correct ordering of memory accesses.
\item \texttt{opcode\_bits}: 8 boolean columns, which are the bit decomposition of the opcode being read at the current PC.
\end{itemize}
\paragraph*{Operation flags:} Boolean flags. During CPU cycles phase, each row executes a single instruction, which sets one and only one operation flag. No flag is set during
bootstrapping and padding. The decoding constraints ensure that the flag set corresponds to the opcode being read.
There isn't a 1-to-1 correspondance between instructions and flags. For efficiency, the same flag can be set by different, unrelated instructions (e.g. \texttt{eq\_iszero}, which represents
the \texttt{EQ} and the \texttt{ISZERO} instructions). When there is a need to differentiate them in constraints, we filter them with their respective opcode: since the first bit of \texttt{EQ}'s opcode
(resp. \texttt{ISZERO}'s opcode) is 0 (resp. 1), we can filter a constraint for an EQ instruction with \texttt{eq\_iszero * (1 - opcode\_bits[0])}
(resp. \texttt{eq\_iszero * opcode\_bits[0]}).
\paragraph*{Memory columns:} The CPU interacts with the EVM memory via its memory channels. At each row, a memory channel can execute a write, a read, or be disabled. A full memory channel is composed of:
\begin{itemize}
\item \texttt{used}: Boolean flag. If it's set to 1, a memory operation is executed in this channel at this row. If it's set to 0, no operation is done but its columns might be reused for other purposes.
\item \texttt{is\_read}: Boolean flag indicating if a memory operation is a read or a write.
\item 3 \texttt{address} columns. A memory address is made of three parts: \texttt{context}, \texttt{segment} and \texttt{virtual}.
\item 8 \texttt{value} columns. EVM words are 256 bits long, and they are broken down in 8 32-bit limbs.
\end{itemize}
The last memory channel is a partial channel: it doesn't have its own \texttt{value} columns and shares them with the first full memory channel. This allows us to save eight columns.
\paragraph*{General columns:} There are 8 shared general columns. Depending on the instruction, they are used differently:
\begin{itemize}
\item \texttt{Exceptions}: When raising an exception, the first three general columns are the bit decomposition of the exception code.
They are used to jump to the correct exception handler.
\item \texttt{Logic}: For EQ, and ISZERO operations, it's easy to check that the result is 1 if \texttt{input0} and \texttt{input1} are equal. It's more difficult
to prove that, if the result is 0, the inputs are actually unequal. To prove it, each general column contains the modular inverse of $(\texttt{input0}_i - \texttt{input1}_i)$
for each limb $i$ (or 0 if the limbs are equal). Then the quantity $\texttt{general}_i * (\texttt{input0}_i - \texttt{input1}_i)$ will be 1 if and only if $\texttt{general}_i$ is
indeed the modular inverse, which is only possible if the difference is non-zero.
\item \texttt{Jumps}: For jumps, we use the first two columns: \texttt{should\_jump} and \texttt{cond\_sum\_pinv}. \texttt{should\_jump} conditions whether the EVM should jump: it's
1 for a JUMP, and $\texttt{condition} \neq 0$ for a JUMPI. To check if the condition is actually non-zero for a JUMPI, \texttt{cond\_sum\_pinv} stores the modular inverse of
\texttt{condition} (or 0 if it's zero).
\item \texttt{Shift}: For shifts, the logic differs depending on whether the displacement is lower than $2^{32}$, i.e. if it fits in a single value limb.
To check if this is not the case, we must check that at least one of the seven high limbs is not zero. The general column \texttt{high\_limb\_sum\_inv} holds the modular inverse
of the sum of the seven high limbs, and is used to check it's non-zero like the previous cases.
Contrary to the logic operations, we do not need to check limbs individually: each limb has been range-checked to 32 bits, meaning that it's not possible for the sum to
overflow and be zero if some of the limbs are non-zero.
\item \texttt{Stack}: The last three columns are used by popping-only (resp. pushing-only) instructions to check if the stack is empty after (resp. was empty
before) the instruction. We use the last columns to prevent conflicts with the other general columns. More details are provided in the stack handling section.
\end{itemize}

Binary file not shown.