mirror of
https://github.com/logos-storage/plonky2.git
synced 2026-01-03 14:23:07 +00:00
Add MPT specs
parent
d4b05f3730
commit
3af316f37b
@ -18,3 +18,13 @@
year = {2019},
note = {\url{https://ia.cr/2019/953}},
}

@article{yellowpaper,
title={Ethereum: A secure decentralised generalised transaction ledger},
author={Wood, Gavin and others},
journal={Ethereum project yellow paper},
volume={151},
number={2014},
pages={1--32},
year={2014}
}

@ -1,9 +1,16 @@
\section{Merkle Patricia Tries}
\label{tries}
The \emph{EVM world state} is a representation of the different accounts at a particular time, as well as the last processed transactions together with their receipts. The world state is represented using \emph{Merkle Patricia Tries} (MPTs) \cite[App.~D]{yellowpaper}, and there are three different tries: the state trie, the transaction trie and the receipt trie.

For each transaction we need to show that the prover knows preimages of the hashed initial and final EVM states. At the onset of its execution, the kernel stores these three tries within the {\tt Segment::TrieData} segment. The prover loads the initial tries from the inputs into memory. Subsequently, the tries are modified during transaction execution by inserting new nodes or deleting existing ones.

An MPT is composed of five types of nodes: branch, extension, leaf, empty and digest nodes. Branch and leaf nodes might contain a payload whose format depends on the particular trie. The nodes are encoded primarily using RLP encoding and hex-prefix encoding (see \cite[App.~B and C]{yellowpaper}, respectively). The resulting encoding is then hashed, following a strategy similar to that of normal Merkle trees, to generate the trie hashes.

Insertion and deletion are performed in the same way as in other MPT implementations; however, we do not modify the MPT in place in memory but instead create a new one containing the modifications. In the rest of this section we describe how the MPTs are represented in memory, how they are given as input, and how they are hashed.

\subsection{Internal memory format}

The tries are stored in the kernel memory, specifically in the {\tt Segment::TrieData} segment. Each node type is stored as follows:
\begin{enumerate}
\item An empty node is encoded as $(\texttt{MPT\_NODE\_EMPTY})$.
\item A branch node is encoded as $(\texttt{MPT\_NODE\_BRANCH}, c_1, \dots, c_{16}, v)$, where each $c_i$ is a pointer to a child node, and $v$ is a pointer to a value. If a branch node has no associated value, then $v = 0$, i.e. the null pointer.
@ -12,15 +19,76 @@ Withour our zkEVM's kernel memory,
\item A digest node is encoded as $(\texttt{MPT\_NODE\_HASH}, d)$, where $d$ is a Keccak256 digest.
\end{enumerate}
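
As an illustration of this layout, the following Python sketch models {\tt Segment::TrieData} as a flat list of words and writes the empty, branch and digest nodes described in the list above. The numeric tag values, the {\tt TrieData} helper class and the choice of reserving address 0 for the null pointer are assumptions made for the example; they are not the kernel's actual constants or code.

```python
# Hypothetical tag values, chosen for illustration only.
MPT_NODE_EMPTY = 0
MPT_NODE_BRANCH = 1
MPT_NODE_HASH = 4

NULL_PTR = 0


class TrieData:
    """Toy model of a flat, word-addressed memory segment."""

    def __init__(self):
        # Assumption: address 0 is reserved so that 0 can act as the null pointer.
        self.words = [0]

    def append(self, *values):
        """Append words contiguously and return the address of the first one."""
        ptr = len(self.words)
        self.words.extend(values)
        return ptr

    def new_empty(self):
        return self.append(MPT_NODE_EMPTY)

    def new_digest(self, digest):
        return self.append(MPT_NODE_HASH, digest)

    def new_branch(self, children, value_ptr=NULL_PTR):
        # Layout: tag, 16 child pointers, value pointer.
        assert len(children) == 16
        return self.append(MPT_NODE_BRANCH, *children, value_ptr)


mem = TrieData()
empty = mem.new_empty()
digest = mem.new_digest(0xABCD)
children = [empty] * 16
children[3] = digest
branch = mem.new_branch(children)
```

In this sketch a branch node occupies 18 consecutive words: the tag, the 16 child pointers and the value pointer, which is left at 0 because the branch carries no value.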

The values, or payloads, on the other hand, are represented differently depending on the particular trie.

\subsubsection{State Trie}
The state trie payload contains the account data. Each account is stored in 4 contiguous memory addresses containing
\begin{enumerate}
\item the nonce,
\item the balance,
\item a pointer to the account's storage trie,
\item a hash of the account's code.
\end{enumerate}
The storage trie payload in turn is a single word.
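
The account payload can be pictured with a short Python sketch that stores and reloads the four words in order. The function names, the list-based memory and the sample values are hypothetical and serve only to make the layout concrete.

```python
def store_account(trie_data, nonce, balance, storage_trie_ptr, code_hash):
    """Append the four account words contiguously; return the payload pointer."""
    ptr = len(trie_data)
    trie_data.extend([nonce, balance, storage_trie_ptr, code_hash])
    return ptr


def load_account(trie_data, ptr):
    """Read the four account words back, in the order they were stored."""
    nonce, balance, storage_trie_ptr, code_hash = trie_data[ptr:ptr + 4]
    return {
        "nonce": nonce,
        "balance": balance,
        "storage_trie_ptr": storage_trie_ptr,
        "code_hash": code_hash,
    }


# Assumption: address 0 is reserved so that 0 can act as the null pointer.
trie_data = [0]
ptr = store_account(
    trie_data, nonce=7, balance=10**18, storage_trie_ptr=0, code_hash=0x1234
)
account = load_account(trie_data, ptr)
```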

\subsubsection{Transaction Trie}
The transaction trie payload contains the length of the RLP-encoded transaction, followed by the bytes of the RLP encoding of the transaction.

\subsubsection{Receipt Trie}
The payload of the receipt trie is a receipt. Each receipt is stored as
\begin{enumerate}
\item the length in words of the payload,
\item the status,
\item the cumulative gas used,
\item the bloom filter, stored as 256 words,
\item the number of topics,
\item the topics,
\item the data length,
\item the data.
\end{enumerate}


\subsection{Prover input format}

The initial state of each trie is given by the prover as a nondeterministic input tape. This tape has a slightly different format:
\begin{enumerate}
\item An empty node is encoded as $(\texttt{MPT\_NODE\_EMPTY})$.
\item A branch node is encoded as $(\texttt{MPT\_NODE\_BRANCH}, v_?, c_1, \dots, c_{16})$. Here $v_?$ consists of a flag indicating whether a value is present, followed by the actual value payload if one is present. Each $c_i$ is the encoding of a child node.
\item An extension node is encoded as $(\texttt{MPT\_NODE\_EXTENSION}, k, c)$, where $k$ represents the part of the key associated with this extension and is encoded as a 2-tuple $(\texttt{packed\_nibbles}, \texttt{num\_nibbles})$, and $c$ is a pointer to a child node.
\item A leaf node is encoded as $(\texttt{MPT\_NODE\_LEAF}, k, v)$, where $k$ is a 2-tuple as above, and $v$ is a value payload.
\item A digest node is encoded as $(\texttt{MPT\_NODE\_HASH}, d)$, where $d$ is a Keccak256 digest.
\end{enumerate}
Nodes are thus given in depth-first order, enabling natural recursive methods for encoding and decoding this format.
The payloads of the state and receipt tries are given in the natural sequential way. The transaction and receipt payloads contain variable-size data, so their input differs slightly. The prover input for a transaction is its RLP encoding preceded by its length. The input for a receipt follows the natural sequential order, except that the topics and the data are each preceded by their length.
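
The recursive decoding that this format allows can be sketched in Python as follows. The numeric tag values, the function names and the tuple-based output are assumptions made for the example. The sketch also assumes that the child of an extension node follows inline on the tape, as the depth-first ordering suggests, and that each payload is a single word, as in a storage trie.

```python
# Hypothetical tag values, chosen for illustration only.
(MPT_NODE_EMPTY, MPT_NODE_BRANCH, MPT_NODE_EXTENSION,
 MPT_NODE_LEAF, MPT_NODE_HASH) = range(5)


def read_key(tape):
    """Read a key as the 2-tuple (packed_nibbles, num_nibbles)."""
    packed_nibbles = next(tape)
    num_nibbles = next(tape)
    return (packed_nibbles, num_nibbles)


def read_node(tape, read_payload):
    """Recursively decode one node from an iterator over tape words."""
    tag = next(tape)
    if tag == MPT_NODE_EMPTY:
        return ("empty",)
    if tag == MPT_NODE_BRANCH:
        # v_? is a presence flag followed by the payload, if any.
        has_value = next(tape)
        value = read_payload(tape) if has_value else None
        children = [read_node(tape, read_payload) for _ in range(16)]
        return ("branch", value, children)
    if tag == MPT_NODE_EXTENSION:
        key = read_key(tape)
        # Assumption: the child is encoded inline right after the key.
        child = read_node(tape, read_payload)
        return ("extension", key, child)
    if tag == MPT_NODE_LEAF:
        key = read_key(tape)
        return ("leaf", key, read_payload(tape))
    if tag == MPT_NODE_HASH:
        return ("digest", next(tape))
    raise ValueError(f"unknown node tag {tag}")


def read_word(tape):
    """Payload reader for the single-word case."""
    return next(tape)


# An extension node leading to a value-less branch whose child 5 is a leaf.
branch_words = [MPT_NODE_BRANCH, 0]
for i in range(16):
    if i == 5:
        branch_words += [MPT_NODE_LEAF, 0xC, 1, 42]
    else:
        branch_words += [MPT_NODE_EMPTY]
tape = [MPT_NODE_EXTENSION, 0xAB, 2] + branch_words
it = iter(tape)
root = read_node(it, read_word)
```

Passing the payload reader as a parameter keeps the node decoding independent of the trie type, since only the payload format differs between the tries.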

\subsection{Encoding and Hashing}

Encoding is performed recursively, starting from the trie root. Leaf, branch and extension nodes are encoded as the RLP encoding of a list containing the hex-prefix encoding of the node key as well as

\begin{description}
\item[Leaf Node:] the encoding of the payload,
\item[Branch Node:] the hash or encoding of the 16 children and the encoding of the payload,
\item[Extension Node:] the hash or encoding of the child.
\end{description}
For the rest of the nodes we have:
\begin{description}
\item[Empty Node:] the encoding of an empty node is {\tt 0x80},
\item[Digest Node:] the encoding of a digest node stored as $({\tt MPT\_NODE\_HASH}, d)$ is $d$.
\end{description}
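
For reference, the hex-prefix encoding of a node key can be sketched in Python as below, following Appendix C of \cite{yellowpaper}. The function name {\tt hex\_prefix} and the representation of the key as a list of nibbles are choices made for this example; the kernel works on the packed representation instead.

```python
def hex_prefix(nibbles, is_leaf):
    """Hex-prefix encode a sequence of nibbles into bytes.

    The high nibble of the first byte holds two flags: bit 1 marks a leaf,
    bit 0 marks an odd number of nibbles.
    """
    flag = 2 if is_leaf else 0
    if len(nibbles) % 2 == 1:
        # Odd length: the first nibble shares the first byte with the flags.
        first = 16 * (flag + 1) + nibbles[0]
        rest = nibbles[1:]
    else:
        # Even length: the low nibble of the first byte is zero padding.
        first = 16 * flag
        rest = nibbles
    out = bytearray([first])
    for hi, lo in zip(rest[0::2], rest[1::2]):
        out.append(16 * hi + lo)
    return bytes(out)


odd_extension = hex_prefix([1, 2, 3, 4, 5], is_leaf=False)
even_leaf = hex_prefix([0, 0xF, 1, 0xC, 0xB, 8], is_leaf=True)
```

The flags in the first byte let a decoder distinguish leaf keys from extension keys and recover the exact number of nibbles.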

The payloads in turn are RLP-encoded as follows:
\begin{description}
\item[State Trie:] Encoded as a list containing the nonce, balance, storage trie hash and code hash.
\item[Storage Trie:] The RLP encoding of the value (thus the double RLP encoding).
\item[Transaction Trie:] The RLP-encoded transaction.
\item[Receipt Trie:] Depending on the transaction type, it is encoded as ${\sf RLP}({\sf RLP}({\tt receipt}))$ for legacy transactions or ${\sf RLP}({\tt txn\_type}||{\sf RLP}({\tt receipt}))$ for transactions of type 1 or 2. Each receipt is encoded as a list containing:
\begin{enumerate}
\item the status,
\item the cumulative gas used,
\item the bloom filter, stored as a list of length 256,
\item the list of topics,
\item the data string.
\end{enumerate}
\end{description}

Once a node is encoded, it is written to the {\tt Segment::RlpRaw} segment as a sequence of bytes. The RLP-encoded data is then hashed if its length is at least 32 bytes; otherwise the encoding itself is returned. Further details can be found in the \href{https://github.com/0xPolygonZero/plonky2/tree/main/evm/src/cpu/mpt/hash}{mpt hash folder}.
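
The final step can be illustrated with a minimal Python sketch of RLP encoding, followed by the rule from \cite{yellowpaper} that encodings shorter than 32 bytes are kept as they are while longer ones are replaced by their hash. All function names are hypothetical. The Python standard library does not provide Keccak256, so the sketch uses SHA3-256 as a placeholder; the two functions produce different digests, and a real implementation would pass a Keccak256 function instead.

```python
import hashlib


def _length_prefix(length, offset):
    """Build the RLP prefix for a payload of the given length."""
    if length < 56:
        return bytes([offset + length])
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item):
    """RLP-encode a byte string or a (possibly nested) list of byte strings."""
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item  # a single small byte is its own encoding
        return _length_prefix(len(item), 0x80) + item
    payload = b"".join(rlp_encode(x) for x in item)
    return _length_prefix(len(payload), 0xC0) + payload


def placeholder_hash(data):
    # Placeholder only: SHA3-256 is NOT Keccak256 (the padding differs).
    return hashlib.sha3_256(data).digest()


def hash_or_inline(encoded, hash_fn=placeholder_hash):
    """Return the encoding itself if shorter than 32 bytes, else its digest."""
    return encoded if len(encoded) < 32 else hash_fn(encoded)


empty_node = rlp_encode(b"")                    # encoding of an empty node
small_leaf = rlp_encode([b"\x20", b"\x01"])     # short node, kept inline
large_leaf = rlp_encode([b"\x20", bytes(40)])   # long node, replaced by a hash
small_ref = hash_or_inline(small_leaf)
large_ref = hash_or_inline(large_leaf)
```

Taking the hash function as a parameter keeps the sketch runnable with the standard library while making the substitution point for Keccak256 explicit.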
Binary file not shown.