Remove bootstrapping (#1390)

* Start removing bootstrapping

* Change the constraint for kernel code initializing

* Update specs

* Apply comments

* Add new global metadata to circuit methods

* Change zero-initializing constraint

* Apply comment

* Update circuit size range for recursive test
Hamy Ratoanina 2023-11-30 10:04:08 -05:00 committed by GitHub
parent 471ff68d51
commit 30c944f778
19 changed files with 150 additions and 296 deletions

View File

@ -7,14 +7,12 @@ This section will only briefly present the CPU and its columns. Details about th
\subsubsection{CPU flow}
An execution run can be decomposed into three distinct parts:
An execution run can be decomposed into two distinct parts:
\begin{itemize}
\item \textbf{Bootstrapping:} The CPU starts by writing all the kernel code to memory and then hashes it. The hash is then compared to a public value shared with
the verifier to ensure that the kernel code is correct.
\item \textbf{CPU cycles:} The bulk of the execution. In each row, the CPU reads the current code at the program counter (PC) address, and executes it. The current code can be the kernel code,
or whichever code is being executed in the current context (transaction code or contract code). Executing an instruction consists of modifying the registers, possibly
performing some memory operations, and updating the PC.
\item \textbf{Padding:} At the end of the execution, we need to pad the length of the CPU trace to the next power of two. When the program counter reaches the special halting label
in the kernel, execution halts. Constraints ensure that every subsequent row is a padding row and that execution cannot resume.
\end{itemize}
@ -26,7 +24,6 @@ but change the code context, which is where the instructions are read from.
\subsubsection{CPU columns}
\paragraph*{Registers:} \begin{itemize}
\item \texttt{is\_bootstrap\_kernel}: Boolean indicating whether this is a bootstrapping row or not. It must be 1 at the first row, then switch to 0 until the end.
\item \texttt{context}: Indicates which context we are in. 0 for the kernel, and a positive integer for every user context. Incremented by 1 at every call.
\item \texttt{code\_context}: Indicates in which context the code to execute resides. It's equal to \texttt{context} in user mode, but is always 0 in kernel mode.
\item \texttt{program\_counter}: The address of the instruction to be read and executed.
@ -34,14 +31,13 @@ but change the code context, which is where the instructions are read from.
\item \texttt{is\_kernel\_mode}: Boolean indicating whether we are in kernel (i.e. privileged) mode. This means we are executing kernel code, and we have access to
privileged instructions.
\item \texttt{gas}: The current amount of gas used in the current context. It is eventually checked to be below the current gas limit. Must fit in 32 bits.
\item \texttt{is\_keccak\_sponge}: Boolean indicating whether we are executing a Keccak hash. This happens whenever a \texttt{KECCAK\_GENERAL} instruction is executed, or at the last
cycle of bootstrapping to hash the kernel code.
\item \texttt{is\_keccak\_sponge}: Boolean indicating whether we are executing a Keccak hash. Only used as a filter for CTLs.
\item \texttt{clock}: Monotonic counter which starts at 0 and is incremented by 1 at each row. Used to enforce correct ordering of memory accesses.
\item \texttt{opcode\_bits}: 8 boolean columns, which are the bit decomposition of the opcode being read at the current PC.
\end{itemize}
\paragraph*{Operation flags:} Boolean flags. During CPU cycles phase, each row executes a single instruction, which sets one and only one operation flag. No flag is set during
bootstrapping and padding. The decoding constraints ensure that the flag set corresponds to the opcode being read.
padding. The decoding constraints ensure that the flag set corresponds to the opcode being read.
There isn't a 1-to-1 correspondence between instructions and flags. For efficiency, the same flag can be set by different, unrelated instructions (e.g. \texttt{eq\_iszero}, which represents
the \texttt{EQ} and the \texttt{ISZERO} instructions). When there is a need to differentiate them in constraints, we filter them with their respective opcode: since the first bit of \texttt{EQ}'s opcode
(resp. \texttt{ISZERO}'s opcode) is 0 (resp. 1), we can filter a constraint for an \texttt{EQ} instruction with \texttt{eq\_iszero * (1 - opcode\_bits[0])}.
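The opcode-bit filter above can be exercised with plain integers. A minimal sketch (the helper names are ours, not the crate's; EVM opcodes EQ = 0x14 and ISZERO = 0x15 differ only in their lowest bit, which the text calls `opcode_bits[0]`):

```rust
// Sketch of the shared-flag filtering described above, over plain integers.
// EQ's opcode is 0x14 and ISZERO's is 0x15; they differ only in the lowest
// bit. Helper names here are hypothetical.
fn opcode_bits(opcode: u8) -> [u64; 8] {
    core::array::from_fn(|i| ((opcode >> i) & 1) as u64)
}

/// eq_iszero * (1 - opcode_bits[0]): active exactly on EQ rows.
fn eq_filter(eq_iszero_flag: u64, bits: &[u64; 8]) -> u64 {
    eq_iszero_flag * (1 - bits[0])
}

fn main() {
    let eq = opcode_bits(0x14);
    let iszero = opcode_bits(0x15);
    assert_eq!(eq_filter(1, &eq), 1);     // filter passes on EQ
    assert_eq!(eq_filter(1, &iszero), 0); // vanishes on ISZERO
    assert_eq!(eq_filter(0, &eq), 0);     // vanishes when the flag is unset
    println!("eq_iszero filter ok");
}
```

The same trick generalizes to any pair of instructions sharing a flag, as long as one opcode bit distinguishes them.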

View File

@ -14,9 +14,10 @@ Each row of the memory table corresponds to a single memory operation (a read or
The memory table should be ordered by $(a, \tau)$. Note that the correctness of the memory could be checked as follows:
\begin{enumerate}
\item Verify the ordering by checking that $(a_i, \tau_i) \leq (a_{i+1}, \tau_{i+1})$ for each consecutive pair.
\item Enumerate the purportedly-ordered log while tracking the ``current'' value of $v$, which is initially zero.\footnote{EVM memory is zero-initialized.}
\item Enumerate the purportedly-ordered log while tracking the ``current'' value of $v$.
\begin{enumerate}
\item Upon observing an address which doesn't match that of the previous row, if the operation is a read, check that $v = 0$.
\item Upon observing an address which doesn't match that of the previous row, if the address is zero-initialized
and if the operation is a read, check that $v = 0$.
\item Upon observing a write, don't constrain $v$.
\item Upon observing a read at timestamp $\tau_i$ which isn't the first operation at this address, check that $v_i = v_{i-1}$.
\end{enumerate}
@ -64,6 +65,15 @@ Since a memory channel can only hold at most one memory operation, every CPU mem
Note that it doesn't mean that all memory operations have unique timestamps. There are two exceptions:
\begin{itemize}
\item Before bootstrapping, we write some global metadata in memory. These extra operations are done at timestamp $\tau = 0$.
\item Before the CPU cycles, we write some global metadata in memory. These extra operations are done at timestamp $\tau = 0$.
\item Some tables other than CPU can generate memory operations, like KeccakSponge. When this happens, these operations all have the timestamp of the CPU row of the instruction which invoked the table (for KeccakSponge, KECCAK\_GENERAL).
\end{itemize}
\subsubsection{Memory initialization}
By default, all memory is zero-initialized. However, to save numerous writes, we allow some specific segments to be initialized with arbitrary values.
\begin{itemize}
\item The read-only kernel code (in segment 0, context 0) is initialized with its correct values. It's checked by hashing the segment and verifying
that the hash value matches a verifier-provided one.
\end{itemize}
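The enumerated checking procedure, together with the zero-initialization exception introduced under "Memory initialization", can be sketched as an out-of-circuit reference checker. This uses plain integers and hypothetical names; it mirrors the stated rules, not the STARK implementation (for brevity, the zero-check on the very first row of the log is omitted):

```rust
// Reference checker for the purportedly-ordered memory log described above.
// Rules: the log is sorted by (address, timestamp); on an address change, a
// read must see 0 unless the address is exempt from zero-initialization
// (e.g. the kernel code segment in context 0); a later read at the same
// address must return the previous value; writes are unconstrained.
#[derive(Clone, Copy)]
struct Op {
    context: usize,
    segment: usize,
    virt: usize,
    timestamp: usize,
    is_read: bool,
    value: u64,
}

const SEGMENT_CODE: usize = 0; // assumed segment id for kernel code

fn check_ordered_log(ops: &[Op]) -> bool {
    for w in ops.windows(2) {
        let (prev, cur) = (w[0], w[1]);
        let prev_addr = (prev.context, prev.segment, prev.virt);
        let cur_addr = (cur.context, cur.segment, cur.virt);
        // 1. Verify the (address, timestamp) ordering.
        if (prev_addr, prev.timestamp) > (cur_addr, cur.timestamp) {
            return false;
        }
        if prev_addr != cur_addr {
            // 2. First op at a zero-initialized address: a read must see 0.
            let zero_initialized = !(cur.context == 0 && cur.segment == SEGMENT_CODE);
            if zero_initialized && cur.is_read && cur.value != 0 {
                return false;
            }
        } else if cur.is_read && cur.value != prev.value {
            // 3. A read must return the last value written at this address.
            return false;
        }
    }
    true
}

fn main() {
    let w = Op { context: 1, segment: 1, virt: 0, timestamp: 1, is_read: false, value: 7 };
    let r = Op { timestamp: 2, is_read: true, ..w };
    assert!(check_ordered_log(&[w, r]));
    // A first read of a fresh, zero-initialized address must see 0.
    let bad = Op { virt: 1, timestamp: 3, value: 5, ..r };
    assert!(!check_ordered_log(&[w, r, bad]));
    println!("ordered-log checks ok");
}
```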

Binary file not shown.

View File

@ -1,225 +0,0 @@
//! The initial phase of execution, where the kernel code is hashed while being written to memory.
//! The hash is then checked against a precomputed kernel hash.
use ethereum_types::U256;
use itertools::Itertools;
use plonky2::field::extension::Extendable;
use plonky2::field::packed::PackedField;
use plonky2::field::types::Field;
use plonky2::hash::hash_types::RichField;
use plonky2::iop::ext_target::ExtensionTarget;
use plonky2::plonk::circuit_builder::CircuitBuilder;
use crate::constraint_consumer::{ConstraintConsumer, RecursiveConstraintConsumer};
use crate::cpu::columns::CpuColumnsView;
use crate::cpu::kernel::aggregator::KERNEL;
use crate::cpu::membus::NUM_GP_CHANNELS;
use crate::generation::state::GenerationState;
use crate::memory::segments::Segment;
use crate::witness::memory::MemoryAddress;
use crate::witness::util::{keccak_sponge_log, mem_write_gp_log_and_fill};
/// Generates the rows to bootstrap the kernel.
pub(crate) fn generate_bootstrap_kernel<F: Field>(state: &mut GenerationState<F>) {
// Iterate through chunks of the code, such that we can write one chunk to memory per row.
for chunk in &KERNEL.code.iter().enumerate().chunks(NUM_GP_CHANNELS) {
let mut cpu_row = CpuColumnsView::default();
cpu_row.clock = F::from_canonical_usize(state.traces.clock());
cpu_row.is_bootstrap_kernel = F::ONE;
// Write this chunk to memory, while simultaneously packing its bytes into a u32 word.
for (channel, (addr, &byte)) in chunk.enumerate() {
let address = MemoryAddress::new(0, Segment::Code, addr);
let write =
mem_write_gp_log_and_fill(channel, address, state, &mut cpu_row, byte.into());
state.traces.push_memory(write);
}
state.traces.push_cpu(cpu_row);
}
let mut final_cpu_row = CpuColumnsView::default();
final_cpu_row.clock = F::from_canonical_usize(state.traces.clock());
final_cpu_row.is_bootstrap_kernel = F::ONE;
final_cpu_row.is_keccak_sponge = F::ONE;
// The Keccak sponge CTL uses memory value columns for its inputs and outputs.
final_cpu_row.mem_channels[0].value[0] = F::ZERO; // context
final_cpu_row.mem_channels[1].value[0] = F::from_canonical_usize(Segment::Code as usize); // segment
final_cpu_row.mem_channels[2].value[0] = F::ZERO; // virt
final_cpu_row.mem_channels[3].value[0] = F::from_canonical_usize(KERNEL.code.len()); // len
// The resulting hash will be written later in mem_channel[0] of the first CPU row, and will be checked
// with the CTL.
keccak_sponge_log(
state,
MemoryAddress::new(0, Segment::Code, 0),
KERNEL.code.clone(),
);
state.registers.stack_top = KERNEL
.code_hash
.iter()
.enumerate()
.fold(0.into(), |acc, (i, &elt)| {
acc + (U256::from(elt) << (224 - 32 * i))
});
state.traces.push_cpu(final_cpu_row);
log::info!("Bootstrapping took {} cycles", state.traces.clock());
}
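The fold at the end of `generate_bootstrap_kernel` packs the eight big-endian `u32` limbs of the code hash into one 256-bit value, limb `i` shifted left by `224 - 32*i` bits. A dependency-free sketch, modeling the 256-bit word as a 32-byte big-endian array rather than the crate's `U256` (the array model is our assumption):

```rust
// Eight big-endian u32 limbs packed into a 256-bit big-endian word: placing
// limb i at byte offset 4*i is exactly the same as adding
// U256::from(limb) << (224 - 32*i), as in the fold above.
fn fold_limbs(limbs: [u32; 8]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for (i, limb) in limbs.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&limb.to_be_bytes());
    }
    out
}

fn main() {
    let limbs: [u32; 8] = [0x01020304, 0, 0, 0, 0, 0, 0, 0x0a0b0c0d];
    let word = fold_limbs(limbs);
    // The first limb lands in the most significant bytes...
    assert_eq!(&word[0..4], &[1, 2, 3, 4]);
    // ...and the last limb (shift 224 - 32*7 = 0) in the least significant.
    assert_eq!(&word[28..32], &[0x0a, 0x0b, 0x0c, 0x0d]);
    println!("limb folding ok");
}
```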
/// Evaluates the constraints for kernel bootstrapping.
pub(crate) fn eval_bootstrap_kernel_packed<F: Field, P: PackedField<Scalar = F>>(
local_values: &CpuColumnsView<P>,
next_values: &CpuColumnsView<P>,
yield_constr: &mut ConstraintConsumer<P>,
) {
// IS_BOOTSTRAP_KERNEL must have an init value of 1, a final value of 0,
// and a delta = current.is_bootstrap - next.is_bootstrap in {0, 1}.
let local_is_bootstrap = local_values.is_bootstrap_kernel;
let next_is_bootstrap = next_values.is_bootstrap_kernel;
yield_constr.constraint_first_row(local_is_bootstrap - P::ONES);
yield_constr.constraint_last_row(local_is_bootstrap);
let delta_is_bootstrap = local_is_bootstrap - next_is_bootstrap;
yield_constr.constraint_transition(delta_is_bootstrap * (delta_is_bootstrap - P::ONES));
// If this is a bootstrapping row and the i'th memory channel is used, it must have the right
// address, namely context = 0, segment = Code, virt = clock * NUM_GP_CHANNELS + i.
let code_segment = F::from_canonical_usize(Segment::Code as usize);
for (i, channel) in local_values.mem_channels.iter().enumerate() {
let filter = local_is_bootstrap * channel.used;
yield_constr.constraint(filter * channel.addr_context);
yield_constr.constraint(filter * (channel.addr_segment - code_segment));
let expected_virt = local_values.clock * F::from_canonical_usize(NUM_GP_CHANNELS)
+ F::from_canonical_usize(i);
yield_constr.constraint(filter * (channel.addr_virtual - expected_virt));
}
yield_constr.constraint(local_is_bootstrap * local_values.partial_channel.used);
// If this is the final bootstrap row (i.e. delta_is_bootstrap = 1), check that
// - all memory channels are disabled
// - the current kernel hash matches a precomputed one
for channel in local_values.mem_channels.iter() {
yield_constr.constraint_transition(delta_is_bootstrap * channel.used);
}
yield_constr.constraint(delta_is_bootstrap * local_values.partial_channel.used);
for (&expected, actual) in KERNEL
.code_hash
.iter()
.rev()
.zip(next_values.mem_channels[0].value)
{
let expected = P::from(F::from_canonical_u32(expected));
let diff = expected - actual;
yield_constr.constraint_transition(delta_is_bootstrap * diff);
}
// In addition, validate `is_keccak_sponge`. It must be binary, and be 1 either at the final
// bootstrap row or during a KECCAK_GENERAL instruction. At the final CPU row, it should be 0.
yield_constr
.constraint(local_values.is_keccak_sponge * (local_values.is_keccak_sponge - P::ONES));
yield_constr.constraint_transition(
local_values.is_keccak_sponge
- (delta_is_bootstrap
+ local_values.op.jumpdest_keccak_general
* (P::ONES - local_values.opcode_bits[1])),
);
yield_constr.constraint_last_row(local_values.is_keccak_sponge);
}
/// Circuit version of `eval_bootstrap_kernel_packed`.
/// Evaluates the constraints for kernel bootstrapping.
pub(crate) fn eval_bootstrap_kernel_ext_circuit<F: RichField + Extendable<D>, const D: usize>(
builder: &mut CircuitBuilder<F, D>,
local_values: &CpuColumnsView<ExtensionTarget<D>>,
next_values: &CpuColumnsView<ExtensionTarget<D>>,
yield_constr: &mut RecursiveConstraintConsumer<F, D>,
) {
let one = builder.one_extension();
// IS_BOOTSTRAP_KERNEL must have an init value of 1, a final value of 0,
// and a delta = current.is_bootstrap - next.is_bootstrap in {0, 1}.
let local_is_bootstrap = local_values.is_bootstrap_kernel;
let next_is_bootstrap = next_values.is_bootstrap_kernel;
let constraint = builder.sub_extension(local_is_bootstrap, one);
yield_constr.constraint_first_row(builder, constraint);
yield_constr.constraint_last_row(builder, local_is_bootstrap);
let delta_is_bootstrap = builder.sub_extension(local_is_bootstrap, next_is_bootstrap);
let constraint =
builder.mul_sub_extension(delta_is_bootstrap, delta_is_bootstrap, delta_is_bootstrap);
yield_constr.constraint_transition(builder, constraint);
// If this is a bootstrapping row and the i'th memory channel is used, it must have the right
// address, namely context = 0, segment = Code, virt = clock * NUM_GP_CHANNELS + i.
let code_segment =
builder.constant_extension(F::Extension::from_canonical_usize(Segment::Code as usize));
for (i, channel) in local_values.mem_channels.iter().enumerate() {
let filter = builder.mul_extension(local_is_bootstrap, channel.used);
let constraint = builder.mul_extension(filter, channel.addr_context);
yield_constr.constraint(builder, constraint);
let segment_diff = builder.sub_extension(channel.addr_segment, code_segment);
let constraint = builder.mul_extension(filter, segment_diff);
yield_constr.constraint(builder, constraint);
let i_ext = builder.constant_extension(F::Extension::from_canonical_usize(i));
let num_gp_channels_f = F::from_canonical_usize(NUM_GP_CHANNELS);
let expected_virt =
builder.mul_const_add_extension(num_gp_channels_f, local_values.clock, i_ext);
let virt_diff = builder.sub_extension(channel.addr_virtual, expected_virt);
let constraint = builder.mul_extension(filter, virt_diff);
yield_constr.constraint(builder, constraint);
}
{
let constr = builder.mul_extension(local_is_bootstrap, local_values.partial_channel.used);
yield_constr.constraint(builder, constr);
}
// If this is the final bootstrap row (i.e. delta_is_bootstrap = 1), check that
// - all memory channels are disabled
// - the current kernel hash matches a precomputed one
for channel in local_values.mem_channels.iter() {
let constraint = builder.mul_extension(delta_is_bootstrap, channel.used);
yield_constr.constraint_transition(builder, constraint);
}
{
let constr = builder.mul_extension(delta_is_bootstrap, local_values.partial_channel.used);
yield_constr.constraint(builder, constr);
}
for (&expected, actual) in KERNEL
.code_hash
.iter()
.rev()
.zip(next_values.mem_channels[0].value)
{
let expected = builder.constant_extension(F::Extension::from_canonical_u32(expected));
let diff = builder.sub_extension(expected, actual);
let constraint = builder.mul_extension(delta_is_bootstrap, diff);
yield_constr.constraint_transition(builder, constraint);
}
// In addition, validate `is_keccak_sponge`. It must be binary, and be 1 either at the final
// bootstrap row or during a KECCAK_GENERAL instruction. At the final CPU row, it should be 0.
{
let constr = builder.mul_sub_extension(
local_values.is_keccak_sponge,
local_values.is_keccak_sponge,
local_values.is_keccak_sponge,
);
yield_constr.constraint(builder, constr);
}
{
let minus_is_keccak_general = builder.mul_sub_extension(
local_values.op.jumpdest_keccak_general,
local_values.opcode_bits[1],
local_values.op.jumpdest_keccak_general,
);
let computed_is_keccak_sponge =
builder.sub_extension(delta_is_bootstrap, minus_is_keccak_general);
let constr =
builder.sub_extension(local_values.is_keccak_sponge, computed_is_keccak_sponge);
yield_constr.constraint_transition(builder, constr);
}
{
yield_constr.constraint_last_row(builder, local_values.is_keccak_sponge);
}
}

View File

@ -53,9 +53,6 @@ pub(crate) struct PartialMemoryChannelView<T: Copy> {
#[repr(C)]
#[derive(Clone, Copy, Eq, PartialEq, Debug)]
pub(crate) struct CpuColumnsView<T: Copy> {
/// Filter. 1 if the row is part of bootstrapping the kernel code, 0 otherwise.
pub is_bootstrap_kernel: T,
/// If CPU cycle: Current context.
pub context: T,

View File

@ -51,7 +51,7 @@ pub(crate) fn eval_packed_generic<P: PackedField>(
let is_cpu_cycle: P = COL_MAP.op.iter().map(|&col_i| lv[col_i]).sum();
let is_cpu_cycle_next: P = COL_MAP.op.iter().map(|&col_i| nv[col_i]).sum();
let next_halt_state = P::ONES - nv.is_bootstrap_kernel - is_cpu_cycle_next;
let next_halt_state = P::ONES - is_cpu_cycle_next;
// Once we start executing instructions, then we continue until the end of the table
// or we reach dummy padding rows. This, along with the constraints on the first row,
@ -94,8 +94,7 @@ pub(crate) fn eval_ext_circuit<F: RichField + Extendable<D>, const D: usize>(
let is_cpu_cycle = builder.add_many_extension(COL_MAP.op.iter().map(|&col_i| lv[col_i]));
let is_cpu_cycle_next = builder.add_many_extension(COL_MAP.op.iter().map(|&col_i| nv[col_i]));
let next_halt_state = builder.add_extension(nv.is_bootstrap_kernel, is_cpu_cycle_next);
let next_halt_state = builder.sub_extension(one, next_halt_state);
let next_halt_state = builder.sub_extension(one, is_cpu_cycle_next);
// Once we start executing instructions, then we continue until the end of the table
// or we reach dummy padding rows. This, along with the constraints on the first row,

View File

@ -16,8 +16,8 @@ use crate::all_stark::Table;
use crate::constraint_consumer::{ConstraintConsumer, RecursiveConstraintConsumer};
use crate::cpu::columns::{COL_MAP, NUM_CPU_COLUMNS};
use crate::cpu::{
bootstrap_kernel, byte_unpacking, clock, contextops, control_flow, decode, dup_swap, gas,
jumps, membus, memio, modfp254, pc, push0, shift, simple_logic, stack, syscalls_exceptions,
byte_unpacking, clock, contextops, control_flow, decode, dup_swap, gas, jumps, membus, memio,
modfp254, pc, push0, shift, simple_logic, stack, syscalls_exceptions,
};
use crate::cross_table_lookup::{Column, TableWithColumns};
use crate::evaluation_frame::{StarkEvaluationFrame, StarkFrame};
@ -295,7 +295,6 @@ impl<F: RichField + Extendable<D>, const D: usize> Stark<F, D> for CpuStark<F, D
let next_values: &[P; NUM_CPU_COLUMNS] = vars.get_next_values().try_into().unwrap();
let next_values: &CpuColumnsView<P> = next_values.borrow();
bootstrap_kernel::eval_bootstrap_kernel_packed(local_values, next_values, yield_constr);
byte_unpacking::eval_packed(local_values, next_values, yield_constr);
clock::eval_packed(local_values, next_values, yield_constr);
contextops::eval_packed(local_values, next_values, yield_constr);
@ -331,12 +330,6 @@ impl<F: RichField + Extendable<D>, const D: usize> Stark<F, D> for CpuStark<F, D
vars.get_next_values().try_into().unwrap();
let next_values: &CpuColumnsView<ExtensionTarget<D>> = next_values.borrow();
bootstrap_kernel::eval_bootstrap_kernel_ext_circuit(
builder,
local_values,
next_values,
yield_constr,
);
byte_unpacking::eval_ext_circuit(builder, local_values, next_values, yield_constr);
clock::eval_ext_circuit(builder, local_values, next_values, yield_constr);
contextops::eval_ext_circuit(builder, local_values, next_values, yield_constr);

View File

@ -20,8 +20,8 @@ pub(crate) fn eval_packed<P: PackedField>(
let is_cpu_cycle: P = COL_MAP.op.iter().map(|&col_i| lv[col_i]).sum();
let is_cpu_cycle_next: P = COL_MAP.op.iter().map(|&col_i| nv[col_i]).sum();
let halt_state = P::ONES - lv.is_bootstrap_kernel - is_cpu_cycle;
let next_halt_state = P::ONES - nv.is_bootstrap_kernel - is_cpu_cycle_next;
let halt_state = P::ONES - is_cpu_cycle;
let next_halt_state = P::ONES - is_cpu_cycle_next;
// The halt flag must be boolean.
yield_constr.constraint(halt_state * (halt_state - P::ONES));
@ -61,10 +61,8 @@ pub(crate) fn eval_ext_circuit<F: RichField + Extendable<D>, const D: usize>(
let is_cpu_cycle = builder.add_many_extension(COL_MAP.op.iter().map(|&col_i| lv[col_i]));
let is_cpu_cycle_next = builder.add_many_extension(COL_MAP.op.iter().map(|&col_i| nv[col_i]));
let halt_state = builder.add_extension(lv.is_bootstrap_kernel, is_cpu_cycle);
let halt_state = builder.sub_extension(one, halt_state);
let next_halt_state = builder.add_extension(nv.is_bootstrap_kernel, is_cpu_cycle_next);
let next_halt_state = builder.sub_extension(one, next_halt_state);
let halt_state = builder.sub_extension(one, is_cpu_cycle);
let next_halt_state = builder.sub_extension(one, is_cpu_cycle_next);
// The halt flag must be boolean.
let constr = builder.mul_sub_extension(halt_state, halt_state, halt_state);

View File

@ -1,5 +1,17 @@
global main:
// First, initialise the shift table
// First, hash the kernel code
%mload_global_metadata(@GLOBAL_METADATA_KERNEL_LEN)
PUSH 0
PUSH 0
PUSH 0
// stack: context, segment, virt, len
KECCAK_GENERAL
// stack: hash
%mload_global_metadata(@GLOBAL_METADATA_KERNEL_HASH)
// stack: expected_hash, hash
%assert_eq
// Initialise the shift table
%shift_table_init
// Initialize the block bloom filter
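The new `main` prologue can be mirrored outside the kernel: recompute the hash of the code segment and compare it against the value the prover wrote into global metadata. In this sketch a tiny FNV-1a stand-in replaces Keccak-256 so the code stays dependency-free, and the checker and metadata-map names are ours:

```rust
// Model of the in-kernel check above: hash the code bytes at
// (context 0, segment Code, offsets 0..len) and compare with the hash
// stored in global metadata. FNV-1a is a stand-in for Keccak-256.
use std::collections::HashMap;

fn fnv1a(bytes: &[u8]) -> u64 {
    bytes.iter().fold(0xcbf29ce484222325u64, |h, &b| {
        (h ^ b as u64).wrapping_mul(0x100000001b3)
    })
}

const KERNEL_HASH: &str = "GLOBAL_METADATA_KERNEL_HASH";
const KERNEL_LEN: &str = "GLOBAL_METADATA_KERNEL_LEN";

fn check_kernel_code(code_segment: &[u8], metadata: &HashMap<&str, u64>) -> bool {
    // stack: context, segment, virt, len -- then KECCAK_GENERAL.
    let len = metadata[KERNEL_LEN] as usize;
    let hash = fnv1a(&code_segment[..len]);
    // %assert_eq against the verifier-provided hash in global metadata.
    hash == metadata[KERNEL_HASH]
}

fn main() {
    let code = b"kernel code bytes".to_vec();
    let mut metadata = HashMap::new();
    metadata.insert(KERNEL_LEN, code.len() as u64);
    metadata.insert(KERNEL_HASH, fnv1a(&code));
    assert!(check_kernel_code(&code, &metadata));
    // A tampered code segment no longer matches the committed hash.
    let mut bad = code.clone();
    bad[0] ^= 1;
    assert!(!check_kernel_code(&bad, &metadata));
    println!("kernel hash check ok");
}
```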

View File

@ -2,7 +2,7 @@ use std::collections::HashMap;
use std::fs;
use std::time::Instant;
use ethereum_types::U256;
use ethereum_types::{H256, U256};
use itertools::{izip, Itertools};
use keccak_hash::keccak;
use log::debug;
@ -26,9 +26,8 @@ pub(crate) const BYTES_PER_OFFSET: u8 = 3;
pub struct Kernel {
pub(crate) code: Vec<u8>,
/// Computed using `hash_kernel`. It is encoded as `u32` limbs for convenience, since we deal
/// with `u32` limbs in our Keccak table.
pub(crate) code_hash: [u32; 8],
/// Computed using `hash_kernel`.
pub(crate) code_hash: H256,
pub(crate) global_labels: HashMap<String, usize>,
pub(crate) ordered_labels: Vec<String>,
@ -43,11 +42,7 @@ impl Kernel {
global_labels: HashMap<String, usize>,
prover_inputs: HashMap<usize, ProverInputFn>,
) -> Self {
let code_hash_bytes = keccak(&code).0;
let code_hash_be = core::array::from_fn(|i| {
u32::from_le_bytes(core::array::from_fn(|j| code_hash_bytes[i * 4 + j]))
});
let code_hash = code_hash_be.map(u32::from_be);
let code_hash = keccak(&code);
let ordered_labels = global_labels
.keys()
.cloned()

View File

@ -85,10 +85,13 @@ pub(crate) enum GlobalMetadata {
LogsPayloadLen = 43,
TxnNumberBefore = 44,
TxnNumberAfter = 45,
KernelHash = 46,
KernelLen = 47,
}
impl GlobalMetadata {
pub(crate) const COUNT: usize = 46;
pub(crate) const COUNT: usize = 48;
pub(crate) fn all() -> [Self; Self::COUNT] {
[
@ -138,6 +141,8 @@ impl GlobalMetadata {
Self::BlockCurrentHash,
Self::TxnNumberBefore,
Self::TxnNumberAfter,
Self::KernelHash,
Self::KernelLen,
]
}
@ -190,6 +195,8 @@ impl GlobalMetadata {
Self::LogsPayloadLen => "GLOBAL_METADATA_LOGS_PAYLOAD_LEN",
Self::TxnNumberBefore => "GLOBAL_METADATA_TXN_NUMBER_BEFORE",
Self::TxnNumberAfter => "GLOBAL_METADATA_TXN_NUMBER_AFTER",
Self::KernelHash => "GLOBAL_METADATA_KERNEL_HASH",
Self::KernelLen => "GLOBAL_METADATA_KERNEL_LEN",
}
}
}
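Because `COUNT` and the `all()` array are maintained by hand, this diff has to touch three places at once: the enum (two new variants), the count (46 to 48), and the array. A minimal self-contained model of that invariant, not the crate's actual code:

```rust
// Miniature of the hand-maintained enum / COUNT / all() triple from the
// diff; only the last few variants are modeled here.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Meta {
    TxnNumberBefore = 44,
    TxnNumberAfter = 45,
    KernelHash = 46,
    KernelLen = 47,
}

impl Meta {
    const COUNT: usize = 4; // must match the length of `all()`

    fn all() -> [Self; Self::COUNT] {
        [Self::TxnNumberBefore, Self::TxnNumberAfter, Self::KernelHash, Self::KernelLen]
    }
}

fn main() {
    // The invariant the PR must preserve when adding KernelHash/KernelLen.
    assert_eq!(Meta::all().len(), Meta::COUNT);
    assert!(Meta::all().contains(&Meta::KernelHash));
    assert_eq!(Meta::KernelLen as usize, 47);
    println!("metadata enum invariant ok");
}
```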

View File

@ -39,9 +39,6 @@ pub(crate) fn eval_packed<P: PackedField>(
) {
// Validate `lv.code_context`.
// It should be 0 if in kernel mode and `lv.context` if in user mode.
// Note: This doesn't need to be filtered to CPU cycles, as this should also be satisfied
// during Kernel bootstrapping.
yield_constr.constraint(lv.code_context - (P::ONES - lv.is_kernel_mode) * lv.context);
// Validate `channel.used`. It should be binary.
@ -62,8 +59,6 @@ pub(crate) fn eval_ext_circuit<F: RichField + Extendable<D>, const D: usize>(
) {
// Validate `lv.code_context`.
// It should be 0 if in kernel mode and `lv.context` if in user mode.
// Note: This doesn't need to be filtered to CPU cycles, as this should also be satisfied
// during Kernel bootstrapping.
let diff = builder.sub_extension(lv.context, lv.code_context);
let constr = builder.mul_sub_extension(lv.is_kernel_mode, lv.context, diff);
yield_constr.constraint(builder, constr);

View File

@ -1,4 +1,3 @@
pub(crate) mod bootstrap_kernel;
mod byte_unpacking;
mod clock;
pub(crate) mod columns;

View File

@ -3,6 +3,7 @@ use std::collections::HashMap;
use anyhow::anyhow;
use eth_trie_utils::partial_trie::{HashedPartialTrie, PartialTrie};
use ethereum_types::{Address, BigEndianHash, H256, U256};
use itertools::enumerate;
use plonky2::field::extension::Extendable;
use plonky2::field::polynomial::PolynomialValues;
use plonky2::hash::hash_types::RichField;
@ -16,7 +17,6 @@ use GlobalMetadata::{
use crate::all_stark::{AllStark, NUM_TABLES};
use crate::config::StarkConfig;
use crate::cpu::bootstrap_kernel::generate_bootstrap_kernel;
use crate::cpu::columns::CpuColumnsView;
use crate::cpu::kernel::aggregator::KERNEL;
use crate::cpu::kernel::constants::global_metadata::GlobalMetadata;
@ -149,6 +149,8 @@ fn apply_metadata_and_tries_memops<F: RichField + Extendable<D>, const D: usize>
GlobalMetadata::ReceiptTrieRootDigestAfter,
h2u(trie_roots_after.receipts_root),
),
(GlobalMetadata::KernelHash, h2u(KERNEL.code_hash)),
(GlobalMetadata::KernelLen, KERNEL.code.len().into()),
];
let channel = MemoryChannel::GeneralPurpose(0);
@ -216,6 +218,19 @@ fn apply_metadata_and_tries_memops<F: RichField + Extendable<D>, const D: usize>
state.traces.memory_ops.extend(ops);
}
fn initialize_kernel_code<F: RichField + Extendable<D>, const D: usize>(
state: &mut GenerationState<F>,
) {
for (i, &byte) in enumerate(KERNEL.code.iter()) {
let address = MemoryAddress {
context: 0,
segment: Segment::Code as usize,
virt: i,
};
state.memory.set(address, byte.into());
}
}
pub fn generate_traces<F: RichField + Extendable<D>, const D: usize>(
all_stark: &AllStark<F, D>,
inputs: GenerationInputs,
@ -231,7 +246,7 @@ pub fn generate_traces<F: RichField + Extendable<D>, const D: usize>(
apply_metadata_and_tries_memops(&mut state, &inputs);
generate_bootstrap_kernel::<F>(&mut state);
initialize_kernel_code(&mut state);
timed!(timing, "simulate CPU", simulate_cpu(&mut state)?);

View File

@ -35,8 +35,12 @@ pub(crate) const CONTEXT_FIRST_CHANGE: usize = VALUE_START + VALUE_LIMBS;
pub(crate) const SEGMENT_FIRST_CHANGE: usize = CONTEXT_FIRST_CHANGE + 1;
pub(crate) const VIRTUAL_FIRST_CHANGE: usize = SEGMENT_FIRST_CHANGE + 1;
// Used to lower the degree of the zero-initializing constraints.
// Contains `next_segment * addr_changed * next_is_read`.
pub(crate) const INITIALIZE_AUX: usize = VIRTUAL_FIRST_CHANGE + 1;
// We use a range check to enforce the ordering.
pub(crate) const RANGE_CHECK: usize = VIRTUAL_FIRST_CHANGE + 1;
pub(crate) const RANGE_CHECK: usize = INITIALIZE_AUX + 1;
/// The counter column (used for the range check) starts from 0 and increments.
pub(crate) const COUNTER: usize = RANGE_CHECK + 1;
/// The frequencies column used in logUp.

View File

@ -19,8 +19,8 @@ use crate::evaluation_frame::{StarkEvaluationFrame, StarkFrame};
use crate::lookup::Lookup;
use crate::memory::columns::{
value_limb, ADDR_CONTEXT, ADDR_SEGMENT, ADDR_VIRTUAL, CONTEXT_FIRST_CHANGE, COUNTER, FILTER,
FREQUENCIES, IS_READ, NUM_COLUMNS, RANGE_CHECK, SEGMENT_FIRST_CHANGE, TIMESTAMP,
VIRTUAL_FIRST_CHANGE,
FREQUENCIES, INITIALIZE_AUX, IS_READ, NUM_COLUMNS, RANGE_CHECK, SEGMENT_FIRST_CHANGE,
TIMESTAMP, VIRTUAL_FIRST_CHANGE,
};
use crate::memory::VALUE_LIMBS;
use crate::stark::Stark;
@ -92,6 +92,7 @@ pub(crate) fn generate_first_change_flags_and_rc<F: RichField>(
let next_segment = next_row[ADDR_SEGMENT];
let next_virt = next_row[ADDR_VIRTUAL];
let next_timestamp = next_row[TIMESTAMP];
let next_is_read = next_row[IS_READ];
let context_changed = context != next_context;
let segment_changed = segment != next_segment;
@ -122,6 +123,10 @@ pub(crate) fn generate_first_change_flags_and_rc<F: RichField>(
"Range check of {} is too large. Bug in fill_gaps?",
row[RANGE_CHECK]
);
let address_changed =
row[CONTEXT_FIRST_CHANGE] + row[SEGMENT_FIRST_CHANGE] + row[VIRTUAL_FIRST_CHANGE];
row[INITIALIZE_AUX] = next_segment * address_changed * next_is_read;
}
}
@ -325,15 +330,26 @@ impl<F: RichField + Extendable<D>, const D: usize> Stark<F, D> for MemoryStark<F
+ address_unchanged * (next_timestamp - timestamp);
yield_constr.constraint_transition(range_check - computed_range_check);
// Enumerate purportedly-ordered log.
// We assume that memory is initialized with 0. This means that if the first operation of a new address
// is a read, then its value must be 0.
// Validate initialize_aux. It contains next_segment * addr_changed * next_is_read.
let initialize_aux = local_values[INITIALIZE_AUX];
yield_constr.constraint_transition(
initialize_aux - next_addr_segment * not_address_unchanged * next_is_read,
);
for i in 0..8 {
// Enumerate purportedly-ordered log.
yield_constr.constraint_transition(
next_is_read * address_unchanged * (next_values_limbs[i] - value_limbs[i]),
);
// By default, memory is initialized with 0. This means that if the first operation of a new address is a read,
// then its value must be 0.
// There are exceptions, though: this constraint zero-initializes everything but the code segment and context 0.
yield_constr
.constraint_transition(next_is_read * not_address_unchanged * next_values_limbs[i]);
.constraint_transition(next_addr_context * initialize_aux * next_values_limbs[i]);
// We don't want to exclude the entirety of context 0. This constraint zero-initializes all segments except the
// code segment (the next_segment factor inside initialize_aux already exempts segment 0).
// There is overlap with the previous constraint, but this is not a problem.
yield_constr.constraint_transition(initialize_aux * next_values_limbs[i]);
}
// Check the range column: First value must be 0,
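The role of `INITIALIZE_AUX` can be checked numerically: materializing `next_segment * addr_changed * next_is_read` in its own column keeps each emitted constraint at degree at most 3, whereas inlining the product into the zero-initializing constraint would make it degree 5 (counting each referenced column as degree 1). A plain-integer sketch with names following the diff:

```rust
// Plain-integer model of the INITIALIZE_AUX column and the transition
// constraints that use it; integers stand in for field elements.
struct Row {
    next_addr_context: u64,
    next_addr_segment: u64,
    not_address_unchanged: u64, // 1 if the address changes at this transition
    next_is_read: u64,
    initialize_aux: u64,
    next_value_limb: u64,
}

/// Witness generation: fill the aux column with the degree-3 product.
fn fill_aux(row: &mut Row) {
    row.initialize_aux =
        row.next_addr_segment * row.not_address_unchanged * row.next_is_read;
}

/// The three transition constraints; each must evaluate to 0.
fn constraints(row: &Row) -> [i128; 3] {
    let (c, s, a, r, aux, v) = (
        row.next_addr_context as i128,
        row.next_addr_segment as i128,
        row.not_address_unchanged as i128,
        row.next_is_read as i128,
        row.initialize_aux as i128,
        row.next_value_limb as i128,
    );
    [
        aux - s * a * r, // validate the aux column itself
        c * aux * v,     // zero-init outside context 0 and segment 0
        aux * v,         // zero-init every segment other than segment 0
    ]
}

fn main() {
    // First read at a fresh address in (context 1, segment 2): must see 0.
    let mut row = Row {
        next_addr_context: 1, next_addr_segment: 2, not_address_unchanged: 1,
        next_is_read: 1, initialize_aux: 0, next_value_limb: 0,
    };
    fill_aux(&mut row);
    assert_eq!(constraints(&row), [0, 0, 0]);
    // A nonzero first read there violates the zero-initializing constraints.
    row.next_value_limb = 5;
    assert!(constraints(&row).iter().any(|&c| c != 0));
    // In the code segment (segment 0) the aux product vanishes, so first
    // reads are unconstrained: the kernel code is initialized separately.
    let mut code_row = Row { next_addr_segment: 0, next_value_limb: 7, ..row };
    fill_aux(&mut code_row);
    assert_eq!(constraints(&code_row), [0, 0, 0]);
    println!("initialize_aux constraints ok");
}
```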
@ -458,18 +474,33 @@ impl<F: RichField + Extendable<D>, const D: usize> Stark<F, D> for MemoryStark<F
let range_check_diff = builder.sub_extension(range_check, computed_range_check);
yield_constr.constraint_transition(builder, range_check_diff);
// Enumerate purportedly-ordered log.
// We assume that memory is initialized with 0. This means that if the first operation of a new address
// is a read, then its value must be 0.
// Validate initialize_aux. It contains next_segment * addr_changed * next_is_read.
let initialize_aux = local_values[INITIALIZE_AUX];
let computed_initialize_aux = builder.mul_extension(not_address_unchanged, next_is_read);
let computed_initialize_aux =
builder.mul_extension(next_addr_segment, computed_initialize_aux);
let new_first_read_constraint =
builder.sub_extension(initialize_aux, computed_initialize_aux);
yield_constr.constraint_transition(builder, new_first_read_constraint);
for i in 0..8 {
// Enumerate purportedly-ordered log.
let value_diff = builder.sub_extension(next_values_limbs[i], value_limbs[i]);
let zero_if_read = builder.mul_extension(address_unchanged, value_diff);
let read_constraint = builder.mul_extension(next_is_read, zero_if_read);
yield_constr.constraint_transition(builder, read_constraint);
let first_read_value =
builder.mul_extension(next_values_limbs[i], not_address_unchanged);
let first_read_constraint = builder.mul_extension(first_read_value, next_is_read);
yield_constr.constraint_transition(builder, first_read_constraint);
// By default, memory is initialized with 0. This means that if the first operation of a new address is a read,
// then its value must be 0.
// There are exceptions, though: this constraint zero-initializes everything but the code segment and context 0.
let context_zero_initializing_constraint =
builder.mul_extension(next_values_limbs[i], initialize_aux);
let initializing_constraint =
builder.mul_extension(next_addr_context, context_zero_initializing_constraint);
yield_constr.constraint_transition(builder, initializing_constraint);
// We don't want to exempt the entirety of context 0 from zero-initialization. This constraint zero-initializes
// all segments except the specified ones (segment 0 is already exempted by the `next_segment` factor of
// `initialize_aux`).
// There is overlap with the previous constraint, but this is not a problem.
yield_constr.constraint_transition(builder, context_zero_initializing_constraint);
}
// Check the range column: First value must be 0,
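The two zero-initialization constraints above are easier to read over plain integers than over extension targets. The following standalone sketch (toy `u64` values, not the real plonky2 builder API; the real `initialize_aux` may carry extra segment exemptions) shows how the products force a fresh read to see 0 outside the exempted context and segment:

```rust
// Mirrors: initialize_aux = next_segment * addr_changed * next_is_read.
fn initialize_aux(segment: u64, addr_changed: u64, next_is_read: u64) -> u64 {
    segment * addr_changed * next_is_read
}

/// Returns the two constraint values; both must be 0 for a valid trace row.
fn zero_init_constraints(
    ctx: u64,
    segment: u64,
    addr_changed: u64,
    next_is_read: u64,
    value: u64,
) -> (u64, u64) {
    let aux = initialize_aux(segment, addr_changed, next_is_read);
    // First constraint: outside context 0, a fresh read of a non-code cell must see 0.
    let ctx_constraint = ctx * value * aux;
    // Second constraint: applied in every context; segment 0 stays exempt
    // because `aux` carries the segment factor.
    let segment_constraint = value * aux;
    (ctx_constraint, segment_constraint)
}

fn main() {
    // A fresh read of value 7 in context 1, segment 2 violates both constraints.
    assert_eq!(zero_init_constraints(1, 2, 1, 1, 7), (14, 14));
    // In segment 0 (the code segment) both constraints vanish: any value is allowed.
    assert_eq!(zero_init_constraints(1, 0, 1, 1, 7), (0, 0));
    // In context 0 only the second constraint remains active.
    assert_eq!(zero_init_constraints(0, 2, 1, 1, 7), (0, 14));
}
```

Note the deliberate overlap: whenever the first product is nonzero, the second is too, which is harmless since both are individually constrained to 0.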


@ -1,7 +1,8 @@
use std::array::from_fn;
use std::fmt::Debug;
use anyhow::Result;
use ethereum_types::{BigEndianHash, U256};
use ethereum_types::{BigEndianHash, H256, U256};
use plonky2::field::extension::Extendable;
use plonky2::field::types::Field;
use plonky2::fri::witness_util::set_fri_proof_target;
@ -28,6 +29,7 @@ use plonky2_util::log2_ceil;
use crate::all_stark::Table;
use crate::config::StarkConfig;
use crate::constraint_consumer::RecursiveConstraintConsumer;
use crate::cpu::kernel::aggregator::KERNEL;
use crate::cpu::kernel::constants::global_metadata::GlobalMetadata;
use crate::cross_table_lookup::{
CrossTableLookup, CtlCheckVarsTarget, GrandProductChallenge, GrandProductChallengeSet,
@ -43,7 +45,7 @@ use crate::proof::{
TrieRootsTarget,
};
use crate::stark::Stark;
use crate::util::{h256_limbs, u256_limbs, u256_to_u32, u256_to_u64};
use crate::util::{h256_limbs, h2u, u256_limbs, u256_to_u32, u256_to_u64};
use crate::vanishing_poly::eval_vanishing_poly_circuit;
use crate::witness::errors::ProgramError;
@ -602,6 +604,27 @@ pub(crate) fn get_memory_extra_looking_products_circuit<
);
});
// Add kernel hash and kernel length.
let kernel_hash_limbs = h256_limbs::<F>(KERNEL.code_hash);
let kernel_hash_targets: [Target; 8] = from_fn(|i| builder.constant(kernel_hash_limbs[i]));
product = add_data_write(
builder,
challenge,
product,
metadata_segment,
GlobalMetadata::KernelHash as usize,
&kernel_hash_targets,
);
let kernel_len_target = builder.constant(F::from_canonical_usize(KERNEL.code.len()));
product = add_data_write(
builder,
challenge,
product,
metadata_segment,
GlobalMetadata::KernelLen as usize,
&[kernel_len_target],
);
product
}
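The two new writes above (`KernelHash` and `KernelLen`) join the same running product as the other global-metadata cells. Below is a toy sketch of that folding, assuming a simplified challenge combination `gamma + segment*beta + virt*beta^2 + value*beta^3` over the Goldilocks field; the real `GrandProductChallenge` packs more address columns, but the shape is the same:

```rust
const P: u128 = 0xFFFF_FFFF_0000_0001; // Goldilocks prime, as used by plonky2.

// Combine one memory cell (segment, virtual address, value) with the challenge.
fn combine(gamma: u128, beta: u128, segment: u128, virt: u128, value: u128) -> u128 {
    let b2 = beta * beta % P;
    let b3 = b2 * beta % P;
    (gamma + segment * beta % P + virt * b2 % P + value * b3 % P) % P
}

// Fold a list of (virtual address, value) writes in one segment into a product,
// analogous to repeated calls to `add_data_write`.
fn fold_writes(gamma: u128, beta: u128, segment: u128, writes: &[(u128, u128)]) -> u128 {
    writes
        .iter()
        .fold(1u128, |prod, &(virt, value)| {
            prod * combine(gamma, beta, segment, virt, value) % P
        })
}

fn main() {
    // Two hypothetical metadata writes at virtual addresses 4 and 5.
    assert_eq!(combine(7, 11, 6, 4, 100), 133_657);
    assert_eq!(fold_writes(7, 11, 6, &[(4, 100), (5, 3)]), 624_311_847);
}
```

The verifier recomputes this product from public values alone, so moving the kernel hash and length here is what lets the bootstrapping phase be dropped from the trace.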


@ -13,6 +13,7 @@ use plonky2::plonk::plonk_common::reduce_with_powers;
use crate::all_stark::{AllStark, Table, NUM_TABLES};
use crate::config::StarkConfig;
use crate::constraint_consumer::ConstraintConsumer;
use crate::cpu::kernel::aggregator::KERNEL;
use crate::cpu::kernel::constants::global_metadata::GlobalMetadata;
use crate::cross_table_lookup::{
verify_cross_table_lookups, CtlCheckVars, GrandProductChallenge, GrandProductChallengeSet,
@ -141,8 +142,8 @@ where
}
/// Computes the extra product to multiply to the looked value. It contains memory operations not in the CPU trace:
/// - block metadata writes before kernel bootstrapping,
/// - trie roots writes before kernel bootstrapping.
/// - block metadata writes,
/// - trie roots writes.
pub(crate) fn get_memory_extra_looking_products<F, const D: usize>(
public_values: &PublicValues,
challenge: GrandProductChallenge<F>,
@ -234,6 +235,8 @@ where
GlobalMetadata::ReceiptTrieRootDigestAfter,
h2u(public_values.trie_roots_after.receipts_root),
),
(GlobalMetadata::KernelHash, h2u(KERNEL.code_hash)),
(GlobalMetadata::KernelLen, KERNEL.code.len().into()),
];
let segment = F::from_canonical_u32(Segment::GlobalMetadata as u32);
@ -549,6 +552,8 @@ pub(crate) mod testutils {
GlobalMetadata::ReceiptTrieRootDigestAfter,
h2u(public_values.trie_roots_after.receipts_root),
),
(GlobalMetadata::KernelHash, h2u(KERNEL.code_hash)),
(GlobalMetadata::KernelLen, KERNEL.code.len().into()),
];
let segment = F::from_canonical_u32(Segment::GlobalMetadata as u32);


@ -82,7 +82,7 @@ fn test_empty_txn_list() -> anyhow::Result<()> {
// that is wrong for testing purposes, see below.
let mut all_circuits = AllRecursiveCircuits::<F, C, D>::new(
&all_stark,
&[16..17, 10..11, 12..13, 14..15, 9..11, 12..13, 18..19], // Minimal ranges to prove an empty list
&[16..17, 10..11, 11..12, 14..15, 9..11, 12..13, 17..18], // Minimal ranges to prove an empty list
&config,
);
@ -124,7 +124,7 @@ fn test_empty_txn_list() -> anyhow::Result<()> {
// We pass an empty range if we don't want to add different table sizes.
all_circuits.expand(
&all_stark,
&[0..0, 0..0, 15..16, 0..0, 0..0, 0..0, 0..0],
&[0..0, 0..0, 12..13, 0..0, 0..0, 0..0, 0..0],
&StarkConfig::standard_fast_config(),
);
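For context on the updated ranges: each `a..b` entry bounds the log2 of one table's padded trace height (its "degree bits"), under the convention used by `AllRecursiveCircuits::new` in plonky2. Removing bootstrapping shrinks some traces, which is why a couple of entries moved down. A minimal sketch of that bookkeeping, with hypothetical row counts:

```rust
// A trace of `rows` rows is padded to the next power of two; its degree bits
// are the log2 of that padded height.
fn degree_bits(rows: usize) -> u32 {
    rows.next_power_of_two().trailing_zeros()
}

// A recursive circuit only exists for tables whose degree bits fall in the range.
fn fits(range: std::ops::Range<u32>, rows: usize) -> bool {
    range.contains(&degree_bits(rows))
}

fn main() {
    assert_eq!(degree_bits(40_000), 16); // 40_000 rows pad to 2^16.
    assert!(fits(16..17, 40_000));       // accepted by a `16..17` slot.
    assert!(!fits(16..17, 70_000));      // 70_000 rows pad to 2^17: out of range.
}
```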