mirror of
https://github.com/status-im/EIPs.git
synced 2025-03-01 07:00:36 +00:00
Update EIP-1057 to match current ProgPoW spec
This commit is contained in:
parent
7c2253ee9a
commit
bf4566e3e2
202
EIPS/eip-1057.md
@ -1,7 +1,7 @@
|
|||||||
---
|
---
|
||||||
eip: 1057
|
eip: 1057
|
||||||
title: ProgPoW, a Programmatic Proof-of-Work
|
title: ProgPoW, a Programmatic Proof-of-Work
|
||||||
author: Radix Pi <radix.pi.314@gmail.com>, Ifdef Else <ifdefelse@protonmail.com>
|
author: IfDefElse <ifdefelse@protonmail.com>
|
||||||
discussions-to: https://ethereum-magicians.org/t/eip-progpow-a-programmatic-proof-of-work/272
|
discussions-to: https://ethereum-magicians.org/t/eip-progpow-a-programmatic-proof-of-work/272
|
||||||
status: Draft
|
status: Draft
|
||||||
type: Standards Track
|
type: Standards Track
|
||||||
@ -59,16 +59,20 @@ In contrast to Ethash, the changes detailed below make ProgPoW dependent on the
|
|||||||
|
|
||||||
*The DRAM read from the DAG is the same as Ethash’s, but with the size increased to `256 bytes`. This better matches the workloads seen on commodity GPUs, preventing a specialized ASIC from being able to gain performance by optimizing the memory controller for abnormally small accesses.*
|
*The DRAM read from the DAG is the same as Ethash’s, but with the size increased to `256 bytes`. This better matches the workloads seen on commodity GPUs, preventing a specialized ASIC from being able to gain performance by optimizing the memory controller for abnormally small accesses.*
|
||||||
|
|
||||||
The DAG file is generated according to traditional Ethash specifications, with an additional `PROGPOW_SIZE_CACHE` bytes generated that will be cached in the L1.
|
The DAG file is generated according to traditional Ethash specifications.
|
||||||
|
|
||||||
ProgPoW can be tuned using the following parameters. The proposed settings have been tuned for a range of existing, commodity GPUs:
|
ProgPoW can be tuned using the following parameters. The proposed settings have been tuned for a range of existing, commodity GPUs:
|
||||||
|
|
||||||
* `PROGPOW_LANES:` The number of parallel lanes that coordinate to calculate a single hash instance; default is `32.`
|
* `PROGPOW_PERIOD`: Number of blocks before changing the random program; default is `50`.
|
||||||
* `PROGPOW_REGS:` The register file usage size; default is `16.`
|
* `PROGPOW_LANES`: The number of parallel lanes that coordinate to calculate a single hash instance; default is `16`.
|
||||||
* `PROGPOW_CACHE_BYTES:` The size of the cache; default is `16 x 1024.`
|
* `PROGPOW_REGS`: The register file usage size; default is `32`.
|
||||||
* `PROGPOW_CNT_MEM:` The number of frame buffer accesses, defined as the outer loop of the algorithm; default is `64` (same as Ethash).
|
* `PROGPOW_DAG_LOADS`: Number of uint32 loads from the DAG per lane; default is `4`.
|
||||||
* `PROGPOW_CNT_CACHE:` The number of cache accesses per loop; default is `8.`
|
* `PROGPOW_CACHE_BYTES`: The size of the cache; default is `16 x 1024`.
|
||||||
* `PROGPOW_CNT_MATH:` The number of math operations per loop; default is `8.`
|
* `PROGPOW_CNT_DAG`: The number of DAG accesses, defined as the outer loop of the algorithm; default is `64` (same as Ethash).
|
||||||
|
* `PROGPOW_CNT_CACHE`: The number of cache accesses per loop; default is `12`.
|
||||||
|
* `PROGPOW_CNT_MATH`: The number of math operations per loop; default is `20`.
|
||||||
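As a rough illustration, the defaults listed above can be written as compile-time constants, which makes a few derived sizes easy to check. The derived names (`dag_access_bytes`, `cache_words`, `mix_bytes`) are hypothetical and are not part of the specification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Defaults taken from the parameter list above.
constexpr uint32_t PROGPOW_PERIOD      = 50;
constexpr uint32_t PROGPOW_LANES       = 16;
constexpr uint32_t PROGPOW_REGS        = 32;
constexpr uint32_t PROGPOW_DAG_LOADS   = 4;
constexpr uint32_t PROGPOW_CACHE_BYTES = 16 * 1024;
constexpr uint32_t PROGPOW_CNT_DAG     = 64;
constexpr uint32_t PROGPOW_CNT_CACHE   = 12;
constexpr uint32_t PROGPOW_CNT_MATH    = 20;

// Derived values. These names are illustrative only.
// Bytes fetched from the DAG per outer-loop iteration, across all lanes.
constexpr size_t dag_access_bytes =
    PROGPOW_LANES * PROGPOW_DAG_LOADS * sizeof(uint32_t);
// Number of 32-bit words in the cached portion of the DAG.
constexpr size_t cache_words = PROGPOW_CACHE_BYTES / sizeof(uint32_t);
// Size of the mix state for one hash instance.
constexpr size_t mix_bytes =
    PROGPOW_LANES * PROGPOW_REGS * sizeof(uint32_t);

static_assert(dag_access_bytes == 256, "one DAG access should be 256 bytes");
static_assert(cache_words == 4096, "16 KiB cache holds 4096 uint32 words");
```

With these defaults, 16 lanes each loading 4 words of 4 bytes gives the 256-byte DAG read described earlier.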
|
|
||||||
|
The random program changes every `PROGPOW_PERIOD` blocks (default `50`, roughly 12.5 minutes) to ensure the hardware executing the algorithm is fully programmable. If the program changed only once per DAG epoch (roughly 5 days), certain miners could have time to develop hand-optimized versions of the random sequence, giving them an undue advantage.
|
||||||
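A minimal sketch of how the program seed could be derived from the block number, following the `block_number/PROGPOW_PERIOD` comment on `progpow_search`. The helper names are hypothetical, and the 15-second block time is an assumption used only to reproduce the "roughly 12.5 minutes" figure.

```cpp
#include <cstdint>

constexpr uint64_t PROGPOW_PERIOD = 50;

// All blocks in the same period share one random program.
constexpr uint64_t prog_seed_for_block(uint64_t block_number)
{
    return block_number / PROGPOW_PERIOD;
}

// ASSUMPTION: an average block time of about 15 seconds.
constexpr double ASSUMED_BLOCK_SECONDS = 15.0;

// Approximate wall-clock length of one program period, in minutes.
constexpr double period_minutes()
{
    return PROGPOW_PERIOD * ASSUMED_BLOCK_SECONDS / 60.0;
}
```

Blocks 0 through 49 map to seed 0, block 50 starts seed 1, and so on, so a miner has to compile a new program about once per period.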
|
|
||||||
ProgPoW uses **FNV1a** for merging data. The existing Ethash uses FNV1 for merging, but FNV1a provides better distribution properties.
|
ProgPoW uses **FNV1a** for merging data. The existing Ethash uses FNV1 for merging, but FNV1a provides better distribution properties.
|
||||||
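The difference between the two merge functions is only the order of the XOR and the multiply. The sketch below uses the standard 32-bit FNV prime and offset basis; the in-place `fnv1a` signature is an assumption inferred from how it is called elsewhere in this document, and `fnv1` is shown only for comparison.

```cpp
#include <cstdint>

constexpr uint32_t FNV_PRIME        = 0x01000193;
constexpr uint32_t FNV_OFFSET_BASIS = 0x811c9dc5;

// FNV1, as used by Ethash: multiply first, then XOR in the data.
inline uint32_t fnv1(uint32_t h, uint32_t d)
{
    return (h * FNV_PRIME) ^ d;
}

// FNV1a: XOR in the data first, then multiply.
// Updates h in place and also returns the new value.
inline uint32_t fnv1a(uint32_t &h, uint32_t d)
{
    return h = (h ^ d) * FNV_PRIME;
}
```

Because FNV1a multiplies after the XOR, the bits of the last merged value are spread across the whole result instead of landing unchanged in the low bits.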
|
|
||||||
@ -90,12 +94,14 @@ typedef struct {
|
|||||||
// http://www.cse.yorku.ca/~oz/marsaglia-rng.html
|
// http://www.cse.yorku.ca/~oz/marsaglia-rng.html
|
||||||
uint32_t kiss99(kiss99_t &st)
|
uint32_t kiss99(kiss99_t &st)
|
||||||
{
|
{
|
||||||
uint32_t znew = (st.z = 36969 * (st.z & 65535) + (st.z >> 16));
|
st.z = 36969 * (st.z & 65535) + (st.z >> 16);
|
||||||
uint32_t wnew = (st.w = 18000 * (st.w & 65535) + (st.w >> 16));
|
st.w = 18000 * (st.w & 65535) + (st.w >> 16);
|
||||||
uint32_t MWC = ((znew << 16) + wnew);
|
uint32_t MWC = ((st.z << 16) + st.w);
|
||||||
uint32_t SHR3 = (st.jsr ^= (st.jsr << 17), st.jsr ^= (st.jsr >> 13), st.jsr ^= (st.jsr << 5));
|
st.jsr ^= (st.jsr << 17);
|
||||||
uint32_t CONG = (st.jcong = 69069 * st.jcong + 1234567);
|
st.jsr ^= (st.jsr >> 13);
|
||||||
return ((MWC^CONG) + SHR3);
|
st.jsr ^= (st.jsr << 5);
|
||||||
|
st.jcong = 69069 * st.jcong + 1234567;
|
||||||
|
return ((MWC^st.jcong) + st.jsr);
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
@ -121,59 +127,89 @@ void fill_mix(
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
The main search algorithm uses the Keccak sponge function (a width of 800 bits, with a bitrate of 448, and a capacity of 352) to generate a seed, expands the seed, does a sequence of loads and random math on the mix data, and then compresses the result into a final Keccak permutation (with the same parameters as the first) for target comparison.
|
Like Ethash, ProgPoW uses Keccak to seed the sequence per nonce and to produce the final result. The keccak-f800 variant is used because its 32-bit word size matches the native word size of modern GPUs. The implementation is a variant of SHAKE with width=800, bitrate=576, capacity=224, output=256, and no padding. The result of Keccak is treated as a 256-bit big-endian number; that is, result byte 0 is the MSB of the value.
|
||||||
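The stated parameters and the big-endian convention can be checked with a small sketch. It assumes the hash bytes are stored in 32-bit words in little-endian order, so byte 0 of the hash is the low byte of word 0. The names `bswap32` and `hash_top64` are hypothetical stand-ins, not functions defined by the specification.

```cpp
#include <cstdint>

// The state is 25 words of 32 bits; header (8 words), seed (2 words) and
// digest (8 words) together fill exactly the bitrate portion.
static_assert(25 * 32 == 800, "keccak-f800 state width");
static_assert((8 + 2 + 8) * 32 == 576, "absorbed input equals the bitrate");
static_assert(576 + 224 == 800, "bitrate + capacity = width");

// Reverse the byte order of a 32-bit word.
constexpr uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

// Top 64 bits of the hash read as a big-endian number.
// The cast to uint64_t must happen before the shift, otherwise the
// 32-bit value would be shifted out of range.
constexpr uint64_t hash_top64(uint32_t word0, uint32_t word1)
{
    return (static_cast<uint64_t>(bswap32(word0)) << 32) | bswap32(word1);
}
```

For example, a hash whose first eight bytes are `01 02 03 04 05 06 07 08` is held as words `0x04030201` and `0x08070605`, and `hash_top64` turns them into `0x0102030405060708`.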
|
|
||||||
```cpp
|
```cpp
|
||||||
|
hash32_t keccak_f800_progpow(hash32_t header, uint64_t seed, hash32_t digest)
|
||||||
|
{
|
||||||
|
uint32_t st[25];
|
||||||
|
|
||||||
|
for (int i = 0; i < 25; i++)
|
||||||
|
st[i] = 0;
|
||||||
|
for (int i = 0; i < 8; i++)
|
||||||
|
st[i] = header.uint32s[i];
|
||||||
|
st[8] = seed;
|
||||||
|
st[9] = seed >> 32;
|
||||||
|
for (int i = 0; i < 8; i++)
|
||||||
|
st[10+i] = digest.uint32s[i];
|
||||||
|
|
||||||
|
for (int r = 0; r < 22; r++)
|
||||||
|
keccak_f800_round(st, r);
|
||||||
|
|
||||||
|
hash32_t ret;
|
||||||
|
for (int i=0; i<8; i++)
|
||||||
|
ret.uint32s[i] = st[i];
|
||||||
|
return ret;
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
The flow of the overall algorithm is:
|
||||||
|
* A keccak hash of the header + nonce to create a seed
|
||||||
|
* Use the seed to generate initial mix data
|
||||||
|
* Loop multiple times, each time hashing random loads and random math into the mix data
|
||||||
|
* Hash all the mix data into a single 256-bit value
|
||||||
|
* A final keccak hash that is compared against the target
|
||||||
|
|
||||||
|
```cpp
|
||||||
bool progpow_search(
|
bool progpow_search(
|
||||||
const uint64_t prog_seed,
|
const uint64_t prog_seed, // value is (block_number/PROGPOW_PERIOD)
|
||||||
const uint64_t nonce,
|
const uint64_t nonce,
|
||||||
const hash32_t header,
|
const hash32_t header,
|
||||||
const uint64_t target,
|
const hash32_t target, // miner can use a uint64_t target, doesn't need the full 256 bit target
|
||||||
const uint64_t *g_dag, // gigabyte DAG located in framebuffer
|
const uint32_t *dag // gigabyte DAG located in framebuffer - the first portion gets cached
|
||||||
const uint64_t *c_dag // kilobyte DAG located in l1 cache
|
|
||||||
)
|
)
|
||||||
{
|
{
|
||||||
uint32_t mix[PROGPOW_LANES][PROGPOW_REGS];
|
uint32_t mix[PROGPOW_LANES][PROGPOW_REGS];
|
||||||
uint32_t result[4];
|
hash32_t digest;
|
||||||
for (int i = 0; i < 4; i++)
|
for (int i = 0; i < 8; i++)
|
||||||
result[i] = 0;
|
digest.uint32s[i] = 0;
|
||||||
|
|
||||||
// keccak(header..nonce)
|
// keccak(header..nonce)
|
||||||
uint64_t seed = keccak_f800(header, nonce, result);
|
hash32_t seed_256 = keccak_f800_progpow(header, nonce, digest);
|
||||||
|
// endian swap so byte 0 of the hash is the MSB of the value
|
||||||
|
uint64_t seed = ((uint64_t)bswap(seed_256.uint32s[0]) << 32) | bswap(seed_256.uint32s[1]);
|
||||||
|
|
||||||
// initialize mix for all lanes
|
// initialize mix for all lanes
|
||||||
for (int l = 0; l < PROGPOW_LANES; l++)
|
for (int l = 0; l < PROGPOW_LANES; l++)
|
||||||
fill_mix(seed, l, mix);
|
fill_mix(seed, l, mix[l]);
|
||||||
|
|
||||||
// execute the randomly generated inner loop
|
// execute the randomly generated inner loop
|
||||||
for (int i = 0; i < PROGPOW_CNT_MEM; i++)
|
for (int i = 0; i < PROGPOW_CNT_DAG; i++)
|
||||||
progPowLoop(prog_seed, i, mix, g_dag, c_dag);
|
progPowLoop(prog_seed, i, mix, dag);
|
||||||
|
|
||||||
// Reduce mix data to a single per-lane result
|
// Reduce mix data to a per-lane 32-bit digest
|
||||||
uint32_t lane_hash[PROGPOW_LANES];
|
uint32_t digest_lane[PROGPOW_LANES];
|
||||||
for (int l = 0; l < PROGPOW_LANES; l++)
|
for (int l = 0; l < PROGPOW_LANES; l++)
|
||||||
{
|
{
|
||||||
lane_hash[l] = 0x811c9dc5
|
digest_lane[l] = 0x811c9dc5;
|
||||||
for (int i = 0; i < PROGPOW_REGS; i++)
|
for (int i = 0; i < PROGPOW_REGS; i++)
|
||||||
fnv1a(lane_hash[l], mix[l][i]);
|
fnv1a(digest_lane[l], mix[l][i]);
|
||||||
}
|
}
|
||||||
// Reduce all lanes to a single 128-bit result
|
// Reduce all lanes to a single 256-bit digest
|
||||||
for (int i = 0; i < 4; i++)
|
for (int i = 0; i < 8; i++)
|
||||||
result[i] = 0x811c9dc5;
|
digest.uint32s[i] = 0x811c9dc5;
|
||||||
for (int l = 0; l < PROGPOW_LANES; l++)
|
for (int l = 0; l < PROGPOW_LANES; l++)
|
||||||
fnv1a(result[l%4], lane_hash[l])
|
fnv1a(digest.uint32s[l%8], digest_lane[l]);
|
||||||
|
|
||||||
// keccak(header .. keccak(header..nonce) .. result);
|
// keccak(header .. keccak(header..nonce) .. digest);
|
||||||
return (keccak_f800(header, seed, result) <= target);
|
return (keccak_f800_progpow(header, seed, digest) <= target);
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
The inner loop uses FNV and KISS99 to generate a random sequence from the `prog_seed`. This random sequence determines which mix state is accessed and what random math is performed. Since the `prog_seed` changes relatively infrequently it is expected that `progPowLoop` will be compiled while mining instead of interpreted on the fly.
|
The inner loop uses FNV and KISS99 to generate a random sequence from the `prog_seed`. This random sequence determines which mix state is accessed and what random math is performed. Since the `prog_seed` changes relatively infrequently it is expected that `progPowLoop` will be compiled while mining instead of interpreted on the fly.
|
||||||
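A self-contained sketch of the shuffle idea behind the random sequence: a Fisher-Yates shuffle driven by KISS99 yields a permutation of the register indices, so every register is visited exactly once before the sequence repeats. This builds only a single sequence; `shuffled_sequence` and `is_permutation_of_regs` are hypothetical helper names, and any seed values passed in are arbitrary.

```cpp
#include <array>
#include <cstdint>
#include <utility>

constexpr int PROGPOW_REGS = 32;

struct kiss99_t { uint32_t z, w, jsr, jcong; };

// KISS99 generator, see http://www.cse.yorku.ca/~oz/marsaglia-rng.html
uint32_t kiss99(kiss99_t &st)
{
    st.z = 36969 * (st.z & 65535) + (st.z >> 16);
    st.w = 18000 * (st.w & 65535) + (st.w >> 16);
    uint32_t MWC = ((st.z << 16) + st.w);
    st.jsr ^= (st.jsr << 17);
    st.jsr ^= (st.jsr >> 13);
    st.jsr ^= (st.jsr << 5);
    st.jcong = 69069 * st.jcong + 1234567;
    return ((MWC ^ st.jcong) + st.jsr);
}

// Fisher-Yates shuffle of the indices 0..PROGPOW_REGS-1.
std::array<int, PROGPOW_REGS> shuffled_sequence(kiss99_t &rnd)
{
    std::array<int, PROGPOW_REGS> seq;
    for (int i = 0; i < PROGPOW_REGS; i++)
        seq[i] = i;
    for (int i = PROGPOW_REGS - 1; i > 0; i--)
    {
        int j = kiss99(rnd) % (i + 1);
        std::swap(seq[i], seq[j]);
    }
    return seq;
}

// True if seq contains every register index exactly once.
bool is_permutation_of_regs(const std::array<int, PROGPOW_REGS> &seq)
{
    std::array<bool, PROGPOW_REGS> seen{};
    for (int v : seq)
    {
        if (v < 0 || v >= PROGPOW_REGS || seen[v])
            return false;
        seen[v] = true;
    }
    return true;
}
```

Since the generator state depends only on `prog_seed`, every miner derives the same sequence for a given period, which is what allows the loop to be compiled ahead of time.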
|
|
||||||
```cpp
|
```cpp
|
||||||
|
kiss99_t progPowInit(uint64_t prog_seed, int mix_seq_dst[PROGPOW_REGS], int mix_seq_cache[PROGPOW_REGS])
|
||||||
kiss99_t progPowInit(uint64_t prog_seed, int mix_seq[PROGPOW_REGS])
|
|
||||||
{
|
{
|
||||||
kiss99_t prog_rnd;
|
kiss99_t prog_rnd;
|
||||||
uint32_t fnv_hash = 0x811c9dc5;
|
uint32_t fnv_hash = 0x811c9dc5;
|
||||||
@ -181,15 +217,22 @@ kiss99_t progPowInit(uint64_t prog_seed, int mix_seq[PROGPOW_REGS])
|
|||||||
prog_rnd.w = fnv1a(fnv_hash, prog_seed >> 32);
|
prog_rnd.w = fnv1a(fnv_hash, prog_seed >> 32);
|
||||||
prog_rnd.jsr = fnv1a(fnv_hash, prog_seed);
|
prog_rnd.jsr = fnv1a(fnv_hash, prog_seed);
|
||||||
prog_rnd.jcong = fnv1a(fnv_hash, prog_seed >> 32);
|
prog_rnd.jcong = fnv1a(fnv_hash, prog_seed >> 32);
|
||||||
// Create a random sequence of mix destinations for merge()
|
// Create a random sequence of mix destinations for merge() and mix sources for cache reads
|
||||||
// guaranteeing every location is touched once
|
// guarantees every destination merged once
|
||||||
// Uses Fisher–Yates shuffle
|
// guarantees no duplicate cache reads, which could be optimized away
|
||||||
|
// Uses Fisher-Yates shuffle
|
||||||
for (int i = 0; i < PROGPOW_REGS; i++)
|
for (int i = 0; i < PROGPOW_REGS; i++)
|
||||||
mix_seq[i] = i;
|
{
|
||||||
|
mix_seq_dst[i] = i;
|
||||||
|
mix_seq_cache[i] = i;
|
||||||
|
}
|
||||||
for (int i = PROGPOW_REGS - 1; i > 0; i--)
|
for (int i = PROGPOW_REGS - 1; i > 0; i--)
|
||||||
{
|
{
|
||||||
int j = kiss99(prog_rnd) % (i + 1);
|
int j;
|
||||||
swap(mix_seq[i], mix_seq[j]);
|
j = kiss99(prog_rnd) % (i + 1);
|
||||||
|
swap(mix_seq_dst[i], mix_seq_dst[j]);
|
||||||
|
j = kiss99(prog_rnd) % (i + 1);
|
||||||
|
swap(mix_seq_cache[i], mix_seq_cache[j]);
|
||||||
}
|
}
|
||||||
return prog_rnd;
|
return prog_rnd;
|
||||||
}
|
}
|
||||||
@ -241,60 +284,66 @@ The main loop:
|
|||||||
|
|
||||||
```cpp
|
```cpp
|
||||||
// Helper to get the next value in the per-program random sequence
|
// Helper to get the next value in the per-program random sequence
|
||||||
#define rnd() (kiss99(prog_rnd))
|
#define rnd() (kiss99(prog_rnd))
|
||||||
// Helper to pick a random mix location
|
// Helper to pick a random mix location
|
||||||
#define mix_src() (rnd() % PROGPOW_REGS)
|
#define mix_src() (rnd() % PROGPOW_REGS)
|
||||||
// Helper to access the sequence of mix destinations
|
// Helper to access the sequence of mix destinations
|
||||||
#define mix_dst() (mix_seq[(mix_seq_cnt++)%PROGPOW_REGS])
|
#define mix_dst() (mix_seq_dst[(mix_seq_dst_cnt++)%PROGPOW_REGS])
|
||||||
|
// Helper to access the sequence of cache sources
|
||||||
|
#define mix_cache() (mix_seq_cache[(mix_seq_cache_cnt++)%PROGPOW_REGS])
|
||||||
|
|
||||||
void progPowLoop(
|
void progPowLoop(
|
||||||
const uint64_t prog_seed,
|
const uint64_t prog_seed,
|
||||||
const uint32_t loop,
|
const uint32_t loop,
|
||||||
uint32_t mix[PROGPOW_LANES][PROGPOW_REGS],
|
uint32_t mix[PROGPOW_LANES][PROGPOW_REGS],
|
||||||
const uint64_t *g_dag,
|
const uint32_t *dag)
|
||||||
const uint32_t *c_dag)
|
|
||||||
{
|
{
|
||||||
// All lanes share a base address for the global load
|
// All lanes share a base address for the global load
|
||||||
// Global offset uses mix[0] to guarantee it depends on the load result
|
// Global offset uses mix[0] to guarantee it depends on the load result
|
||||||
uint32_t offset_g = mix[loop%PROGPOW_LANES][0] % DAG_SIZE;
|
uint32_t offset_g = mix[loop%PROGPOW_LANES][0] % (DAG_BYTES / (PROGPOW_LANES*PROGPOW_DAG_LOADS*sizeof(uint32_t)));
|
||||||
// Lanes can execute in parallel and will be convergent
|
// Lanes can execute in parallel and will be convergent
|
||||||
for (int l = 0; l < PROGPOW_LANES; l++)
|
for (int l = 0; l < PROGPOW_LANES; l++)
|
||||||
{
|
{
|
||||||
// global load to sequential locations
|
// global load to the 256 byte DAG entry
|
||||||
uint64_t data64 = g_dag[offset_g + l];
|
// every lane can access every part of the entry
|
||||||
|
uint32_t data_g[PROGPOW_DAG_LOADS];
|
||||||
|
uint32_t offset_l = offset_g * PROGPOW_LANES + (l ^ loop) % PROGPOW_LANES;
|
||||||
|
for (int i = 0; i < PROGPOW_DAG_LOADS; i++)
|
||||||
|
data_g[i] = dag[offset_l * PROGPOW_DAG_LOADS + i];
|
||||||
|
|
||||||
// initialize the seed and mix destination sequence
|
// initialize the seed and mix destination sequence
|
||||||
int mix_seq[PROGPOW_REGS];
|
int mix_seq_dst[PROGPOW_REGS];
|
||||||
int mix_seq_cnt = 0;
|
int mix_seq_cache[PROGPOW_REGS];
|
||||||
kiss99_t prog_rnd = progPowInit(prog_seed, mix_seq);
|
int mix_seq_dst_cnt = 0;
|
||||||
|
int mix_seq_cache_cnt = 0;
|
||||||
|
kiss99_t prog_rnd = progPowInit(prog_seed, mix_seq_dst, mix_seq_cache);
|
||||||
|
|
||||||
uint32_t offset, data32;
|
|
||||||
int max_i = max(PROGPOW_CNT_CACHE, PROGPOW_CNT_MATH);
|
int max_i = max(PROGPOW_CNT_CACHE, PROGPOW_CNT_MATH);
|
||||||
for (int i = 0; i < max_i; i++)
|
for (int i = 0; i < max_i; i++)
|
||||||
{
|
{
|
||||||
if (i < PROGPOW_CNT_CACHE)
|
if (i < PROGPOW_CNT_CACHE)
|
||||||
{
|
{
|
||||||
// Cached memory access
|
// Cached memory access
|
||||||
// lanes access random location
|
// lanes access random 32-bit locations within the first portion of the DAG
|
||||||
offset = mix[l][mix_src()] % PROGPOW_CACHE_WORDS;
|
uint32_t offset = mix[l][mix_cache()] % (PROGPOW_CACHE_BYTES/sizeof(uint32_t));
|
||||||
data32 = c_dag[offset];
|
uint32_t data = dag[offset];
|
||||||
merge(mix[l][mix_dst()], data32, rnd());
|
merge(mix[l][mix_dst()], data, rnd());
|
||||||
}
|
}
|
||||||
if (i < PROGPOW_CNT_MATH)
|
if (i < PROGPOW_CNT_MATH)
|
||||||
{
|
{
|
||||||
// Random Math
|
// Random Math
|
||||||
data32 = math(mix[l][mix_src()], mix[l][mix_src()], rnd());
|
uint32_t data = math(mix[l][mix_src()], mix[l][mix_src()], rnd());
|
||||||
merge(mix[l][mix_dst()], data32, rnd());
|
merge(mix[l][mix_dst()], data, rnd());
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
// Consume the global load data at the very end of the loop
|
// Consume the global load data at the very end of the loop to allow full latency hiding
|
||||||
// Allows full latency hiding
|
// Always merge into mix[0] to feed the offset calculation
|
||||||
merge(mix[l][0], data64, rnd());
|
merge(mix[l][0], data_g[0], rnd());
|
||||||
merge(mix[l][mix_dst()], data64>>32, rnd());
|
for (int i = 1; i < PROGPOW_DAG_LOADS; i++)
|
||||||
|
merge(mix[l][mix_dst()], data_g[i], rnd());
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
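The lane addressing in the global load can be checked in isolation. The sketch below repeats the `offset_l` arithmetic from the loop and verifies that, for a given iteration, the lanes together read every word of one 256-byte DAG entry exactly once; XOR with `loop` only changes which lane reads which part. The helper names are hypothetical.

```cpp
#include <cstdint>
#include <set>

constexpr uint32_t PROGPOW_LANES     = 16;
constexpr uint32_t PROGPOW_DAG_LOADS = 4;

// Index of the first uint32 word that lane l loads for this iteration.
// Same arithmetic as offset_l * PROGPOW_DAG_LOADS in the main loop.
constexpr uint32_t lane_word_index(uint32_t offset_g, uint32_t l, uint32_t loop)
{
    return (offset_g * PROGPOW_LANES + (l ^ loop) % PROGPOW_LANES)
           * PROGPOW_DAG_LOADS;
}

// True if the lanes cover the whole DAG entry with no word read twice.
bool lanes_cover_entry(uint32_t offset_g, uint32_t loop)
{
    const uint32_t words_per_entry = PROGPOW_LANES * PROGPOW_DAG_LOADS;
    std::set<uint32_t> words;
    for (uint32_t l = 0; l < PROGPOW_LANES; l++)
        for (uint32_t i = 0; i < PROGPOW_DAG_LOADS; i++)
            words.insert(lane_word_index(offset_g, l, loop) + i);
    const uint32_t first = offset_g * words_per_entry;
    return words.size() == words_per_entry
        && *words.begin() == first
        && *words.rbegin() == first + words_per_entry - 1;
}
```

Because `PROGPOW_LANES` is a power of two, `(l ^ loop) % PROGPOW_LANES` is a permutation of the lane indices for any value of `loop`, so the combined access stays one contiguous, aligned 256-byte read.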
|
|
||||||
## Rationale
|
## Rationale
|
||||||
|
|
||||||
ProgPoW utilizes almost all parts of a commodity GPU, excluding:
|
ProgPoW utilizes almost all parts of a commodity GPU, excluding:
|
||||||
@ -308,24 +357,7 @@ Since the GPU is almost fully utilized, there’s little opportunity for specia
|
|||||||
|
|
||||||
## Backwards Compatibility
|
## Backwards Compatibility
|
||||||
|
|
||||||
This algorithm is not backwards compatible with the existing Ethash, and will require a fork for adoption. Furthermore, the network hashrate will halve as the time spent in the core is now balanced with time spent in memory.
|
This algorithm is not backwards compatible with the existing Ethash, and will require a fork for adoption. Furthermore, the network hashrate will halve since twice as much memory is loaded per hash.
|
||||||
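The halving follows from simple arithmetic if mining is assumed to be limited by memory bandwidth. Ethash reads 128 bytes per DAG access and ProgPoW reads 256 bytes, with 64 accesses per hash in both cases. The sketch below is a back-of-the-envelope upper bound, not a measurement; the function names and any bandwidth figure passed in are hypothetical.

```cpp
#include <cstdint>

constexpr uint64_t ETHASH_ACCESS_BYTES  = 128; // bytes per DAG access in Ethash
constexpr uint64_t PROGPOW_ACCESS_BYTES = 256; // bytes per DAG access in ProgPoW
constexpr uint64_t ACCESSES_PER_HASH    = 64;  // same count in both algorithms

// Total DAG bytes that must be loaded to compute one hash.
constexpr uint64_t bytes_per_hash(uint64_t access_bytes)
{
    return access_bytes * ACCESSES_PER_HASH;
}

// Upper bound on hashes per second for a purely bandwidth-bound device.
constexpr double max_hashrate(double bandwidth_bytes_per_s, uint64_t access_bytes)
{
    return bandwidth_bytes_per_s / bytes_per_hash(access_bytes);
}
```

For any fixed bandwidth, `max_hashrate(bw, ETHASH_ACCESS_BYTES)` is exactly twice `max_hashrate(bw, PROGPOW_ACCESS_BYTES)`, since 8 KiB per hash becomes 16 KiB per hash.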
|
|
||||||
## Test Cases
|
|
||||||
|
|
||||||
This PoW algorithm was tested against six different models from two different manufacturers. Selected models span two different chips and memory types from each manufacturer (Polaris20-GDDR5 and Vega10-HBM2 for AMD; GP104-GDDR5 and GP102-GDDR5X for NVIDIA). The average hashrate results are listed below. Additional tests are ongoing.
|
|
||||||
|
|
||||||
As the algorithm nearly fully utilizes GPU functions in a natural way, the results reflect relative GPU performance that is similar to other gaming and graphics applications.
|
|
||||||
|
|
||||||
-------------------------------
|
|
||||||
| Model | Hashrate (MH/s) |
|
|
||||||
| --------- | --------------- |
|
|
||||||
| RX580 | 9.4 |
|
|
||||||
| Vega56 | 16.6 |
|
|
||||||
| Vega64 | 18.7 |
|
|
||||||
| GTX1070Ti | 13.1 |
|
|
||||||
| GTX1080 | 14.9 |
|
|
||||||
| GTX1080Ti | 21.8 |
|
|
||||||
-------------------------------
|
|
||||||
|
|
||||||
## Implementation
|
## Implementation
|
||||||
|
|
||||||
|