mirror of
https://github.com/status-im/EIPs.git
synced 2025-03-01 07:00:36 +00:00
Update EIP-1057 to match current ProgPoW spec
This commit is contained in:
parent
7c2253ee9a
commit
bf4566e3e2
212
EIPS/eip-1057.md
@@ -1,7 +1,7 @@
 ---
 eip: 1057
 title: ProgPoW, a Programmatic Proof-of-Work
-author: Radix Pi <radix.pi.314@gmail.com>, Ifdef Else <ifdefelse@protonmail.com>
+author: IfDefElse <ifdefelse@protonmail.com>
 discussions-to: https://ethereum-magicians.org/t/eip-progpow-a-programmatic-proof-of-work/272
 status: Draft
 type: Standards Track
@@ -15,11 +15,11 @@ The following is a proposal for an alternate proof-of-work algorithm - **“Prog
 
 ## Abstract
 
-The security of proof-of-work is built on a fair, randomized lottery where miners with similar resources have a similar chance of generating the next block.
+The security of proof-of-work is built on a fair, randomized lottery where miners with similar resources have a similar chance of generating the next block.
 
 For Ethereum - a community based on widely distributed commodity hardware - specialized ASICs enable certain participants to gain a much greater chance of generating the next block, and undermine the distributed security.
 
-ASIC-resistance is a misunderstood problem. FPGAs, GPUs and CPUs can themselves be considered ASICs. Any algorithm that executes on a commodity ASIC can have a specialized ASIC made for it; most existing algorithms provide opportunities that reduce power usage and cost. Thus, the proper question to ask when solving ASIC-resistance is “how much more efficient will a specialized ASIC be, in comparison with commodity hardware?”
+ASIC-resistance is a misunderstood problem. FPGAs, GPUs and CPUs can themselves be considered ASICs. Any algorithm that executes on a commodity ASIC can have a specialized ASIC made for it; most existing algorithms provide opportunities that reduce power usage and cost. Thus, the proper question to ask when solving ASIC-resistance is “how much more efficient will a specialized ASIC be, in comparison with commodity hardware?”
 
 EIP<NaN> presents an algorithm that is tuned for commodity GPUs where there is minimal opportunity for ASIC specialization. This prevents specialized ASICs without resorting to a game of whack-a-mole where the network changes algorithms every few months.
 
@@ -29,7 +29,7 @@ Until Ethereum transitions to a pure proof-of-stake model, proof-of-work will co
 
 Ethash allows for the creation of an ASIC that is roughly twice as efficient as a commodity GPU. Ethash’s memory accesses are paired with a very small amount of fixed compute. Most of a GPU’s capacity and complexity sits idle, wasting power, while waiting for DRAM accesses. A specialized ASIC can implement a much smaller (and cheaper) compute engine that burns much less power.
 
-As miner rewards are reduced with Casper FFG, it will remain profitable to mine on a specialized ASIC long after GPUs have exited the network. This will make it easier for an entity that has access to private ASICs to stage a 51% attack on the Ethereum network.
+As miner rewards are reduced with Casper FFG, it will remain profitable to mine on a specialized ASIC long after GPUs have exited the network. This will make it easier for an entity that has access to private ASICs to stage a 51% attack on the Ethereum network.
 
 ## Specification
 
@@ -57,18 +57,22 @@ In contrast to Ethash, the changes detailed below make ProgPoW dependent on the
 
 **Increases the DRAM read from 128 bytes to 256 bytes.**
 
-*The DRAM read from the DAG is the same as Ethash’s, but with the size increased to `256 bytes`. This better matches the workloads seen on commodity GPUs, preventing a specialized ASIC from being able to gain performance by optimizing the memory controller for abnormally small accesses.*
+*The DRAM read from the DAG is the same as Ethash’s, but with the size increased to `256 bytes`. This better matches the workloads seen on commodity GPUs, preventing a specialized ASIC from being able to gain performance by optimizing the memory controller for abnormally small accesses.*
 
-The DAG file is generated according to traditional Ethash specifications, with an additional `PROGPOW_SIZE_CACHE` bytes generated that will be cached in the L1.
+The DAG file is generated according to traditional Ethash specifications.
 
 ProgPoW can be tuned using the following parameters. The proposed settings have been tuned for a range of existing, commodity GPUs:
 
-* `PROGPOW_LANES:` The number of parallel lanes that coordinate to calculate a single hash instance; default is `32.`
-* `PROGPOW_REGS:` The register file usage size; default is `16.`
-* `PROGPOW_CACHE_BYTES:` The size of the cache; default is `16 x 1024.`
-* `PROGPOW_CNT_MEM:` The number of frame buffer accesses, defined as the outer loop of the algorithm; default is `64` (same as Ethash).
-* `PROGPOW_CNT_CACHE:` The number of cache accesses per loop; default is `8.`
-* `PROGPOW_CNT_MATH:` The number of math operations per loop; default is `8.`
+* `PROGPOW_PERIOD`: Number of blocks before changing the random program; default is `50`.
+* `PROGPOW_LANES`: The number of parallel lanes that coordinate to calculate a single hash instance; default is `16`.
+* `PROGPOW_REGS`: The register file usage size; default is `32`.
+* `PROGPOW_DAG_LOADS`: Number of uint32 loads from the DAG per lane; default is `4`.
+* `PROGPOW_CACHE_BYTES`: The size of the cache; default is `16 x 1024`.
+* `PROGPOW_CNT_DAG`: The number of DAG accesses, defined as the outer loop of the algorithm; default is `64` (same as Ethash).
+* `PROGPOW_CNT_CACHE`: The number of cache accesses per loop; default is `12`.
+* `PROGPOW_CNT_MATH`: The number of math operations per loop; default is `20`.
 
+The random program changes every `PROGPOW_PERIOD` blocks (default `50`, roughly 12.5 minutes) to ensure the hardware executing the algorithm is fully programmable. If the program only changed every DAG epoch (roughly 5 days) certain miners could have time to develop hand-optimized versions of the random sequence, giving them an undue advantage.
+
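The period arithmetic in the paragraph above can be sanity-checked with a short sketch. It assumes an average block time of about 15 seconds, which is the figure implied by "50 blocks, roughly 12.5 minutes" but is not stated in the text. The seed rule `block_number / PROGPOW_PERIOD` comes from the comment on the `prog_seed` parameter of `progpow_search` in this change. The names `ASSUMED_BLOCK_SECONDS`, `prog_seed_for_block`, `period_minutes` and `check_period_arithmetic` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Default taken from the parameter list above.
constexpr uint64_t PROGPOW_PERIOD = 50;

// Assumption: ~15 s average block time. Not stated in the text; it is the
// value implied by "50 blocks, roughly 12.5 minutes".
constexpr double ASSUMED_BLOCK_SECONDS = 15.0;

// Hypothetical helper: every block inside one period maps to the same seed,
// so the same random program is used for PROGPOW_PERIOD consecutive blocks.
constexpr uint64_t prog_seed_for_block(uint64_t block_number)
{
    return block_number / PROGPOW_PERIOD;  // integer division
}

// Hypothetical helper: approximate wall-clock length of one period, in minutes.
constexpr double period_minutes()
{
    return PROGPOW_PERIOD * ASSUMED_BLOCK_SECONDS / 60.0;
}

// Sanity checks for the arithmetic above.
inline void check_period_arithmetic()
{
    assert(prog_seed_for_block(0) == prog_seed_for_block(49));       // same program
    assert(prog_seed_for_block(50) == prog_seed_for_block(49) + 1);  // next program
    assert(period_minutes() == 12.5);
}
```

Under these assumptions a compiled program is reused for 50 consecutive blocks before a new one must be generated.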
 ProgPoW uses **FNV1a** for merging data. The existing Ethash uses FNV1 for merging, but FNV1a provides better distribution properties.
 
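The difference between the two merges is only the order of the XOR and the multiply. The sketch below shows both steps applied to whole 32-bit words, as Ethash does, rather than to single bytes. The constant `0x811c9dc5` appears in the listings of this change and `0x01000193` is the standard 32-bit FNV prime. The in-place `fnv1a` signature is inferred from how the listings call it, and `fnv1` and `check_fnv` are hypothetical names used only for comparison.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t FNV_PRIME = 0x01000193;         // standard 32-bit FNV prime
constexpr uint32_t FNV_OFFSET_BASIS = 0x811c9dc5;  // standard 32-bit offset basis

// FNV1 step on a 32-bit word: multiply first, then XOR the data in.
// The data bits are not diffused until the next step.
inline uint32_t fnv1(uint32_t h, uint32_t d)
{
    return (h * FNV_PRIME) ^ d;
}

// FNV1a step on a 32-bit word: XOR the data in first, then multiply.
// Updates h in place and also returns it, which is how the listings use it.
inline uint32_t fnv1a(uint32_t &h, uint32_t d)
{
    return h = (h ^ d) * FNV_PRIME;
}

// Sanity checks: in-place update, expected formula, and that the orders differ.
inline void check_fnv()
{
    uint32_t h = FNV_OFFSET_BASIS;
    uint32_t r = fnv1a(h, 1);
    assert(r == h);
    assert(r == (FNV_OFFSET_BASIS ^ 1u) * FNV_PRIME);
    assert(r != fnv1(FNV_OFFSET_BASIS, 1));
}
```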
@@ -90,12 +94,14 @@ typedef struct {
 // http://www.cse.yorku.ca/~oz/marsaglia-rng.html
 uint32_t kiss99(kiss99_t &st)
 {
-    uint32_t znew = (st.z = 36969 * (st.z & 65535) + (st.z >> 16));
-    uint32_t wnew = (st.w = 18000 * (st.w & 65535) + (st.w >> 16));
-    uint32_t MWC = ((znew << 16) + wnew);
-    uint32_t SHR3 = (st.jsr ^= (st.jsr << 17), st.jsr ^= (st.jsr >> 13), st.jsr ^= (st.jsr << 5));
-    uint32_t CONG = (st.jcong = 69069 * st.jcong + 1234567);
-    return ((MWC^CONG) + SHR3);
+    st.z = 36969 * (st.z & 65535) + (st.z >> 16);
+    st.w = 18000 * (st.w & 65535) + (st.w >> 16);
+    uint32_t MWC = ((st.z << 16) + st.w);
+    st.jsr ^= (st.jsr << 17);
+    st.jsr ^= (st.jsr >> 13);
+    st.jsr ^= (st.jsr << 5);
+    st.jcong = 69069 * st.jcong + 1234567;
+    return ((MWC^st.jcong) + st.jsr);
 }
 ```
 
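The rewrite of `kiss99` in the hunk above looks like a pure refactoring: the nested assignments and comma operators are unrolled into one state update per statement. A minimal sketch to check that both bodies produce the same stream is given below. The field layout of `kiss99_t` is an assumption based on the field names used in the listings, and `kiss99_old`, `kiss99_new`, `kiss99_versions_agree` and `check_kiss99` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: four 32-bit state words, named as the listings use them.
typedef struct {
    uint32_t z, w, jsr, jcong;
} kiss99_t;

// Body removed by this change.
inline uint32_t kiss99_old(kiss99_t &st)
{
    uint32_t znew = (st.z = 36969 * (st.z & 65535) + (st.z >> 16));
    uint32_t wnew = (st.w = 18000 * (st.w & 65535) + (st.w >> 16));
    uint32_t MWC = ((znew << 16) + wnew);
    uint32_t SHR3 = (st.jsr ^= (st.jsr << 17), st.jsr ^= (st.jsr >> 13), st.jsr ^= (st.jsr << 5));
    uint32_t CONG = (st.jcong = 69069 * st.jcong + 1234567);
    return ((MWC ^ CONG) + SHR3);
}

// Body added by this change.
inline uint32_t kiss99_new(kiss99_t &st)
{
    st.z = 36969 * (st.z & 65535) + (st.z >> 16);
    st.w = 18000 * (st.w & 65535) + (st.w >> 16);
    uint32_t MWC = ((st.z << 16) + st.w);
    st.jsr ^= (st.jsr << 17);
    st.jsr ^= (st.jsr >> 13);
    st.jsr ^= (st.jsr << 5);
    st.jcong = 69069 * st.jcong + 1234567;
    return ((MWC ^ st.jcong) + st.jsr);
}

// Run both versions from the same state and compare n outputs and the final state.
inline bool kiss99_versions_agree(kiss99_t seed, int n)
{
    kiss99_t a = seed, b = seed;
    for (int i = 0; i < n; i++)
        if (kiss99_old(a) != kiss99_new(b))
            return false;
    return a.z == b.z && a.w == b.w && a.jsr == b.jsr && a.jcong == b.jcong;
}

// Example check with an arbitrary non-zero starting state.
inline void check_kiss99()
{
    kiss99_t seed = {362436069u, 521288629u, 123456789u, 380116160u};
    assert(kiss99_versions_agree(seed, 1000));
}
```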
@@ -121,59 +127,89 @@ void fill_mix(
 }
 ```
 
-The main search algorithm uses the Keccak sponge function (a width of 800 bits, with a bitrate of 448, and a capacity of 352) to generate a seed, expands the seed, does a sequence of loads and random math on the mix data, and then compresses the result into a final Keccak permutation (with the same parameters as the first) for target comparison.
+Like Ethash, Keccak is used to seed the sequence per nonce and to produce the final result. The keccak-f800 variant is used because its 32-bit word size matches the native word size of modern GPUs. The implementation is a variant of SHAKE with width=800, bitrate=576, capacity=224, output=256, and no padding. The result of Keccak is treated as a 256-bit big-endian number; that is, result byte 0 is the MSB of the value.
 
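The stated sponge parameters line up with the word counts in the `keccak_f800_progpow` listing that follows: 25 state words of 32 bits give the 800-bit width, and the 8 header words, 2 seed words and 8 digest words that are absorbed give the 576-bit bitrate. The sketch below records that arithmetic and shows one possible way to compare two results as 256-bit big-endian numbers. The `hash32_t` layout is an assumption based on the `.uint32s[i]` accesses in the listings, each word is assumed to hold its bytes in little-endian order (the usual Keccak lane convention, and consistent with the `bswap` used in `progpow_search`), and `hash_le` and `check_hash_le` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout, matching the `.uint32s[i]` accesses in the listings.
typedef struct {
    uint32_t uint32s[8];
} hash32_t;

constexpr int WORD_BITS = 32;
constexpr int STATE_WORDS = 25;            // st[25]
constexpr int ABSORBED_WORDS = 8 + 2 + 8;  // header + 64-bit seed + digest
constexpr int OUTPUT_WORDS = 8;            // ret.uint32s[0..7]

static_assert(STATE_WORDS * WORD_BITS == 800, "width");
static_assert(ABSORBED_WORDS * WORD_BITS == 576, "bitrate");
static_assert(576 + 224 == 800, "bitrate + capacity == width");
static_assert(OUTPUT_WORDS * WORD_BITS == 256, "output");

// Reverse the byte order of one 32-bit word.
inline uint32_t bswap(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) | ((x << 8) & 0x00ff0000u) | (x << 24);
}

// Hypothetical helper: true when a <= b, reading both as 256-bit big-endian
// numbers where byte 0 of word 0 is the most significant byte.
inline bool hash_le(const hash32_t &a, const hash32_t &b)
{
    for (int i = 0; i < 8; i++) {
        uint32_t wa = bswap(a.uint32s[i]);
        uint32_t wb = bswap(b.uint32s[i]);
        if (wa != wb)
            return wa < wb;
    }
    return true;  // all words equal
}

// Example: a hash whose byte 0 is zero is smaller than one whose byte 0 is 1.
inline void check_hash_le()
{
    hash32_t lo = {};
    hash32_t hi = {};
    lo.uint32s[0] = 0x01000000u;  // byte 0 = 0x00, byte 3 = 0x01
    hi.uint32s[0] = 0x00000001u;  // byte 0 = 0x01
    assert(hash_le(lo, hi));
    assert(!hash_le(hi, lo));
    assert(hash_le(lo, lo));
}
```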
+```cpp
+hash32_t keccak_f800_progpow(hash32_t header, uint64_t seed, hash32_t digest)
+{
+    uint32_t st[25];
+
+    for (int i = 0; i < 25; i++)
+        st[i] = 0;
+    for (int i = 0; i < 8; i++)
+        st[i] = header.uint32s[i];
+    st[8] = seed;
+    st[9] = seed >> 32;
+    for (int i = 0; i < 8; i++)
+        st[10+i] = digest.uint32s[i];
+
+    for (int r = 0; r < 22; r++)
+        keccak_f800_round(st, r);
+
+    hash32_t ret;
+    for (int i=0; i<8; i++)
+        ret.uint32s[i] = st[i];
+    return ret;
+}
+```
+
+The flow of the overall algorithm is:
+* A keccak hash of the header + nonce to create a seed
+* Use the seed to generate initial mix data
+* Loop multiple times, each time hashing random loads and random math into the mix data
+* Hash all the mix data into a single 256-bit value
+* A final keccak hash that is compared against the target
+
 ```cpp
 bool progpow_search(
-    const uint64_t prog_seed,
+    const uint64_t prog_seed, // value is (block_number/PROGPOW_PERIOD)
     const uint64_t nonce,
     const hash32_t header,
-    const uint64_t target,
-    const uint64_t *g_dag, // gigabyte DAG located in framebuffer
-    const uint64_t *c_dag // kilobyte DAG located in l1 cache
+    const hash32_t target, // miner can use a uint64_t target, doesn't need the full 256 bit target
+    const uint32_t *dag // gigabyte DAG located in framebuffer - the first portion gets cached
 )
 {
     uint32_t mix[PROGPOW_LANES][PROGPOW_REGS];
-    uint32_t result[4];
-    for (int i = 0; i < 4; i++)
-        result[i] = 0;
+    hash32_t digest;
+    for (int i = 0; i < 8; i++)
+        digest.uint32s[i] = 0;
 
     // keccak(header..nonce)
-    uint64_t seed = keccak_f800(header, nonce, result);
+    hash32_t seed_256 = keccak_f800_progpow(header, nonce, digest);
+    // endian swap so byte 0 of the hash is the MSB of the value
+    uint64_t seed = ((uint64_t)bswap(seed_256.uint32s[0]) << 32) | bswap(seed_256.uint32s[1]);
 
     // initialize mix for all lanes
     for (int l = 0; l < PROGPOW_LANES; l++)
-        fill_mix(seed, l, mix);
+        fill_mix(seed, l, mix[l]);
 
     // execute the randomly generated inner loop
-    for (int i = 0; i < PROGPOW_CNT_MEM; i++)
-        progPowLoop(prog_seed, i, mix, g_dag, c_dag);
+    for (int i = 0; i < PROGPOW_CNT_DAG; i++)
+        progPowLoop(prog_seed, i, mix, dag);
 
-    // Reduce mix data to a single per-lane result
-    uint32_t lane_hash[PROGPOW_LANES];
+    // Reduce mix data to a per-lane 32-bit digest
+    uint32_t digest_lane[PROGPOW_LANES];
     for (int l = 0; l < PROGPOW_LANES; l++)
     {
-        lane_hash[l] = 0x811c9dc5
+        digest_lane[l] = 0x811c9dc5;
         for (int i = 0; i < PROGPOW_REGS; i++)
-            fnv1a(lane_hash[l], mix[l][i]);
+            fnv1a(digest_lane[l], mix[l][i]);
     }
-    // Reduce all lanes to a single 128-bit result
-    for (int i = 0; i < 4; i++)
-        result[i] = 0x811c9dc5;
+    // Reduce all lanes to a single 256-bit digest
+    for (int i = 0; i < 8; i++)
+        digest.uint32s[i] = 0x811c9dc5;
     for (int l = 0; l < PROGPOW_LANES; l++)
-        fnv1a(result[l%4], lane_hash[l])
+        fnv1a(digest.uint32s[l%8], digest_lane[l]);
 
-    // keccak(header .. keccak(header..nonce) .. result);
-    return (keccak_f800(header, seed, result) <= target);
+    // keccak(header .. keccak(header..nonce) .. digest);
+    return (keccak_f800_progpow(header, seed, digest) <= target);
 }
 ```
 
 The inner loop uses FNV and KISS99 to generate a random sequence from the `prog_seed`. This random sequence determines which mix state is accessed and what random math is performed. Since the `prog_seed` changes relatively infrequently it is expected that `progPowLoop` will be compiled while mining instead of interpreted on the fly.
 
 ```cpp
 
-kiss99_t progPowInit(uint64_t prog_seed, int mix_seq[PROGPOW_REGS])
+kiss99_t progPowInit(uint64_t prog_seed, int mix_seq_dst[PROGPOW_REGS], int mix_seq_cache[PROGPOW_REGS])
 {
     kiss99_t prog_rnd;
     uint32_t fnv_hash = 0x811c9dc5;
@@ -181,15 +217,22 @@ kiss99_t progPowInit(uint64_t prog_seed, int mix_seq[PROGPOW_REGS])
     prog_rnd.w = fnv1a(fnv_hash, prog_seed >> 32);
     prog_rnd.jsr = fnv1a(fnv_hash, prog_seed);
     prog_rnd.jcong = fnv1a(fnv_hash, prog_seed >> 32);
-    // Create a random sequence of mix destinations for merge()
-    // guaranteeing every location is touched once
-    // Uses Fisher–Yates shuffle
+    // Create a random sequence of mix destinations for merge() and mix sources for cache reads
+    // guarantees every destination merged once
+    // guarantees no duplicate cache reads, which could be optimized away
+    // Uses Fisher-Yates shuffle
     for (int i = 0; i < PROGPOW_REGS; i++)
-        mix_seq[i] = i;
+    {
+        mix_seq_dst[i] = i;
+        mix_seq_cache[i] = i;
+    }
     for (int i = PROGPOW_REGS - 1; i > 0; i--)
     {
-        int j = kiss99(prog_rnd) % (i + 1);
-        swap(mix_seq[i], mix_seq[j]);
+        int j;
+        j = kiss99(prog_rnd) % (i + 1);
+        swap(mix_seq_dst[i], mix_seq_dst[j]);
+        j = kiss99(prog_rnd) % (i + 1);
+        swap(mix_seq_cache[i], mix_seq_cache[j]);
     }
     return prog_rnd;
 }
@@ -241,60 +284,66 @@ The main loop:
 
 ```cpp
 // Helper to get the next value in the per-program random sequence
-#define rnd() (kiss99(prog_rnd))
+#define rnd() (kiss99(prog_rnd))
 // Helper to pick a random mix location
-#define mix_src() (rnd() % PROGPOW_REGS)
+#define mix_src() (rnd() % PROGPOW_REGS)
 // Helper to access the sequence of mix destinations
-#define mix_dst() (mix_seq[(mix_seq_cnt++)%PROGPOW_REGS])
+#define mix_dst() (mix_seq_dst[(mix_seq_dst_cnt++)%PROGPOW_REGS])
+// Helper to access the sequence of cache sources
+#define mix_cache() (mix_seq_cache[(mix_seq_cache_cnt++)%PROGPOW_REGS])
 
 void progPowLoop(
     const uint64_t prog_seed,
     const uint32_t loop,
     uint32_t mix[PROGPOW_LANES][PROGPOW_REGS],
-    const uint64_t *g_dag,
-    const uint32_t *c_dag)
+    const uint32_t *dag)
 {
     // All lanes share a base address for the global load
     // Global offset uses mix[0] to guarantee it depends on the load result
-    uint32_t offset_g = mix[loop%PROGPOW_LANES][0] % DAG_SIZE;
+    uint32_t offset_g = mix[loop%PROGPOW_LANES][0] % (DAG_BYTES / (PROGPOW_LANES*PROGPOW_DAG_LOADS*sizeof(uint32_t)));
     // Lanes can execute in parallel and will be convergent
     for (int l = 0; l < PROGPOW_LANES; l++)
     {
-        // global load to sequential locations
-        uint64_t data64 = g_dag[offset_g + l];
+        // global load to the 256 byte DAG entry
+        // every lane can access every part of the entry
+        uint32_t data_g[PROGPOW_DAG_LOADS];
+        uint32_t offset_l = offset_g * PROGPOW_LANES + (l ^ loop) % PROGPOW_LANES;
+        for (int i = 0; i < PROGPOW_DAG_LOADS; i++)
+            data_g[i] = dag[offset_l * PROGPOW_DAG_LOADS + i];
 
         // initialize the seed and mix destination sequence
-        int mix_seq[PROGPOW_REGS];
-        int mix_seq_cnt = 0;
-        kiss99_t prog_rnd = progPowInit(prog_seed, mix_seq);
+        int mix_seq_dst[PROGPOW_REGS];
+        int mix_seq_cache[PROGPOW_REGS];
+        int mix_seq_dst_cnt = 0;
+        int mix_seq_cache_cnt = 0;
+        kiss99_t prog_rnd = progPowInit(prog_seed, mix_seq_dst, mix_seq_cache);
 
-        uint32_t offset, data32;
         int max_i = max(PROGPOW_CNT_CACHE, PROGPOW_CNT_MATH);
         for (int i = 0; i < max_i; i++)
         {
             if (i < PROGPOW_CNT_CACHE)
             {
                 // Cached memory access
-                // lanes access random location
-                offset = mix[l][mix_src()] % PROGPOW_CACHE_WORDS;
-                data32 = c_dag[offset];
-                merge(mix[l][mix_dst()], data32, rnd());
+                // lanes access random 32-bit locations within the first portion of the DAG
+                uint32_t offset = mix[l][mix_cache()] % (PROGPOW_CACHE_BYTES/sizeof(uint32_t));
+                uint32_t data = dag[offset];
+                merge(mix[l][mix_dst()], data, rnd());
             }
             if (i < PROGPOW_CNT_MATH)
             {
                 // Random Math
-                data32 = math(mix[l][mix_src()], mix[l][mix_src()], rnd());
-                merge(mix[l][mix_dst()], data32, rnd());
+                uint32_t data = math(mix[l][mix_src()], mix[l][mix_src()], rnd());
+                merge(mix[l][mix_dst()], data, rnd());
             }
         }
-        // Consume the global load data at the very end of the loop
-        // Allows full latency hiding
-        merge(mix[l][0], data64, rnd());
-        merge(mix[l][mix_dst()], data64>>32, rnd());
+        // Consume the global load data at the very end of the loop to allow full latency hiding
+        // Always merge into mix[0] to feed the offset calculation
+        merge(mix[l][0], data_g[0], rnd());
+        for (int i = 1; i < PROGPOW_DAG_LOADS; i++)
+            merge(mix[l][mix_dst()], data_g[i], rnd());
     }
 }
 ```
 
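The new DAG addressing in `progPowLoop` can be checked against the "256 bytes" figure with a short sketch. With the default `PROGPOW_LANES = 16` and `PROGPOW_DAG_LOADS = 4`, one entry is 16 x 4 x 4 = 256 bytes, and XOR-ing the lane index with the loop counter only permutes which lane reads which part of that entry. The permutation argument relies on `PROGPOW_LANES` being a power of two. `ENTRY_BYTES`, `dag_word_index`, `entry_fully_covered` and `check_dag_addressing` are hypothetical names; the index expression itself is copied from the listing.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Defaults from the parameter list in this change.
constexpr uint32_t PROGPOW_LANES = 16;
constexpr uint32_t PROGPOW_DAG_LOADS = 4;

// Size of one DAG entry as used in the offset_g divisor.
constexpr uint32_t ENTRY_BYTES = PROGPOW_LANES * PROGPOW_DAG_LOADS * sizeof(uint32_t);
static_assert(ENTRY_BYTES == 256, "one DAG entry is 256 bytes");

// Index into the uint32_t DAG array for load i of lane l, as in progPowLoop.
inline uint32_t dag_word_index(uint32_t offset_g, uint32_t loop, uint32_t l, uint32_t i)
{
    uint32_t offset_l = offset_g * PROGPOW_LANES + (l ^ loop) % PROGPOW_LANES;
    return offset_l * PROGPOW_DAG_LOADS + i;
}

// True when the lanes together read every word of entry offset_g exactly once.
inline bool entry_fully_covered(uint32_t offset_g, uint32_t loop)
{
    const uint32_t words = PROGPOW_LANES * PROGPOW_DAG_LOADS;  // 64 words per entry
    std::set<uint32_t> seen;
    for (uint32_t l = 0; l < PROGPOW_LANES; l++)
        for (uint32_t i = 0; i < PROGPOW_DAG_LOADS; i++)
            seen.insert(dag_word_index(offset_g, loop, l, i));
    return seen.size() == words
        && *seen.begin() == offset_g * words
        && *seen.rbegin() == offset_g * words + words - 1;
}

// Example checks for the first and last outer-loop iterations (loop = 0 and 63).
inline void check_dag_addressing()
{
    assert(entry_fully_covered(0, 0));
    assert(entry_fully_covered(7, 63));
}
```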
 ## Rationale
 
 ProgPoW utilizes almost all parts of a commodity GPU, excluding:
@@ -308,28 +357,11 @@ Since the GPU is almost fully utilized, there’s little opportunity for specia
 
 ## Backwards Compatibility
 
-This algorithm is not backwards compatible with the existing Ethash, and will require a fork for adoption. Furthermore, the network hashrate will halve as the time spent in the core is now balanced with time spent in memory.
-
-## Test Cases
-
-This PoW algorithm was tested against six different models from two different manufacturers. Selected models span two different chips and memory types from each manufacturer (Polaris20-GDDR5 and Vega10-HBM2 for AMD; GP104-GDDR5 and GP102-GDDR5X for NVIDIA). The average hashrate results are listed below. Additional tests are ongoing.
-
-As the algorithm nearly fully utilizes GPU functions in a natural way, the results reflect relative GPU performance that is similar to other gaming and graphics applications.
-
--------------------------------
-| Model | Hashrate (MH/s) |
-| --------- | --------------- |
-| RX580 | 9.4 |
-| Vega56 | 16.6 |
-| Vega64 | 18.7 |
-| GTX1070Ti | 13.1 |
-| GTX1080 | 14.9 |
-| GTX1080Ti | 21.8 |
--------------------------------
+This algorithm is not backwards compatible with the existing Ethash, and will require a fork for adoption. Furthermore, the network hashrate will halve since twice as much memory is loaded per hash.
 
 ## Implementation
 
-Please refer to the official code located at [ProgPOW](https://github.com/ifdefelse/ProgPOW) for the full code, implemented in the standard ethminer.
+Please refer to the official code located at [ProgPOW](https://github.com/ifdefelse/ProgPOW) for the full code, implemented in the standard ethminer.
 
 ## Copyright
 