diff --git a/ECC.md b/ECC.md
index 64651d7..a27f206 100644
--- a/ECC.md
+++ b/ECC.md
@@ -39,4 +39,17 @@ For GF(2^n), MUL is implemented via tables. For x86 SIMD, it uses PSHUFB which i
 
 This means that we need 4x more time for GF(2^32) MUL than for GF(2^16) MUL, making computations in GF(2^32) about 2x slower (and much slower for division since we can't keep table of reciprocals).
 
-GPUs are better to consider as 32-bit processors.
\ No newline at end of file
+GPUs are better to consider as 32-bit processors.
+
+
+### GF(2^64-2^32+1)
+
+Code: https://github.com/pornin/ecgfp5
+
+It's much faster than the fields implemented in FastECC, but only on CPUs supporting `64*64=128` multiplication.
+
+For x86 SIMD, GPUs and 32-bit CPUs, this multiplication has to be implemented via four `32*32=64` multiplications, exactly like `a*b mod p` for 32-bit values. But since each multiplication processes 64 bits of data, it will still be 2x faster in terms of GB/s processed.
+
+Probably, on x64 a scalar implementation would be faster than any SIMD implementation short of AVX-512, while allowing much simpler code.
+
+Moreover, the GF(2^64-2^32+1) field is denser than GF(0xFFF00001), i.e. we can recode about a gigabyte of data into values of this field plus a single bit (compared to only 4 KB for my field).
\ No newline at end of file
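A minimal sketch of why the `64*64=128` multiplication matters for GF(2^64-2^32+1). This is not code from ecgfp5, and all function names below are hypothetical. Because 2^64 ≡ 2^32 - 1 and 2^96 ≡ -1 (mod p), a 128-bit product reduces modulo p = 2^64 - 2^32 + 1 using only shifts, additions, subtractions and one small multiplication. `mul_native` takes the product from Rust's `u128` multiplication, which on x86-64 typically compiles to a single `mul`. `mul_limbs` builds the same product from four `32*32=64` multiplications, as a 32-bit CPU, a GPU or a SIMD lane would have to.

```rust
// Goldilocks prime p = 2^64 - 2^32 + 1.
const P: u64 = 0xFFFF_FFFF_0000_0001;
// 2^64 mod p = 2^32 - 1.
const EPSILON: u64 = 0xFFFF_FFFF;

/// Reduce the 128-bit value hi*2^64 + lo modulo p,
/// using 2^64 = 2^32 - 1 and 2^96 = -1 (mod p).
fn reduce(lo: u64, hi: u64) -> u64 {
    let hi_hi = hi >> 32; // coefficient of 2^96, contributes -hi_hi
    let hi_lo = hi & EPSILON; // coefficient of 2^64, contributes hi_lo * (2^32 - 1)
    let (mut t0, borrow) = lo.overflowing_sub(hi_hi);
    if borrow {
        // t0 wrapped around by +2^64; subtract 2^64 = EPSILON (cannot underflow here)
        t0 -= EPSILON;
    }
    let t1 = hi_lo * EPSILON; // at most (2^32 - 1)^2, fits in u64
    let (mut t2, carry) = t0.overflowing_add(t1);
    if carry {
        // t2 lost 2^64; add back 2^64 = EPSILON (cannot overflow here)
        t2 += EPSILON;
    }
    // t2 < 2^64 < 2p, so one conditional subtraction yields the canonical value
    if t2 >= P { t2 - P } else { t2 }
}

/// a*b mod p using the native 64x64 -> 128-bit multiplication.
fn mul_native(a: u64, b: u64) -> u64 {
    let x = (a as u128) * (b as u128);
    reduce(x as u64, (x >> 64) as u64)
}

/// Full 64x64 -> 128-bit product built from four 32x32 -> 64-bit products.
/// Returns (low 64 bits, high 64 bits).
fn mul_64x64_via_32(a: u64, b: u64) -> (u64, u64) {
    let (a0, a1) = (a & 0xFFFF_FFFF, a >> 32);
    let (b0, b1) = (b & 0xFFFF_FFFF, b >> 32);
    let p00 = a0 * b0;
    let p01 = a0 * b1;
    let p10 = a1 * b0;
    let p11 = a1 * b1;
    // Column for bits 32..63, plus whatever carries out of it (fits in 34 bits).
    let mid = (p00 >> 32) + (p01 & 0xFFFF_FFFF) + (p10 & 0xFFFF_FFFF);
    let lo = (p00 & 0xFFFF_FFFF) | (mid << 32); // bits of mid above 32 are dropped here...
    let hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32); // ...and carried in here
    (lo, hi)
}

/// a*b mod p without any 64x64 -> 128-bit multiplication.
fn mul_limbs(a: u64, b: u64) -> u64 {
    let (lo, hi) = mul_64x64_via_32(a, b);
    reduce(lo, hi)
}

fn main() {
    // xorshift64: a cheap deterministic generator for test inputs
    let mut s: u64 = 0x9E37_79B9_7F4A_7C15;
    for _ in 0..100_000 {
        s ^= s << 13;
        s ^= s >> 7;
        s ^= s << 17;
        let a = s;
        s ^= s << 13;
        s ^= s >> 7;
        s ^= s << 17;
        let b = s;
        // Reference result: plain 128-bit remainder.
        let expected = (((a as u128) * (b as u128)) % (P as u128)) as u64;
        assert_eq!(mul_native(a, b), expected);
        assert_eq!(mul_limbs(a, b), expected);
    }
    println!("ok");
}
```

Both variants share the same cheap reduction, so in this sketch the extra cost without native 64-bit multiplication comes almost entirely from building the product out of four 32-bit limb products.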