diff --git a/ECC.md b/ECC.md
index 64651d7..a27f206 100644
--- a/ECC.md
+++ b/ECC.md
@@ -39,4 +39,17 @@ For GF(2^n), MUL is implemented via tables. For x86 SIMD, it uses PSHUFB which i
 
 This means that we need 4x more time for GF(2^32) MUL than for GF(2^16) MUL, making computations in GF(2^32) about 2x slower (and much slower for division since we can't keep table of reciprocals).
 
-GPUs are better to consider as 32-bit processors.
\ No newline at end of file
+GPUs are better to consider as 32-bit processors.
+
+
+### GF(2^64-2^32+1)
+
+Code: https://github.com/pornin/ecgfp5
+
+It's much faster than the fields implemented in FastECC, but only on CPUs that support `64*64=128` multiplication.
+
+For x86 SIMD, GPUs and 32-bit CPUs, this multiplication has to be implemented via four `32*32=64` multiplications - exactly like `a*b mod p` for 32-bit values. But since each operation processes 64 bits of data, it will still be 2x faster in terms of GB/s processed.
+
+On x64, a scalar implementation would probably be faster than any SIMD version short of AVX-512, which allows much simpler code.
+
+Moreover, the GF(2^64-2^32+1) field is denser than GF(0xFFF00001) - i.e. we can recode about a gigabyte of data into values of this field plus a single bit (compared to only 4 KB for my field).
\ No newline at end of file