Initial ECC page

Bulat-Ziganshin 2022-07-05 17:34:06 +03:00
parent a1a3b0bc0b
commit 850d557b2e
ECC.md

### ECC libraries
- FastECC can use GF(p) and GF(p^2); p may be up to 64-bit
- Leopard uses GF(2^n); it supports N<=65536 and M>=K
- AFAIR, FastECC is 2-3 times faster than Leopard for the same M,K settings, although that may hold only for large N. Also, FastECC starts to beat the fastest O(N^2) RS codecs around N=100.
FastECC uses a real FFT, so it can be adapted to any field that has suitable roots of unity. But since such fields are non-binary, either the input or the output data must be recoded, and the extra bits stored somewhere. E.g. the recommended field is GF(0xFFF00001), and each 4 KB input block is recoded into 1024 values of this field plus one extra bit (p^1024 is about 0.78 * 2^32768, so 1024 field values alone cannot encode every possible 4 KB block, while doubling the space with one extra bit can).
Leopard uses a so-called "additive FFT", which isn't an FFT but another, pretty similar transformation. The reason is that the multiplicative group of GF(2^n) has odd order 2^n-1, so it lacks the power-of-two-order roots of unity that the usual FFT needs.
### Computations in GF
In GF(p^2), ADD/SUB/MUL are implemented via operations on two GF(p) values, by rules similar to those of complex numbers, as the sketch below illustrates.
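A minimal C++ sketch of GF(p^2) multiplication built on GF(p) helpers. The prime `P` and the constant `NR` are assumptions made for the sketch, not FastECC's actual parameters; `NR` must be a quadratic non-residue mod p so that `x^2 - NR` is irreducible:

```cpp
#include <cstdint>

// Illustrative parameters (NOT FastECC's actual ones): a 32-bit prime P
// and NR, assumed to be a quadratic non-residue mod P, so that
// GF(P^2) = GF(P)[w] / (w^2 - NR).
static const uint32_t P  = 0xFFF00001;
static const uint32_t NR = 7;  // assumption: non-residue chosen for the sketch

static uint32_t addp(uint32_t a, uint32_t b) {
    uint64_t r = (uint64_t)a + b;
    return (uint32_t)(r >= P ? r - P : r);
}
static uint32_t mulp(uint32_t a, uint32_t b) {
    return (uint32_t)((uint64_t)a * b % P);
}

struct GFp2 { uint32_t re, im; };  // the element re + im*w

// Same shape as complex multiplication, with w^2 = NR instead of i^2 = -1:
// (a + b*w)(c + d*w) = (a*c + NR*b*d) + (a*d + b*c)*w
static GFp2 mulp2(GFp2 x, GFp2 y) {
    return { addp(mulp(x.re, y.re), mulp(NR, mulp(x.im, y.im))),
             addp(mulp(x.re, y.im), mulp(x.im, y.re)) };
}
```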
In GF(p), computations are done modulo p:
- ADD/SUB need just a single '?:' operation on top of the machine add/sub, to bring the result back into [0, p)
- MUL requires a full-scale `mod p` reduction; for a constant p it takes 4 multiplications and a few add/sub operations (see the sketch below)
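A minimal C++ sketch of these primitives, using FastECC's recommended prime 0xFFF00001 (helper names are mine):

```cpp
#include <cstdint>

static const uint32_t P = 0xFFF00001;  // FastECC's recommended prime

// ADD/SUB: one plain add/sub plus the single '?:' correction that brings
// the result back into [0, P). The 64-bit intermediate avoids wraparound,
// since P is close to 2^32.
static uint32_t gf_add(uint32_t a, uint32_t b) {
    uint64_t r = (uint64_t)a + b;
    return (uint32_t)(r >= P ? r - P : r);
}
static uint32_t gf_sub(uint32_t a, uint32_t b) {
    return a >= b ? a - b : (uint32_t)((uint64_t)a + P - b);
}

// MUL: full product, then reduction. Because P is a compile-time constant,
// the compiler replaces the division behind '%' with a short sequence of
// multiplications and adds/shifts -- no actual DIV instruction is executed.
static uint32_t gf_mul(uint32_t a, uint32_t b) {
    return (uint32_t)((uint64_t)a * b % P);
}
```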
In GF(2^n):
- ADD/SUB is just XOR
- MUL is usually fetched from multiplication tables. When the tables become too large, the operands are split into multiple parts and the partial results are combined using the distributive property `(a+b)*c = ac+bc` (see the sketch below)
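A C++ sketch of the table approach for the smallest practical case, GF(2^8); the reduction polynomial 0x11D is a common choice assumed here, not necessarily the one Leopard uses:

```cpp
#include <cstdint>

// ADD/SUB in GF(2^n) are both plain XOR -- every element is its own negative.
static uint8_t gf_add(uint8_t a, uint8_t b) { return a ^ b; }

// Bitwise shift-and-XOR multiply in GF(2^8) with reduction polynomial
// 0x11D (an assumed, common choice). Used only to fill the table below.
static uint8_t gf_mul_slow(uint8_t a, uint8_t b) {
    uint8_t r = 0;
    while (b) {
        if (b & 1) r ^= a;
        b >>= 1;
        a = (a & 0x80) ? (uint8_t)((a << 1) ^ 0x1D) : (uint8_t)(a << 1);
    }
    return r;
}

// Full multiplication table: 64 KB for GF(2^8), but already 8 GB for
// GF(2^16) done naively -- hence the segment-splitting trick above
// for larger fields.
static uint8_t mul_table[256][256];
static void init_table() {
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            mul_table[a][b] = gf_mul_slow((uint8_t)a, (uint8_t)b);
}
static uint8_t gf_mul(uint8_t a, uint8_t b) { return mul_table[a][b]; }
```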
DIV is the most complex operation in any GF. Fortunately, the iFFT requires only one DIV operation per input value (i.e. O(n) divisions total, compared to `O(n*log(n))` multiplications), and the FFT doesn't need them at all.
DIV in any GF can be implemented via:
- DivTable[a,b]
- ReciprocalTable[a] storing the values 1/a, followed by an ordinary MUL
- Computing 1/a directly: in any GF(p^n), `x^(p^n-1) == 1` for any non-zero x, so `1/x = x^(p^n-2)`. This requires `O(log(p^n))` MUL operations over the field (see the sketch below)
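A C++ sketch of the last option for GF(p) with p = 0xFFF00001, using binary (square-and-multiply) exponentiation:

```cpp
#include <cstdint>

static const uint32_t P = 0xFFF00001;

static uint32_t gf_mul(uint32_t a, uint32_t b) {
    return (uint32_t)((uint64_t)a * b % P);
}

// Fermat inversion: for prime p and non-zero x, x^(p-1) == 1, hence
// 1/x == x^(p-2). Binary exponentiation costs O(log p) field MULs.
static uint32_t gf_inv(uint32_t x) {
    uint32_t result = 1;
    for (uint32_t e = P - 2; e; e >>= 1) {
        if (e & 1) result = gf_mul(result, x);  // multiply in the set bits
        x = gf_mul(x, x);                       // square for the next bit
    }
    return result;
}
```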
### CPU arithmetic
Modern CPUs are mostly 64-bit (although some are 32-bit). This means they perform all computations on 64-bit values, so computing on smaller values just wastes time, while MUL/DIV on larger values is slower (in terms of bits processed per second).
With SIMD, they essentially become 8/16/32/64-bit processors for ADD/SUB/XOR. In x86 SIMD, MUL is best supported for 32-bit values, and DIV isn't provided at all.
For GF(2^n), MUL is implemented via tables. With x86 SIMD this means PSHUFB, which is essentially a table lookup with a 4-bit index. The number of 4-bit multiplications required for one GF(2^n) MUL is on the order of (n/4)^2: both operands are split into 4-bit segments, and each segment of the first operand is multiplied by each segment of the second.
This means a GF(2^32) MUL needs 4x more time than a GF(2^16) MUL, making computations in GF(2^32) about 2x slower per processed byte (and much slower for division, since we can't keep a table of reciprocals).
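A C++/SSSE3 sketch of the PSHUFB technique for the simplest case, GF(2^8), multiplying 16 bytes by one constant at once (FFT-style butterflies multiply by constants, so only one operand is split here; the 0x11D polynomial and all helper names are assumptions for the sketch):

```cpp
#include <immintrin.h>  // SSSE3: _mm_shuffle_epi8 (PSHUFB)
#include <cstdint>

// Scalar GF(2^8) multiply with reduction polynomial 0x11D (an assumed,
// common choice); used only to precompute the lookup tables below.
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t r = 0;
    while (b) {
        if (b & 1) r ^= a;
        b >>= 1;
        a = (a & 0x80) ? (uint8_t)((a << 1) ^ 0x1D) : (uint8_t)(a << 1);
    }
    return r;
}

// Multiply 16 bytes by the constant c at once. Since MUL distributes over
// XOR, c*(hi*16 ^ lo) == c*(hi*16) ^ c*lo, so two PSHUFB lookups (one per
// 4-bit segment) replace a 256-entry per-byte table.
static __m128i gf_mul_const_x16(__m128i v, uint8_t c) {
    uint8_t lo_tab[16], hi_tab[16];
    for (int x = 0; x < 16; x++) {
        lo_tab[x] = gf_mul(c, (uint8_t)x);         // products of low nibbles
        hi_tab[x] = gf_mul(c, (uint8_t)(x << 4));  // products of high nibbles
    }
    __m128i tlo  = _mm_loadu_si128((const __m128i*)lo_tab);
    __m128i thi  = _mm_loadu_si128((const __m128i*)hi_tab);
    __m128i mask = _mm_set1_epi8(0x0F);
    __m128i lo   = _mm_and_si128(v, mask);                     // low 4 bits
    __m128i hi   = _mm_and_si128(_mm_srli_epi16(v, 4), mask);  // high 4 bits
    return _mm_xor_si128(_mm_shuffle_epi8(tlo, lo),
                         _mm_shuffle_epi8(thi, hi));
}
```

A real codec would precompute the two 16-entry tables once per constant rather than rebuilding them on every call.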
GPUs are best thought of as 32-bit processors.