README.md

blake3

go get lukechampine.com/blake3

blake3 implements the BLAKE3 cryptographic hash function. This implementation aims to be performant without sacrificing (too much) readability, in the hopes of eventually landing in x/crypto.

In addition to the pure-Go implementation, this package also contains AVX-512 and AVX2 routines (generated by avo) that greatly increase performance for large inputs and outputs.

Benchmarks

Tested on a 2020 MacBook Air (i5-7600K @ 3.80GHz). Benchmarks will improve as soon as I get access to a beefier AVX-512 machine. 😉

AVX-512

BenchmarkSum256/64           120 ns/op       533.00 MB/s
BenchmarkSum256/1024        2229 ns/op       459.36 MB/s
BenchmarkSum256/65536      16245 ns/op      4034.11 MB/s
BenchmarkWrite               245 ns/op      4177.38 MB/s
BenchmarkXOF                 246 ns/op      4159.30 MB/s

AVX2

BenchmarkSum256/64           120 ns/op       533.00 MB/s
BenchmarkSum256/1024        2229 ns/op       459.36 MB/s
BenchmarkSum256/65536      31137 ns/op      2104.76 MB/s
BenchmarkWrite               487 ns/op      2103.12 MB/s
BenchmarkXOF                 329 ns/op      3111.27 MB/s

Pure Go

BenchmarkSum256/64           120 ns/op       533.00 MB/s
BenchmarkSum256/1024        2229 ns/op       459.36 MB/s
BenchmarkSum256/65536     133505 ns/op       490.89 MB/s
BenchmarkWrite              2022 ns/op       506.36 MB/s
BenchmarkXOF                1914 ns/op       534.98 MB/s
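
As a cross-check on the tables above, the MB/s column can be approximated from the ns/op column and the input size. The sketch below does that arithmetic for the `BenchmarkSum256/65536` rows and derives the speedup over pure Go. It assumes 1 MB = 10^6 bytes, which is the convention Go's `testing` package uses. The function names are hypothetical, and small differences from the reported figures are expected because ns/op is rounded.

```go
package main

import "fmt"

// throughputMBps converts a benchmark result into MB/s (1 MB = 1e6 bytes).
// bytes per nanosecond * 1e9 ns/s / 1e6 bytes/MB = bytes/ns * 1e3.
func throughputMBps(bytesPerOp int, nsPerOp float64) float64 {
	return float64(bytesPerOp) / nsPerOp * 1e3
}

// speedup reports how many times faster a result is than the baseline.
func speedup(baselineNs, ns float64) float64 {
	return baselineNs / ns
}

func main() {
	const size = 65536 // input size of BenchmarkSum256/65536

	// ns/op values copied from the three tables above.
	results := []struct {
		name string
		ns   float64
	}{
		{"AVX-512", 16245},
		{"AVX2", 31137},
		{"Pure Go", 133505},
	}

	pureGo := results[2].ns
	for _, r := range results {
		fmt.Printf("%-8s %8.2f MB/s  %.1fx vs pure Go\n",
			r.name, throughputMBps(size, r.ns), speedup(pureGo, r.ns))
	}
}
```

By this arithmetic, the AVX-512 routine is roughly 8 times faster than pure Go on 64 KiB inputs, and the AVX2 routine roughly 4 times faster.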

Shortcomings

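There is no assembly routine for single-block compressions. This is most noticeable for ~1KB inputs: in the benchmarks above, the 64-byte and 1024-byte Sum256 results are identical across the AVX-512, AVX2, and Pure Go tables.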

Each assembly routine inlines all 7 rounds, causing thousands of lines of duplicated code. Ideally the routines could be merged such that only a single routine is generated for AVX-512 and AVX2, without sacrificing too much performance.