2 Commits

Author SHA1 Message Date
Jacek Sieka 7cb2e57a58
snappy revamp (#10)
This is a more or less complete revamp of the snappy library aiming to:

* clear out a lot of the duplicate code
* remove some of the redundant API
* unify the codebase behind a single, optimized "inner" encoder/decoder
* unify the public API for in-memory and stream
compression/decompression
* improve performance

As such, only the documented API remains backwards-compatible; the rest
has been refactored, moved around and rewritten:

* `import snappy` now exposes only in-memory encoders / decoders
* framed format moved to `snappy` module, `snappy/framing` removed
* faststreams integration moved to `snappy/faststreams`
* minimal `std/streams` integration started in `snappy/streams`

Other changes include:

* up-to-date documentation
* allocation- and exception-free API (uses some amount of stack memory)
* a 2-3x improvement to both compression and decompression performance,
putting the library mostly on par with the C++ implementation (see
README)
* the implementation was heavily inspired by the `C++`, `C` and `Go`
implementations, but somewhat simplified
* nonetheless, the library relies on a significant amount of unsafe code
to work around inefficiencies in the safe subset of Nim

With bulk operations in place, the cost of range checks falls
significantly, since one check can cover a whole bulk copy instead of
every byte: we can reintroduce them without any significant loss in
performance by carefully ordering operations so that optimizers can
elide most of them.
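To illustrate the range-check point, here is a minimal sketch in C (the library itself is Nim) contrasting a literal copy that checks every byte with one that validates the whole range once and then does a single bulk copy. The function names and signatures are hypothetical and are not part of the snappy API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers, not the library's API: copy `len` literal bytes
   from src[ip..] to dst[op..].  Each returns 0 on success and -1 if
   either buffer is too short. */

/* Per-byte checks: two comparisons on every iteration.  On failure, the
   bytes before the failing index have already been written to dst. */
int copy_literal_per_byte(const unsigned char *src, size_t src_len, size_t ip,
                          unsigned char *dst, size_t dst_len, size_t op,
                          size_t len) {
    for (size_t k = 0; k < len; k++) {
        if (ip + k >= src_len || op + k >= dst_len) {
            return -1;
        }
        dst[op + k] = src[ip + k];
    }
    return 0;
}

/* Bulk version: validate the whole range once, then copy with memcpy.
   The comparisons are ordered so they cannot overflow: `ip > src_len` is
   rejected first, which makes `src_len - ip` safe (likewise for op).
   On failure nothing is written. */
int copy_literal_bulk(const unsigned char *src, size_t src_len, size_t ip,
                      unsigned char *dst, size_t dst_len, size_t op,
                      size_t len) {
    if (ip > src_len || len > src_len - ip) {
        return -1;
    }
    if (op > dst_len || len > dst_len - op) {
        return -1;
    }
    memcpy(dst + op, src + ip, len); /* src and dst must not overlap */
    return 0;
}
```

Hoisting the check to the front of the bulk operation also means that malformed input is rejected before any output is written, which the per-byte form cannot guarantee.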
2022-04-14 16:22:41 +02:00
Jacek Sieka 45f2d5d84a
bump cppsnappy, split out benchmark
* benchmark both compression and decompression
* bump C++-snappy: it's gotten better over the years

```
fastStreams,       openArrays,       nimStreams,           cppLib,      Samples,         Size,         Test
  0.281 /  0.139,   0.295 /  0.220,   0.528 /  0.238,   0.140 /  0.047,          100,       102400, html
  2.697 /  1.457,   3.263 /  1.993,   6.106 /  2.026,   1.655 /  0.530,          100,       702087, urls.10K
  0.021 /  0.011,   0.017 /  0.011,   0.031 /  0.011,   0.014 /  0.009,          100,       123093, fireworks.jpeg
  0.057 /  0.019,   0.051 /  0.036,   0.092 /  0.042,   0.021 /  0.011,          100,       102400, paper-100k.pdf
  1.153 /  0.565,   1.210 /  0.904,   2.172 /  0.985,   0.575 /  0.184,          100,       409600, html_x_4
  0.887 /  0.582,   0.976 /  0.743,   1.923 /  0.712,   0.536 /  0.202,          100,       152089, alice29.txt
  0.771 /  0.504,   0.850 /  0.639,   1.678 /  0.606,   0.482 /  0.184,          100,       129301, asyoulik.txt
  2.349 /  1.518,   2.553 /  1.946,   5.057 /  1.913,   1.391 /  0.522,          100,       426754, lcet10.txt
  2.983 /  1.951,   3.241 /  2.408,   6.680 /  2.342,   1.882 /  0.735,          100,       481861, plrabn12.txt
  0.293 /  0.120,   0.306 /  0.210,   0.542 /  0.241,   0.131 /  0.042,          100,       118588, geo.protodata
  0.743 /  0.501,   0.738 /  0.620,   1.597 /  0.640,   0.414 /  0.191,          100,       184320, kppkn.gtb
  0.087 /  0.054,   0.102 /  0.067,   0.193 /  0.061,   0.052 /  0.022,          100,        14564, Mark.Twain-Tom.Sawyer.txt
 66.886 / 20.880, 105.393 / 37.907, 210.197 / 38.867,  34.273 / 10.193,           10,     38942424, state-2560000-114a593d-0d5e08e8.ssz
```

In general, we're consistently about 2x slower than the C++ library
(`cppLib`) right now.
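The slowdown relative to C++ can be computed per file from the table. The sketch below copies the `fastStreams` and `cppLib` columns verbatim and summarizes the time ratios. It assumes that in each `a / b` cell, `a` is the compression time and `b` the decompression time; the table does not state its units, but they cancel out in a ratio. The struct and function names are hypothetical.

```c
#include <stddef.h>

/* Timings copied from the benchmark table.  Assumption: in each "a / b"
   cell, a is compression time and b is decompression time. */
typedef struct {
    const char *test;
    double fs_comp, fs_decomp;   /* fastStreams column */
    double cpp_comp, cpp_decomp; /* cppLib column */
} bench_row;

static const bench_row rows[] = {
    {"html", 0.281, 0.139, 0.140, 0.047},
    {"urls.10K", 2.697, 1.457, 1.655, 0.530},
    {"fireworks.jpeg", 0.021, 0.011, 0.014, 0.009},
    {"paper-100k.pdf", 0.057, 0.019, 0.021, 0.011},
    {"html_x_4", 1.153, 0.565, 0.575, 0.184},
    {"alice29.txt", 0.887, 0.582, 0.536, 0.202},
    {"asyoulik.txt", 0.771, 0.504, 0.482, 0.184},
    {"lcet10.txt", 2.349, 1.518, 1.391, 0.522},
    {"plrabn12.txt", 2.983, 1.951, 1.882, 0.735},
    {"geo.protodata", 0.293, 0.120, 0.131, 0.042},
    {"kppkn.gtb", 0.743, 0.501, 0.414, 0.191},
    {"Mark.Twain-Tom.Sawyer.txt", 0.087, 0.054, 0.052, 0.022},
    {"state-2560000-114a593d-0d5e08e8.ssz", 66.886, 20.880, 34.273, 10.193},
};
static const size_t n_rows = sizeof rows / sizeof rows[0];

typedef struct { double min, max, mean; } slowdown;

/* Per-file ratio of fastStreams time to cppLib time (> 1 means slower
   than C++), summarized as min, max and arithmetic mean. */
slowdown summarize(int decompress) {
    slowdown s = {0.0, 0.0, 0.0};
    for (size_t i = 0; i < n_rows; i++) {
        double nim = decompress ? rows[i].fs_decomp : rows[i].fs_comp;
        double cpp = decompress ? rows[i].cpp_decomp : rows[i].cpp_comp;
        double r = nim / cpp;
        if (i == 0 || r < s.min) s.min = r;
        if (i == 0 || r > s.max) s.max = r;
        s.mean += r / (double)n_rows;
    }
    return s;
}
```

Under the stated assumption, the compression ratios range from about 1.5x to 2.7x (mean about 1.85x) and the decompression ratios range from about 1.2x to 3.1x (mean about 2.5x).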
2021-12-30 13:33:27 +01:00