Commit Graph

30 Commits

Author SHA1 Message Date
Etan Kissling 913c426d57
bump `snappycpp` to `1.2.1` (#28)
- https://github.com/google/snappy/releases/tag/1.2.1
2024-05-22 07:53:30 +00:00
Etan Kissling aaef74113c
bump `snappycpp` to `1.2.0` (#27)
- https://github.com/google/snappy/releases/tag/1.2.0
2024-04-08 16:09:16 +02:00
Jacek Sieka 754715dfe7
fix `IndexDefect` on invalid framed data (#24)
* fix fuzzer so that it doesn't do exact comparison - this doesn't work
any more because the implementations no longer match byte-for-byte -
instead, we check that the libraries agree on valid/invalid and that C++
can decompress snappy-encoded data
2023-09-10 01:40:05 +03:00
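The fuzzer change described in the commit above (compare behaviour, not bytes) can be sketched as follows. This is only an illustration under stated assumptions: `zlib` at two compression levels stands in for the Nim and C++ Snappy implementations, and the helper names `encoders_agree` and `decoders_agree` are hypothetical, not taken from the repository.

```python
import zlib


def encoders_agree(data: bytes) -> bool:
    """Round-trip check that tolerates encoders emitting different bytes."""
    # Stand-ins for two independent encoders: the same codec at two levels,
    # whose compressed outputs are not guaranteed to be byte-identical.
    a = zlib.compress(data, 1)
    b = zlib.compress(data, 9)
    # Compare what the streams decode to, never the compressed bytes.
    return zlib.decompress(a) == data and zlib.decompress(b) == data


def decoders_agree(blob: bytes, decoders) -> bool:
    """True when every decoder reaches the same verdict and output for blob."""
    results = []
    for decode in decoders:
        try:
            results.append(("ok", decode(blob)))
        except Exception:
            # Any failure counts as "invalid"; error details may differ.
            results.append(("invalid", None))
    return all(r == results[0] for r in results)
```

A fuzz target would feed arbitrary input to `decoders_agree` and treat a `False` result as a finding, since it means one library accepted data the other rejected.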
Jacek Sieka ecbcee1d10
allow skipping crc32 integrity check (#22)
Some data is already protected by stronger checks - crc32, on the other
hand, significantly slows down framed reading, i.e. about 2.5x slower:

```
118.853 / 41.781, 129.115 /  0.000, 188.438 /  0.000,  90.565 / 44.371,           50,    115613038, state-6800000-488b7150-d613b584.ssz
186.600 / 97.202, 191.935 /123.325,   0.000 /  0.000,   0.000 /  0.000,           50,    115613038, state-6800000-488b7150-d613b584.ssz(framed)
```

The difference between unframed and framed decoding is the CRC32 check -
it takes ~50 ms on a decent laptop for a 110 MB file.
2023-07-25 18:50:36 +03:00
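For context on the check that the commit above makes optional: the Snappy framing format stores a masked CRC-32C of each chunk's uncompressed data. The sketch below is a slow, pure-Python illustration of that checksum with an opt-out flag. The names `chunk_ok` and `check_crc` are hypothetical and do not reflect the library's actual API.

```python
def _make_table():
    # Byte-wise lookup table for CRC-32C (Castagnoli), reflected polynomial.
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc32c(data: bytes) -> int:
    # Framing-format masking: rotate right by 15 bits, then add a constant,
    # wrapping around at 32 bits.
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def chunk_ok(payload: bytes, stored: int, check_crc: bool = True) -> bool:
    # Hypothetical opt-out: skip the pass over the payload when the caller
    # already trusts the data through a stronger external check.
    if not check_crc:
        return True
    return masked_crc32c(payload) == stored
```

Skipping the check saves one full pass over the decompressed bytes, which is where the cost quoted in the commit message comes from.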
Jacek Sieka e36f19d886
clean up Defect (#21)
2023-07-21 14:44:16 +02:00
Jacek Sieka 6d51d125b1
bump upstream (#20)
2023-07-11 09:39:56 +02:00
Jacek Sieka 235c33c5ef
normalise nimble, update ci, fix warnings (#14)
* normalise nimble, update ci, fix warnings

* bump snappycpp

* lower macos builder version

newer clang is more restrictive than snappycpp supports
2022-11-24 12:22:15 +01:00
Jacek Sieka a4b78690ef
add test for `uncompressedLenFramed` (#12)
2022-11-23 19:33:36 +01:00
Jacek Sieka 7cb2e57a58
snappy revamp (#10)
This is a more or less complete revamp of the snappy library aiming to:

* clear out a lot of the duplicate code
* remove some of the redundant API
* unify the codebase behind a single, optimized "inner" encoder/decoder
* unify the public API for in-memory and stream
compression/decompression
* improve performance

As such, only the documented API remains backwards-compatible - the rest
has been refactored, moved around and rewritten:

* `import snappy` now exposes only in-memory encoders / decoders
* framed format moved to `snappy` module, `snappy/framing` removed
* faststreams integration moved to `snappy/faststreams`
* minimal `std/streams` integration started in `snappy/streams`

Other changes include:

* up-to-date documentation
* allocation- and exception-free API (uses some amount of stack memory)
* a 2-3x improvement to both compression and decompression performance,
putting the library mostly on par with the C++ implementation (see
README)
* the implementation was heavily inspired by the `C++`, `C` and `go`
implementations, but somewhat simplified
* nonetheless, the code uses a significant amount of unsafe code to
work around inefficiencies in the safe subset of Nim

With bulk operations in place, the cost of range checks falls
significantly - we can reintroduce them without any significant loss in
performance by carefully ordering operations such that optimizers can
elide most.
2022-04-14 16:22:41 +02:00
Jacek Sieka 16bf7b7d96
deduplicate and reorganise code (#9)
The snappy codebase is a mess with competing implementations,
nonsensical code duplication and no real direction due to a partially
implemented faststreams migration.

This PR makes it slightly less of a mess, but make no mistake, it's
still a mess - the difference being that there are a few more signposts
along the way in terms of module organisation, and a little less mess as
the line count of the PR discloses.

Performance remains poor - ~3x slower than C++ - but at least there's
less code to look at :)
2022-04-01 12:57:39 +02:00
Jacek Sieka 45f2d5d84a
bump cppsnappy, split out benchmark
* benchmark both compression and decompression
* bump C++-snappy: it's gotten better over the years

```
fastStreams,       openArrays,       nimStreams,           cppLib,      Samples,         Size,         Test
  0.281 /  0.139,   0.295 /  0.220,   0.528 /  0.238,   0.140 /  0.047,          100,       102400, html
  2.697 /  1.457,   3.263 /  1.993,   6.106 /  2.026,   1.655 /  0.530,          100,       702087, urls.10K
  0.021 /  0.011,   0.017 /  0.011,   0.031 /  0.011,   0.014 /  0.009,          100,       123093, fireworks.jpeg
  0.057 /  0.019,   0.051 /  0.036,   0.092 /  0.042,   0.021 /  0.011,          100,       102400, paper-100k.pdf
  1.153 /  0.565,   1.210 /  0.904,   2.172 /  0.985,   0.575 /  0.184,          100,       409600, html_x_4
  0.887 /  0.582,   0.976 /  0.743,   1.923 /  0.712,   0.536 /  0.202,          100,       152089, alice29.txt
  0.771 /  0.504,   0.850 /  0.639,   1.678 /  0.606,   0.482 /  0.184,          100,       129301, asyoulik.txt
  2.349 /  1.518,   2.553 /  1.946,   5.057 /  1.913,   1.391 /  0.522,          100,       426754, lcet10.txt
  2.983 /  1.951,   3.241 /  2.408,   6.680 /  2.342,   1.882 /  0.735,          100,       481861, plrabn12.txt
  0.293 /  0.120,   0.306 /  0.210,   0.542 /  0.241,   0.131 /  0.042,          100,       118588, geo.protodata
  0.743 /  0.501,   0.738 /  0.620,   1.597 /  0.640,   0.414 /  0.191,          100,       184320, kppkn.gtb
  0.087 /  0.054,   0.102 /  0.067,   0.193 /  0.061,   0.052 /  0.022,          100,        14564, Mark.Twain-Tom.Sawyer.txt
 66.886 / 20.880, 105.393 / 37.907, 210.197 / 38.867,  34.273 / 10.193,           10,     38942424, state-2560000-114a593d-0d5e08e8.ssz
```

In general, we're consistently about 2x slower than C++ right now.
2021-12-30 13:33:27 +01:00
Etan Kissling 16cce7d07c
fix `appendSnappyBytes` index computation
The `appendSnappyBytes` implementation of `snappy` computes indices
incorrectly, resulting in wrong data being produced. The implementation
was fixed and the test suite extended accordingly. Note that this issue
is not reachable in production because `appendSnappyBytes` is only used
in test code.
2021-12-14 23:19:24 +07:00
jangko 22dbb2eb65
fixes fishy and dubious code
2021-02-03 19:17:19 +07:00
jangko b10f16da7a
submoduling snappycpp
2020-12-22 12:17:39 +07:00
Zahary Karadjov a368549c1a
Fix various integer overflow issues found through fuzzing
2020-08-18 23:11:42 +03:00
Zahary Karadjov cae6c07fbb
Fix the fuzzing test
2020-07-22 20:25:22 +03:00
jangko 700f7777fd
fix crc32c nim side crash
2020-05-23 13:21:55 +07:00
Zahary Karadjov f6a87764a3
Added fuzzing tests
2020-05-20 21:06:13 +03:00
Zahary Karadjov 5e9e2a1f65
Async version of the Snappy framing format based on the latest FastStreams version
2020-05-06 00:35:55 +03:00
Zahary Karadjov 80cff583e3
More faststreams upgrades; Re-enable the file-based tests
2020-04-14 17:00:40 +03:00
Zahary Karadjov b5196c17b6
Use the latest faststreams OutputStream API
2020-04-13 15:02:41 +03:00
andri lim f08cbf9dc5
working snappy framing compress prototype
2020-04-01 22:35:57 +03:00
andri lim 45b8258af4
renormalize *.txt EOL
2020-04-01 22:35:57 +03:00
andri lim 73bb7db070
working framing uncompress prototype
2020-04-01 22:35:57 +03:00
andri lim 71b24a6d15
add framing format test runner
2020-04-01 22:35:57 +03:00
andri lim 4378d9fc93
make test green
2019-09-03 12:32:26 +03:00
Zahary Karadjov 072c5eee43
Add an implementation based on Nim std streams
2019-07-08 17:00:08 +03:00
Zahary Karadjov 185a0bb769
Migrate to faststreams; WIP benchmark
2019-07-07 15:33:25 +03:00
andri lim c70156b165
fixes tests
2018-11-02 21:48:41 +07:00
andri lim 6fcbbfbab2
initial commit
2018-11-02 12:10:58 +07:00