81 Commits

Author SHA1 Message Date
tersec b77c5406b6
require Nim 2.0 2024-06-24 22:57:09 +00:00
Etan Kissling 913c426d57
bump `snappycpp` to `1.2.1` (#28)
- https://github.com/google/snappy/releases/tag/1.2.1
2024-05-22 07:53:30 +00:00
Etan Kissling aaef74113c
bump `snappycpp` to `1.2.0` (#27)
- https://github.com/google/snappy/releases/tag/1.2.0
2024-04-08 16:09:16 +02:00
tersec 984bdad602
use non-EOL macOS version for GitHub Actions CI (#26) 2024-02-16 18:02:06 +00:00
Zahary Karadjov ef7be6daaf
Add codec.isSnappyFramedStream 2023-09-12 01:49:42 +03:00
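
For context on the commit above: in the Snappy framing format, a stream opens with a stream identifier chunk, so detection can be sketched as a prefix check. The Python below is an illustration only. The function name `looks_like_framed_stream` is hypothetical and this is not the `codec.isSnappyFramedStream` implementation, whose exact behaviour is not shown in this log.

```python
# Stream identifier chunk from the Snappy framing format:
# chunk type 0xff, 3-byte little-endian length (6), then the ASCII magic "sNaPpY".
STREAM_IDENTIFIER = b"\xff\x06\x00\x00sNaPpY"


def looks_like_framed_stream(data: bytes) -> bool:
    """Hypothetical helper: True if data begins with the stream identifier."""
    return data.startswith(STREAM_IDENTIFIER)
```

A check like this only inspects the first 10 bytes; it says nothing about whether the chunks that follow are valid.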
Jacek Sieka 754715dfe7
fix indexdefect on invalid framed data (#24)
* fix fuzzer so that it doesn't do exact comparison - this doesn't work
any more because the implementations no longer match byte-for-byte -
instead, we check that the libraries agree on valid/invalid and that C++
can decompress snappy-encoded data
2023-09-10 01:40:05 +03:00
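
The fuzzer change described in #24 replaces byte-for-byte comparison with two weaker properties: both libraries agree on whether an input is valid, and the reference decoder can read what the encoder under test produced. A minimal sketch of that idea follows, assuming the decoders are passed in as plain callables; all function names are hypothetical and the actual fuzzer is not reproduced here.

```python
def try_decode(decode, data):
    """Return (True, output) on success, or (False, None) if the decoder rejects data."""
    try:
        return True, decode(data)
    except Exception:
        return False, None


def agree_on_validity(data, decode_a, decode_b):
    """Differential check: both decoders must accept, or both must reject."""
    ok_a, _ = try_decode(decode_a, data)
    ok_b, _ = try_decode(decode_b, data)
    return ok_a == ok_b


def reference_can_decode(payload, encode, reference_decode):
    """The reference decoder must recover data produced by the encoder under test."""
    ok, out = try_decode(reference_decode, encode(payload))
    return ok and out == payload
```

Neither property requires the two encoders to emit identical bytes, which is why the check keeps working once the implementations diverge.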
tersec eed56b00ad
run UBSAN in Linux in unit tests (#23) 2023-09-04 14:30:49 +02:00
Jacek Sieka ecbcee1d10
allow skipping crc32 integrity check (#22)
Some data is already protected by stronger checks - crc32 on the other
hand significantly slows down framed reading - i.e. 2.5x slower:

```
118.853 / 41.781, 129.115 /  0.000, 188.438 /  0.000,  90.565 / 44.371,           50,    115613038, state-6800000-488b7150-d613b584.ssz
186.600 / 97.202, 191.935 /123.325,   0.000 /  0.000,   0.000 /  0.000,           50,    115613038, state-6800000-488b7150-d613b584.ssz(framed)
```

The difference between unframed and framed decoding is the CRC32 check -
it takes ~50 ms on a decent laptop for a 110 MB file.
2023-07-25 18:50:36 +03:00
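
To illustrate the trade-off in #22: the Snappy framing format stores a masked CRC-32C with each data chunk, and verifying it means a full pass over the uncompressed bytes. The sketch below, in pure Python, shows the checksum and an opt-out flag. The names `chunk_is_acceptable` and `check_integrity` are hypothetical and do not reflect the library's actual API.

```python
def _make_table():
    # Table for the reflected CRC-32C (Castagnoli) polynomial.
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0x82F63B78 if c & 1 else c >> 1
        table.append(c)
    return table


_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    """Plain CRC-32C over data."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc32c(data: bytes) -> int:
    """Masking used by the framing format: rotate right by 15, then add a constant."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def chunk_is_acceptable(data: bytes, stored_crc: int, check_integrity: bool = True) -> bool:
    """Hypothetical guard: skip the checksum pass when the caller opts out."""
    if not check_integrity:
        return True  # the caller relies on a stronger external check
    return masked_crc32c(data) == stored_crc
```

Skipping the check is only reasonable when the data is covered by another integrity mechanism, as the commit message notes.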
Jacek Sieka e36f19d886
clean up Defect (#21) 2023-07-21 14:44:16 +02:00
Jacek Sieka 6d51d125b1
bump upstream (#20) 2023-07-11 09:39:56 +02:00
tersec 6da3e98f54
remove travis and appveyor ci setups (#19) 2023-06-16 08:39:33 +00:00
tersec 139377742f
Remove Nim 1.2 and 1.4 support (#18) 2023-06-08 18:29:39 +00:00
tersec 470868b244
test both refc and ORC in post-1.6 Nim versions (#16) 2023-04-14 01:47:16 +00:00
tersec e49f2f6ee9
use Nim 2.0 in CI (#15)
* use Nim 2.0 in CI

* use non-deprecated Ubuntu 20.04 image
2023-04-07 14:27:53 +00:00
Jacek Sieka 235c33c5ef
normalise nimble, update ci, fix warnings (#14)
* normalise nimble, update ci, fix warnings

* bump snappycpp

* lower macos builder version

newer clang is more restrictive than snappycpp supports
2022-11-24 12:22:15 +01:00
Jacek Sieka a4b78690ef
add test for `uncompressedLenFramed` (#12) 2022-11-23 19:33:36 +01:00
Jacek Sieka 7cb2e57a58
snappy revamp (#10)
This is a more or less complete revamp of the snappy library aiming to:

* clear out a lot of the duplicate code
* remove some of the redundant API
* unify the codebase behind a single, optimized "inner" encoder/decoder
* unify the public API for in-memory and stream
compression/decompression
* improve performance

As such, only the documented API remains backwards-compatible - the rest
has been refactored, moved around and rewritten:

* `import snappy` now exposes only in-memory encoders / decoders
* framed format moved to `snappy` module, `snappy/framing` removed
* faststreams integration moved to `snappy/faststreams`
* minimal `std/streams` integration started in `snappy/streams`

Other changes include:

* up-to-date documentation
* allocation- and exception-free API (uses some amount of stack memory)
* a 2-3x improvement to both compression and decompression performance,
putting the library mostly on par with the C++ implementation (see
README)
* the implementation was heavily inspired by the `C++`, `C` and `go`
implementations, but somewhat simplified
* nonetheless, the code uses a significant amount of unsafe code to
work around inefficiencies in the safe subset of Nim

With bulk operations in place, the cost of range checks falls
significantly - we can reintroduce them without any significant loss in
performance by carefully ordering operations such that optimizers can
elide most.
2022-04-14 16:22:41 +02:00
Jacek Sieka 16bf7b7d96
deduplicate and reorganise code (#9)
The snappy codebase is a mess with competing implementations,
nonsensical code duplication and no real direction due to a partially
implemented faststreams migration.

This PR makes it slightly less of a mess, but make no mistake, it's
still a mess - the difference being that there are a few more signposts
along the way in terms of module organisation, and a little less mess as
the line count of the PR discloses.

Performance remains poor - ~3x slower than C++ - but at least there's
less code to look at :)
2022-04-01 12:57:39 +02:00
Jacek Sieka 6537b10600
Merge pull request #8 from status-im/raises-framed
Raises framed
2022-03-29 09:09:31 +02:00
Jacek Sieka 6afa8377e1
raises annotations for framed format 2022-03-28 14:08:06 +02:00
Ștefan Talpalaru 3d39a6228a
CI: test with multiple Nim versions (#7)
* CI: test with multiple Nim versions
2022-01-11 20:25:13 +01:00
Jacek Sieka 1b3f8d60a8
Merge pull request #24 from status-im/snappy-refresh
bump cppsnappy, split out benchmark
2022-01-02 19:05:54 +01:00
Jacek Sieka 2256d6efb2
ensure benchmarks are built on test 2021-12-30 13:54:55 +01:00
Jacek Sieka 45f2d5d84a
bump cppsnappy, split out benchmark
* benchmark both compression and decompression
* bump C++-snappy: it's gotten better over the years

```
fastStreams,       openArrays,       nimStreams,           cppLib,      Samples,         Size,         Test
  0.281 /  0.139,   0.295 /  0.220,   0.528 /  0.238,   0.140 /  0.047,          100,       102400, html
  2.697 /  1.457,   3.263 /  1.993,   6.106 /  2.026,   1.655 /  0.530,          100,       702087, urls.10K
  0.021 /  0.011,   0.017 /  0.011,   0.031 /  0.011,   0.014 /  0.009,          100,       123093, fireworks.jpeg
  0.057 /  0.019,   0.051 /  0.036,   0.092 /  0.042,   0.021 /  0.011,          100,       102400, paper-100k.pdf
  1.153 /  0.565,   1.210 /  0.904,   2.172 /  0.985,   0.575 /  0.184,          100,       409600, html_x_4
  0.887 /  0.582,   0.976 /  0.743,   1.923 /  0.712,   0.536 /  0.202,          100,       152089, alice29.txt
  0.771 /  0.504,   0.850 /  0.639,   1.678 /  0.606,   0.482 /  0.184,          100,       129301, asyoulik.txt
  2.349 /  1.518,   2.553 /  1.946,   5.057 /  1.913,   1.391 /  0.522,          100,       426754, lcet10.txt
  2.983 /  1.951,   3.241 /  2.408,   6.680 /  2.342,   1.882 /  0.735,          100,       481861, plrabn12.txt
  0.293 /  0.120,   0.306 /  0.210,   0.542 /  0.241,   0.131 /  0.042,          100,       118588, geo.protodata
  0.743 /  0.501,   0.738 /  0.620,   1.597 /  0.640,   0.414 /  0.191,          100,       184320, kppkn.gtb
  0.087 /  0.054,   0.102 /  0.067,   0.193 /  0.061,   0.052 /  0.022,          100,        14564, Mark.Twain-Tom.Sawyer.txt
 66.886 / 20.880, 105.393 / 37.907, 210.197 / 38.867,  34.273 / 10.193,           10,     38942424, state-2560000-114a593d-0d5e08e8.ssz
```

In general, we're consistently about 2x slower than C++ right now.
2021-12-30 13:33:27 +01:00
jangko 7f51d29126
windows-ci: bump mingw gcc from v8.1.0 to v11.2.0 2021-12-14 23:31:07 +07:00
Etan Kissling 16cce7d07c
fix `appendSnappyBytes` index computation
The `appendSnappyBytes` implementation of `snappy` computes indices
incorrectly, resulting in wrong data being produced. The implementation
was fixed and the test suite extended accordingly. Note that this issue
is not reachable because `appendSnappyBytes` is only used in test code.
2021-12-14 23:19:24 +07:00
Etan Kissling d555230013
avoid unnecessary compression of short payloads
When using `framingFormatCompress`, the given payload is compressed to
determine whether its compressed form is shorter than its raw form.
For short payloads the Snappy compression will never be shorter, so it
is not necessary to compress such payloads. Instead, short payloads can
always be treated as incompressible data. This patch optimizes for that.
2021-12-14 23:18:34 +07:00
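
The short-payload optimization above can be sketched as follows. This is an illustration under stated assumptions: the cut-off `MIN_COMPRESSIBLE` is a made-up value, not the one used by `framingFormatCompress`; `zlib.compress` stands in for a Snappy compressor because the Python standard library has none; and chunk headers and checksums are left out.

```python
import zlib

COMPRESSED = 0x00      # framing-format chunk type for compressed data
UNCOMPRESSED = 0x01    # framing-format chunk type for uncompressed data
MIN_COMPRESSIBLE = 32  # hypothetical cut-off, not the library's actual value


def choose_chunk(payload: bytes, compress=zlib.compress):
    """Return (chunk_type, body), skipping compression for short payloads."""
    if len(payload) < MIN_COMPRESSIBLE:
        # Too short to ever shrink: do not spend time compressing at all.
        return UNCOMPRESSED, payload
    packed = compress(payload)
    if len(packed) < len(payload):
        return COMPRESSED, packed
    # Compression did not help, so store the raw bytes.
    return UNCOMPRESSED, payload
```

The saving comes from the early return: before the patch, every payload was compressed first and the result compared afterwards.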
Ștefan Talpalaru 5750797ded
CI: refactor Nim compiler caching 2021-06-01 04:05:29 +02:00
Jacek Sieka be86aed2ad
Merge pull request #19 from status-im/fix_fishy_code
fixes fishy and dubious codes
2021-02-17 13:30:38 +01:00
jangko 22dbb2eb65
fixes fishy and dubious codes 2021-02-03 19:17:19 +07:00
andri lim 4c50008ab2
Merge pull request #17 from status-im/github_action
add github action
2020-12-22 16:00:33 +07:00
jangko ab5bbf624a
add github action script 2020-12-22 15:27:06 +07:00
andri lim b9b3d4931c
Merge pull request #16 from status-im/snappycpp_submodule
submoduling snappycpp
2020-12-22 12:53:04 +07:00
jangko 6861fc1aae
fixes ci script 2020-12-22 12:34:07 +07:00
jangko b10f16da7a
submoduling snappycpp 2020-12-22 12:17:39 +07:00
Jacek Sieka 5a8166b786
use stew/leb128 2020-12-15 17:07:58 +02:00
Jacek Sieka 8455b825e5
Merge pull request #13 from status-im/silly-operator
fix unnecessary seq allocation
2020-08-26 18:46:47 +02:00
Jacek Sieka 07cea69de5
fix unnecessary seq allocation
the extra allocation significantly slows down the implementation
2020-08-26 15:52:15 +02:00
Zahary Karadjov 1e506c80a9
Remove some unused code 2020-08-19 14:19:51 +03:00
Zahary Karadjov 6da2be2564
Fix a logical typo 2020-08-19 13:26:18 +03:00
Zahary Karadjov a368549c1a
Fix various integer overflow issues found through fuzzing 2020-08-18 23:11:42 +03:00
Zahary Karadjov f449a5a47a
Allow bounding the maximum decoded size 2020-08-18 17:34:55 +03:00
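
As background for the commit above: raw Snappy data begins with the uncompressed length encoded as a little-endian base-128 varint, so a decoder can refuse oversized input before allocating anything. A minimal sketch, assuming that preamble layout; the names `read_uvarint` and `check_decoded_size` are hypothetical and are not the library's API.

```python
def read_uvarint(data: bytes):
    """Decode a little-endian base-128 varint; return (value, bytes_consumed)."""
    value = 0
    for i, byte in enumerate(data[:5]):  # a 32-bit length needs at most 5 bytes
        value |= (byte & 0x7F) << (7 * i)
        if byte < 0x80:  # high bit clear marks the last byte
            return value, i + 1
    raise ValueError("truncated or oversized length preamble")


def check_decoded_size(compressed: bytes, max_size: int) -> int:
    """Hypothetical guard: reject input whose declared size exceeds max_size."""
    size, _ = read_uvarint(compressed)
    if size > max_size:
        raise ValueError("declared size exceeds limit")
    return size
```

The declared length comes from untrusted input, so a real decoder still has to verify that the decoded output matches it.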
Zahary Karadjov cae6c07fbb
Fix the fuzzing test 2020-07-22 20:25:22 +03:00
andri lim 676fa656d3
Merge pull request #10 from status-im/fix_crash
fix crc32c nim side crash
2020-05-24 09:36:00 +07:00
jangko 700f7777fd
fix crc32c nim side crash 2020-05-23 13:21:55 +07:00
andri lim 672ecf54d9
Merge pull request #8 from status-im/fuzzing-tests
Added fuzzing tests
2020-05-21 12:03:10 +07:00
jangko 9e1856da95
fix appveyor.yml: ignore warning when building snappycpp instead of treating it as errors 2020-05-21 11:50:08 +07:00
Zahary Karadjov f6a87764a3
Added fuzzing tests 2020-05-20 21:06:13 +03:00
Zahary Karadjov b4cd68e27a
Add a helper for framing compression of blobs 2020-05-12 22:30:21 +03:00
Zahary Karadjov 20cc8ce1c2
Export more constants, so they can be used in NBC 2020-05-08 22:24:16 +03:00