nim-dagger/.github/workflows
Eric d70ab59004
refactor: multinode integration test refactor (#662)
* refactor multi node test suite

Refactor the multinode test suite into the marketplace test suite.

- Arbitrary number of nodes can be started with each test: clients, providers, validators
- Hardhat can also be started locally with each test, usually for the purpose of saving and inspecting its log file.
- Log files for all nodes can be persisted on disk, with configuration at the test-level
- Log files, if persisted (as specified in the test), will be persisted to a CI artifact
- Node config is specified at the test-level instead of the suite-level
- Node/Hardhat process starting/stopping is now async, and runs much faster
- Per-node config includes:
  - simulating proof failures
  - logging to file
  - log level
  - log topics
  - storage quota
  - debug (print logs to stdout)
- Tests find next available ports when starting nodes, as closing ports on Windows can lag
- Hardhat is no longer required to be running prior to starting the integration tests (as long as Hardhat is configured to run in the tests).
  - If Hardhat is already running, a snapshot will be taken and reverted before and after each test, respectively.
  - If Hardhat is not already running and configured to run at the test-level, a Hardhat process will be spawned and torn down before and after each test, respectively.

* additional logging for debug purposes

* address PR feedback

- fix spelling
- revert change from catching ProviderError to SignerError -- this should be handled more consistently in the Market abstraction, and will be handled in another PR.
- remove method label from raiseAssert
- remove unused import

* Use API instead of command exec to test for free port

Use chronos `createStreamServer` API to test for free port by binding localhost address and port. Use `ServerFlags.ReuseAddr` to enable reuse of same IP/Port on multiple test runs.

* clean up

* remove upraises annotations from tests

* Update tests to work with updated erasure coding slot sizes

* update dataset size, nodes, tolerance to match valid ec params

Integration tests now have valid dataset sizes (blocks), tolerances, and number of nodes, to work with valid ec params. These values are validated when requested storage.

Print the rest api failure message (via doAssert) when a rest api call fails (eg the rest api may validate some ec params).

All integration tests pass when the async `clock.now` changes are reverted.

* dont use async clock for now

* fix workflow

* move integration logs uplod to reusable

---------

Co-authored-by: Dmitriy Ryajov <dryajov@gmail.com>
2024-02-19 04:55:39 +00:00
..
Readme.md Update links to codex-storage organization (#420) 2023-05-23 23:01:13 +03:00
ci-reusable.yml refactor: multinode integration test refactor (#662) 2024-02-19 04:55:39 +00:00
ci.yml wire in circom backend (#698) 2024-02-09 21:40:30 +00:00
docker-dist-tests.yml Workaround for Hardhat last block timestamp bug (#644) 2023-12-20 08:06:24 +11:00
docker-reusable.yml Add Nim-matrix workflow to run on merge queue (#693) 2024-02-06 12:56:27 +02:00
docker.yml Use latest tag for Tags and Default branch only (#540) 2023-08-30 13:09:52 +03:00
docs.yml ci: update actions to lates major versions (#688) 2024-01-29 17:21:52 +02:00
nim-matrix.yml CI update (#695) 2024-02-07 19:47:40 +00:00

Readme.md

Tips for shorter build times

Runner availability

Currently, the biggest bottleneck when optimizing workflows is the availability of Windows and macOS runners. Therefore, anything that reduces the time spent in Windows or macOS jobs will have a positive impact on the time waiting for runners to become available. The usage limits for Github Actions are described here. You can see a breakdown of runner usage for your jobs in the Github Actions tab (example).

Windows is slow

Performing git operations and compilation are both slow on Windows. This can easily mean that a Windows job takes twice as long as a Linux job. Therefore it makes sense to use a Windows runner only for testing Windows compatibility, and nothing else. Testing compatibility with other versions of Nim, code coverage analysis, etc. are therefore better performed on a Linux runner.

Parallelization

Breaking up a long build job into several jobs that you run in parallel can have a positive impact on the wall clock time that a workflow runs. For instance, you might consider running unit tests and integration tests in parallel. Keep in mind however that availability of macOS and Windows runners is the biggest bottleneck. If you split a Windows job into two jobs, you now need to wait for two Windows runners to become available! Therefore parallelization often only makes sense for Linux jobs.

Refactoring

As with any code, complex workflows are hard to read and change. You can use composite actions and reusable workflows to refactor complex workflows.

Steps for measuring time

Breaking up steps allows you to see the time spent in each part. For instance, instead of having one step where all tests are performed, you might consider having separate steps for e.g. unit tests and integration tests, so that you can see how much time is spent in each.

Fix slow tests

Try to avoid slow unit tests. They not only slow down continuous integration, but also local development. If you encounter slow tests you can consider reworking them to stub out the slow parts that are not under test, or use smaller data structures for the test.

You can use unittest2 together with the environment variable NIMTEST_TIMING=true to show how much time is spent in every test (reference).

Caching

Ensure that caches are updated over time. For instance if you cache the latest version of the Nim compiler, then you want to update the cache when a new version of the compiler is released. See also the documentation for the cache action.

Fail fast

By default a workflow fails fast: if one job fails, the rest are cancelled. This might seem inconvenient, because when you're debugging an issue you often want to know whether you introduced a failure on all platforms, or only on a single one. You might be tempted to disable fail-fast, but keep in mind that this keeps runners busy for longer on a workflow that you know is going to fail anyway. Consequent runs will therefore take longer to start. Fail fast is most likely better for overall development speed.