* Workaround for Hardhat timestamp bug

  Likely due to a Hardhat bug in which the callbacks for subscription events are called and awaited before updating its local understanding of the last block time, Hardhat will report a block time in the `newHeads` event that is generally 1 second before the time reported from `getLatestBlock.timestamp`. This was causing issues with the `OnChainClock`'s offset, and therefore the `now()` used by the `OnChainClock` would sometimes be off by a second (or more), causing tests to fail.

  This commit introduces a `codex_use_hardhat` compilation flag that, when set, will always get the latest block timestamp from Hardhat via the `getLatestBlock.timestamp` RPC call for `OnChainClock.now` calls. Otherwise, the last block timestamp reported in the `newHeads` event will be used.

  Update the docker dist tests compilation flag for simulated proof failures (it was not correct), and explicitly add `codex_use_hardhat=false` for clarity.
* enable simulated proof failures for coverage
* comment out failing test on linux -- will be replaced
* bump codex contracts eth
* add back clock offset for non-hardhat cases
* bump codex-contracts-eth

  increases pointer by 67 blocks each period increase
* Add `codex_use_hardhat` flag to coverage tests
Tips for shorter build times
Runner availability
Currently, the biggest bottleneck when optimizing workflows is the availability of Windows and macOS runners. Therefore, anything that reduces the time spent in Windows or macOS jobs also reduces the time spent waiting for runners to become available. The usage limits for GitHub Actions are described here. You can see a breakdown of runner usage for your jobs in the GitHub Actions tab (example).
Windows is slow
Git operations and compilation are both slow on Windows. This can easily mean that a Windows job takes twice as long as a Linux job. It therefore makes sense to use a Windows runner only for testing Windows compatibility, and nothing else. Testing compatibility with other versions of Nim, code coverage analysis, etc. are better performed on a Linux runner.
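As a rough sketch of this split, the workflow below runs the test suite on all three platforms but keeps code coverage on Linux only. The job names and the `make test` and `make coverage` commands are assumptions made for illustration; substitute whatever your project actually uses.

```yaml
jobs:
  # Compatibility testing: the only work done on Windows and macOS.
  test:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        shell: bash
        run: make test        # hypothetical test command

  # Everything else (here: coverage) runs on a Linux runner only.
  coverage:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Collect coverage
        run: make coverage    # hypothetical coverage command
```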
Parallelization
Breaking up a long build job into several jobs that run in parallel can reduce the wall-clock time of a workflow. For instance, you might consider running unit tests and integration tests in parallel. Keep in mind, however, that the availability of macOS and Windows runners is the biggest bottleneck. If you split a Windows job into two jobs, you now need to wait for two Windows runners to become available! Therefore parallelization often only makes sense for Linux jobs.
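A minimal sketch of such a split, for Linux only: jobs that do not declare `needs` are scheduled independently, so the two jobs below can run at the same time. The job names and `make` targets are hypothetical.

```yaml
jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Unit tests
        run: make test              # hypothetical target

  # No `needs:` key, so this job does not wait for unit-tests.
  integration-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Integration tests
        run: make testIntegration   # hypothetical target
```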
Refactoring
As with any code, complex workflows are hard to read and change. You can use composite actions and reusable workflows to refactor complex workflows.
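For example, a group of steps that is repeated in several jobs could be moved into a composite action. The sketch below assumes a hypothetical action at `.github/actions/run-tests/action.yml` with a single `target` input; the path, the input name and the `make` invocation are all illustrative.

```yaml
# .github/actions/run-tests/action.yml (hypothetical path)
name: Run tests
description: Run one make target
inputs:
  target:
    description: Make target to run
    required: true
runs:
  using: composite
  steps:
    - name: Run make target
      shell: bash                 # `run` steps in composite actions must name a shell
      env:
        TARGET: ${{ inputs.target }}
      run: make "$TARGET"
---
# In a workflow, the action is then used like any other step:
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/actions/run-tests
        with:
          target: test            # hypothetical make target
```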
Steps for measuring time
Breaking up steps allows you to see the time spent in each part. For instance, instead of having one step where all tests are performed, you might consider having separate steps for e.g. unit tests and integration tests, so that you can see how much time is spent in each.
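A sketch of what this could look like within a single job; the step names and `make` targets are assumptions. The Actions log shows a duration for each step, so the time spent on the build, the unit tests and the integration tests becomes visible separately.

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build
        run: make                   # hypothetical build command
      - name: Unit tests
        run: make test              # hypothetical target
      - name: Integration tests
        run: make testIntegration   # hypothetical target
```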
Fix slow tests
Try to avoid slow unit tests. They not only slow down continuous integration, but also local development. If you encounter slow tests you can consider reworking them to stub out the slow parts that are not under test, or use smaller data structures for the test.
You can use unittest2 together with the environment variable `NIMTEST_TIMING=true` to show how much time is spent in every test (reference).
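In a workflow, the variable can be set on the step that runs the tests. This is only a sketch: it assumes the tests are built with unittest2, and the `make test` command is a placeholder.

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Unit tests with per-test timing
        env:
          NIMTEST_TIMING: "true"   # read by unittest2 when the tests run
        run: make test             # hypothetical test command
```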
Caching
Ensure that caches are updated over time. For instance, if you cache the latest version of the Nim compiler, then you want to update the cache when a new version of the compiler is released. See also the documentation for the cache action.
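One way to achieve this is to make the compiler version part of the cache key, so that a new version produces a new key and therefore a fresh cache. In the sketch below, the `nim_version` variable, its value, the cached path and the build command are all assumptions chosen for illustration.

```yaml
env:
  nim_version: v2.0.0   # illustrative value; bump this to invalidate the cache

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Cache Nim compiler
        uses: actions/cache@v4
        with:
          path: NimBinaries                                 # hypothetical directory holding the compiler
          key: ${{ runner.os }}-nim-${{ env.nim_version }}  # changes whenever the version changes
      - name: Build
        run: make                                           # hypothetical build command
```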
Fail fast
By default, a matrix of jobs fails fast: if one job in the matrix fails, the remaining in-progress and queued jobs in that matrix are cancelled. This might seem inconvenient, because when you're debugging an issue you often want to know whether you introduced a failure on all platforms, or only on a single one. You might be tempted to disable fail-fast, but keep in mind that this keeps runners busy for longer on a workflow that you know is going to fail anyway. Subsequent runs will therefore take longer to start. Fail fast is most likely better for overall development speed.
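For reference, the setting lives under a job's `strategy` key. The sketch below spells out the default explicitly; the matrix values and the test command are illustrative.

```yaml
jobs:
  test:
    strategy:
      fail-fast: true   # the default: cancel the other matrix jobs when one fails
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        shell: bash
        run: make test  # hypothetical test command
```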