E M
db73173ebd
re-enable release test workflow
2026-05-28 16:17:30 +10:00
E M
63a16e249a
Make summary more readable
2026-05-28 15:41:30 +10:00
E M
2a514e379c
update workflow run summary
- add retention date
- update titles and links for readability
2026-05-28 15:41:29 +10:00
E M
a021908788
Read test result ConfigMap instead of trying to scrape logs for test result info
2026-05-28 15:41:29 +10:00
E M
640d7f0943
more accurate name for step
2026-05-28 15:41:29 +10:00
E M
01677bf9cb
move test summary generation above check job status
If the check job status step failed (e.g. when a test failed), test summary generation was skipped. Moving the test summary generation step above the job status check avoids this.
2026-05-28 15:41:29 +10:00
E M
41be805412
Increase runner node disk space
Attempt to avoid insufficient disk space errors in tests
2026-05-28 15:41:29 +10:00
E M
1ad70fec22
don't wait for PVC disks to be deleted; delete all at the end in case the runner crashes
2026-05-28 15:41:29 +10:00
E M
d0b794ff4f
increase memory of the runner pod
was seeing exit code 137 (OOM)
2026-05-28 15:41:29 +10:00
E M
8e9d39197f
Keep 1 node alive so autoscaler doesn't scale to 0
Should help speed up startup, avoiding errors like "pod couldn't be scheduled"
2026-05-28 15:41:29 +10:00
E M
2d7aca1054
wait for pvcs to be deleted before destroying the cluster
2026-05-28 15:41:29 +10:00
E M
17a1c556cc
use on demand VMs instead of spot instances
Attempting to fix a lot of errors in the console relating to spot instances being unschedulable.
2026-05-28 15:41:29 +10:00
E M
f84fd7f25c
do not generate test summary if previous steps were skipped/cancelled
2026-05-28 15:41:29 +10:00
E M
5203cf93e4
fix error in "print storage logs url" step
2026-05-28 15:41:29 +10:00
E M
b4180c471b
delete terraform state lock
When the workflow is cancelled, either manually or automatically by a timeout on a long-running step, the terraform state lock had to be deleted manually, or the next workflow run would never succeed. This change ensures that the state lock file is always deleted after each run.
2026-05-28 15:41:28 +10:00
E M
0e46c9f684
generate test summary to show in workflow summary
2026-05-28 15:41:28 +10:00
E M
37f14a6821
Move from a single zone to multiple zones to increase spot instance availability
2026-05-28 15:41:28 +10:00
E M
8fccef9fb2
Reduce nodes in pool from 10 to 5
Reduces resource contention. 2 parallel tests x 10 containers => 2-3 nodes needed, 5 gives room
2026-05-28 15:41:28 +10:00
E M
c1855fb13a
put cluster name in an env var
2026-05-28 15:41:28 +10:00
E M
10ca94261b
avoid sleeping a full 60s to wait for job completion
Instead, wait for a job condition using kubectl wait
2026-05-28 15:41:28 +10:00
E M
3679040178
try to ensure the log stream survives long silences
2026-05-28 15:41:28 +10:00
E M
e58c8f93c7
add starttime param to logging URL
2026-05-28 15:41:28 +10:00
E M
f72dbb9c9d
cap boot drive size to 20 GB (default is 100 GB) to avoid resource exhaustion
2026-05-28 15:41:28 +10:00
E M
b04672ebce
Add a "Delete PVCs before cluster teardown" step to the workflow to prevent future PVC leaks
2026-05-28 15:41:28 +10:00
E M
eac099b819
try zone a one more time
2026-05-28 15:41:28 +10:00
E M
2c627c9ed2
hanging at 64% deploying again, trying zone c
2026-05-28 15:41:28 +10:00
E M
c520e79383
fix encoding of logging url
2026-05-28 15:41:27 +10:00
E M
be582eca17
move back to europe-west4-b zone due to exhausted quota
2026-05-28 15:41:27 +10:00
E M
82630eead6
refactor: remove allow-tests-pods node label from GKE node pools
The `allow-tests-pods` boolean label was used by the test framework to steer pods away from runner nodes via a node affinity exclusion. Pod scheduling now uses the existing `workload-type` label directly as a nodeSelector, making the boolean label redundant.
2026-05-28 15:41:27 +10:00
E M
68c319863a
Logging URL filters by RUNID instead of namespace/container name
2026-05-28 15:41:27 +10:00
E M
4f86040c2c
fix: avoid building in parallel
Avoids "file in use" errors while building in CI
2026-05-28 15:41:27 +10:00
E M
db5eada055
remove unneeded priority request
2026-05-28 15:41:27 +10:00
E M
fd5c29db31
set cluster creation timeout to 20 minutes
temporary timeout so we can see if the latest commits work without waiting too long between tries
2026-05-28 15:41:27 +10:00
E M
97750a47ca
Try changing zones in case the cluster deployment stall is due to a zonal unavailability.
2026-05-28 15:41:27 +10:00
E M
5616b50bfb
change monitoring to default service
Cluster deployment seems to be stalling because the metrics service is not started, so return monitoring to the default to see if that fixes the issue.
2026-05-28 15:41:27 +10:00
E M
7d6701d444
inline node pools so they can be created in parallel
speeds up cluster creation
2026-05-28 15:41:27 +10:00
E M
bbc4b1caf3
remove unneeded setup
2026-05-28 15:41:27 +10:00
E M
898010d58f
move state bucket from gh secret to variable
2026-05-28 15:41:27 +10:00
E M
77e8d6d64a
create the terraform cache dir first
2026-05-28 15:41:26 +10:00
E M
0e298bddbd
add debug output
2026-05-28 15:41:26 +10:00
E M
48b444d8fe
change script so it doesn't non-zero exit when no pods exist
2026-05-28 15:41:26 +10:00
E M
7a9b93a981
fix terraform cache, should remove warning
2026-05-28 15:41:26 +10:00
E M
d4d52c008a
fix polling script
2026-05-28 15:41:26 +10:00
E M
3ed677c9d1
check pod phase instead
2026-05-28 15:41:26 +10:00
E M
cd972ef9bb
refactor polling loop
2026-05-28 15:41:26 +10:00
E M
1696aa83a9
temp comment out release workflow
2026-05-28 15:41:26 +10:00
E M
a901e1495c
temp comment out build workflow
2026-05-28 15:41:26 +10:00
E M
7f782cf6a1
temp comment out build to make testing ci changes faster
2026-05-28 15:41:26 +10:00
E M
dabdc6d3e9
Keeps timing out waiting for start, so try polling loop
2026-05-28 15:41:26 +10:00
E M
3cb3a176b2
wait for runners-ci node to be ready before continuing workflow
2026-05-28 15:41:26 +10:00