E M
4962e3bd41
more accurate name for step
2026-05-01 17:47:17 +10:00
E M
fc3c559e4f
move test summary generation above check job status
...
If the job status check failed (e.g. when a test failed), test summary generation was skipped. Moving the test summary generation step above the job status check avoids this.
2026-05-01 16:08:59 +10:00
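The ordering problem described in fc3c559e4f can be illustrated with a small simulation. This is only a sketch, assuming the CI runner's default behaviour of skipping later steps once an earlier step has failed; the `run_steps` helper and the step functions are hypothetical and are not taken from the workflow itself.

```python
from typing import Callable

Step = tuple[str, Callable[[], bool]]


def run_steps(steps: list[Step]) -> dict[str, str]:
    """Run steps in order; once one fails, every later step is skipped."""
    results: dict[str, str] = {}
    failed = False
    for name, action in steps:
        if failed:
            results[name] = "skipped"
            continue
        ok = action()
        results[name] = "success" if ok else "failure"
        failed = not ok
    return results


def check_job_status() -> bool:
    # Stands in for a run in which at least one test failed.
    return False


def generate_test_summary() -> bool:
    return True


# Old order: the failing status check hides the summary.
old_order = run_steps([
    ("check job status", check_job_status),
    ("generate test summary", generate_test_summary),
])

# New order: the summary is produced before the status check fails the job.
new_order = run_steps([
    ("generate test summary", generate_test_summary),
    ("check job status", check_job_status),
])

print(old_order)
print(new_order)
```

In the old order the summary step ends up `skipped`; in the new order it runs and the job still fails on the status check.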
E M
2a94718b37
Increase runner node disk space
...
Attempt to avoid insufficient disk space errors in tests
2026-05-01 12:47:29 +10:00
E M
805ae86268
don't wait for PVC disks to be deleted; delete them all at the end in case the runner crashes
2026-05-01 12:45:19 +10:00
E M
c0b1af52ca
increase memory of the runner pod
...
was seeing exit code 137 (OOM)
2026-05-01 11:37:11 +10:00
E M
a9cb218d59
Keep 1 node alive so autoscaler doesn't scale to 0
...
Should help speed up startup, avoiding errors like "pod couldn't be scheduled"
2026-05-01 11:36:53 +10:00
E M
358c03c05e
wait for PVCs to be deleted before destroying the cluster
2026-05-01 11:08:00 +10:00
E M
c470b3e102
use on demand VMs instead of spot instances
...
Attempting to fix a lot of errors in the console relating to spot instances being unschedulable.
2026-04-30 18:35:46 +10:00
E M
f140476d65
do not generate test summary if previous steps were skipped/cancelled
2026-04-30 18:34:55 +10:00
E M
0d74299a57
fix error in "print storage logs url" step
2026-04-30 18:34:26 +10:00
E M
185aa06514
delete terraform state lock
...
When the workflow was cancelled, either manually or automatically by a long-running step hitting its timeout, the Terraform state lock had to be deleted by hand, or else the next workflow run would never succeed. This change ensures that the state lock file is always deleted after each run.
2026-04-30 18:15:57 +10:00
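The "always deleted after each run" guarantee in 185aa06514 is a cleanup-on-every-exit-path pattern. The commit does not say how the deletion is wired into the workflow, so the following is only a sketch of the idea using `try`/`finally`; `run_with_lock_cleanup`, `RunCancelled`, and the callables are hypothetical names introduced for illustration.

```python
from typing import Callable


class RunCancelled(Exception):
    """Hypothetical stand-in for a manual cancel or a step timeout."""


def run_with_lock_cleanup(run: Callable[[], None],
                          delete_lock: Callable[[], None]) -> None:
    """Run the workload and delete the state lock on every exit path."""
    try:
        run()
    finally:
        # Executes on success, failure, and cancellation alike.
        delete_lock()


events: list[str] = []


def cancelled_run() -> None:
    events.append("run started")
    raise RunCancelled("workflow cancelled")


def delete_lock() -> None:
    events.append("lock deleted")


try:
    run_with_lock_cleanup(cancelled_run, delete_lock)
except RunCancelled:
    pass

print(events)  # ['run started', 'lock deleted']
```

The cancellation still propagates to the caller, so the run is reported as cancelled, but the lock no longer blocks the next run.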
E M
94afc921e0
generate test summary to show in workflow summary
2026-04-30 18:09:35 +10:00
E M
2872a2800f
Move from a single zone to multiple zones to increase spot instance availability
2026-04-30 17:59:36 +10:00
E M
c889c0283f
Reduce nodes in pool from 10 to 5
...
Reduces resource contention. 2 parallel tests x 10 containers each (20 containers) need 2-3 nodes; 5 nodes leaves headroom.
2026-04-30 17:58:56 +10:00
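The sizing estimate in c889c0283f can be checked with a short calculation. The commit does not state how many containers fit on one node, so the per-node capacities below (7 to 10) are assumptions chosen only to show where a "2-3 nodes" figure could come from; `nodes_needed` is a hypothetical helper.

```python
import math


def nodes_needed(parallel_tests: int, containers_per_test: int,
                 containers_per_node: int) -> int:
    """Smallest number of nodes that can hold every container at once."""
    total_containers = parallel_tests * containers_per_test
    return math.ceil(total_containers / containers_per_node)


# Assumed per-node capacities; not taken from the commit.
for capacity in (7, 8, 9, 10):
    print(capacity, nodes_needed(2, 10, capacity))
```

Under these assumed capacities the 20 containers need 2 or 3 nodes, so a pool of 5 leaves at least 2 nodes spare.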
E M
54075c576c
put cluster name in an env var
2026-04-30 15:32:40 +10:00
E M
4dc2fb9a79
avoid sleeping a full 60s to wait for job completion
...
Instead, wait for a job condition using kubectl wait
2026-04-30 15:23:01 +10:00
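The change in 4dc2fb9a79 replaces a fixed sleep with a blocking wait on a job condition. A minimal sketch of that idea, driven from Python: the job name, namespace, and timeout are hypothetical, and the exact flags used in the workflow are not shown in the commit.

```python
import subprocess


def build_wait_command(job_name: str, namespace: str,
                       condition: str = "complete",
                       timeout_seconds: int = 600) -> list[str]:
    """Build a `kubectl wait` invocation for a Job condition."""
    return [
        "kubectl", "wait",
        f"--for=condition={condition}",
        f"job/{job_name}",
        f"--namespace={namespace}",
        f"--timeout={timeout_seconds}s",
    ]


def wait_for_job(job_name: str, namespace: str,
                 timeout_seconds: int = 600) -> bool:
    """Return True if the Job reached the condition before the timeout."""
    command = build_wait_command(job_name, namespace,
                                 timeout_seconds=timeout_seconds)
    # kubectl exits non-zero when the timeout expires first.
    return subprocess.run(command, check=False).returncode == 0


# Hypothetical names, printed rather than executed.
print(" ".join(build_wait_command("release-tests", "ci")))
```

`kubectl wait` returns as soon as the condition is met, so a job that finishes in a few seconds no longer costs a full sleep interval. Note that a failed Job never reaches the `complete` condition, so the `failed` condition would need separate handling.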
E M
abaad73465
try to ensure the log stream survives long silences
2026-04-30 12:57:05 +10:00
E M
b78c2d5301
add starttime param to logging URL
2026-04-29 21:26:57 +10:00
E M
5da74edda0
cap boot drive size to 20 GB (default is 100 GB) to avoid resource exhaustion
2026-04-29 21:26:47 +10:00
E M
9e732b16b9
Add a "Delete PVCs before cluster teardown" step to the workflow to prevent future PVC leaks
2026-04-29 20:37:18 +10:00
E M
c177225677
try zone a one more time
2026-04-29 18:35:08 +10:00
E M
0965220c6d
hanging at 64% deploying again, trying zone c
2026-04-29 17:48:54 +10:00
E M
ebf0abb35c
fix encoding of logging url
2026-04-29 17:13:57 +10:00
E M
a4750824bc
move back to europe-west4-b zone due to exhausted quota
2026-04-29 17:13:44 +10:00
E M
4fd5bdca92
refactor: remove allow-tests-pods node label from GKE node pools
...
The `allow-tests-pods` boolean label was used by the test framework to steer pods away from runner nodes via a node affinity exclusion. Pod scheduling now uses the existing `workload-type` label directly as a nodeSelector, making the boolean label redundant.
2026-04-29 16:46:24 +10:00
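The scheduling change in 4fd5bdca92 can be pictured as two pod spec fragments, written here as Python dictionaries that mirror the Kubernetes fields. This is a sketch only: the label values (`"false"`, `"tests"`, `"runners"`) are assumptions, and `matches_node_selector` is a hypothetical helper that imitates how a `nodeSelector` is matched against node labels.

```python
# Before: steer pods away from runner nodes with an affinity exclusion.
old_pod_spec = {
    "affinity": {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [{
                    "matchExpressions": [{
                        "key": "allow-tests-pods",
                        "operator": "NotIn",
                        "values": ["false"],  # assumed value
                    }],
                }],
            },
        },
    },
}

# After: select nodes directly by the existing workload-type label.
new_pod_spec = {"nodeSelector": {"workload-type": "tests"}}  # assumed value


def matches_node_selector(pod_spec: dict, node_labels: dict) -> bool:
    """A node matches when it carries every key/value in nodeSelector."""
    selector = pod_spec.get("nodeSelector", {})
    return all(node_labels.get(k) == v for k, v in selector.items())


print(matches_node_selector(new_pod_spec, {"workload-type": "tests"}))
print(matches_node_selector(new_pod_spec, {"workload-type": "runners"}))
```

With a direct selector, a single label both places test pods on the intended nodes and keeps them off all others, which is why the separate boolean label becomes redundant.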
E M
b7b01f92e8
Logging URL filters by RUNID instead of namespace/container name
2026-04-29 15:20:44 +10:00
E M
f5150109f7
fix: avoid building in parallel
...
Avoids "file in use" errors while building in CI
2026-04-28 20:02:00 +10:00
E M
970012aa04
remove unneeded priority request
2026-04-28 18:07:43 +10:00
E M
6c86e8a9ed
set cluster creation timeout to 20 minutes
...
temporary timeout so we can see if the latest commits work without waiting too long between tries
2026-04-28 17:57:15 +10:00
E M
11cb97e97d
Try changing zones in case the cluster deployment stall is due to a zonal unavailability.
2026-04-28 17:51:01 +10:00
E M
6bc28b68d7
change monitoring to default service
...
Cluster deployment seems to be stalling because the metrics service is not started, so return monitoring to the default service to see if that fixes the issue.
2026-04-28 17:50:43 +10:00
E M
f2b26ae5eb
inline node pools so they can be created in parallel
...
speeds up cluster creation
2026-04-28 16:46:22 +10:00
E M
70ae988c9b
remove unneeded setup
2026-04-28 16:28:04 +10:00
E M
9f46e1ce8a
move state bucket from gh secret to variable
2026-04-24 17:20:03 +10:00
E M
93fc629706
create the terraform cache dir first
2026-04-24 17:11:54 +10:00
E M
0839bd0301
add debug output
2026-04-24 17:01:52 +10:00
E M
580a424086
change script so it doesn't non-zero exit when no pods exist
2026-04-24 15:45:47 +10:00
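The polling-related commits around this point (b05c345143, 073dc7c408, 580a424086) describe a loop that checks pod phase and tolerates the case where no pods exist yet. The actual script is not shown, so this is only a sketch of that behaviour; `wait_for_running` and its parameters are hypothetical, and the phase lookup is injected so the loop can be exercised without a cluster.

```python
import time
from typing import Callable


def wait_for_running(get_phases: Callable[[], list[str]],
                     timeout_seconds: float = 300,
                     interval_seconds: float = 5,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until at least one pod exists and all pods are Running."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        phases = get_phases()
        # An empty list means "no pods yet", which is not an error.
        if phases and all(phase == "Running" for phase in phases):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_seconds)


# Simulated lookups: no pods, then Pending, then Running.
responses = iter([[], ["Pending"], ["Running"]])
print(wait_for_running(lambda: next(responses), sleep=lambda _: None))
```

Treating an empty result as "keep polling" rather than as a failure is what keeps the step from exiting non-zero before the first pod has been created.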
E M
ffde5e0fdc
fix terraform cache, should remove warning
2026-04-24 15:33:32 +10:00
E M
aebe3a4262
fix polling script
2026-04-24 15:30:56 +10:00
E M
073dc7c408
check pod phase instead
2026-04-24 15:12:39 +10:00
E M
df79cb097b
refactor polling loop
2026-04-24 15:01:45 +10:00
E M
a72e933d38
temp comment out release workflow
2026-04-24 14:41:55 +10:00
E M
df25b12356
temp comment out build workflow
2026-04-24 14:40:57 +10:00
E M
c114a54851
temp comment out build to make testing ci changes faster
2026-04-24 14:40:11 +10:00
E M
b05c345143
Keeps timing out waiting for start, so try polling loop
2026-04-24 14:37:56 +10:00
E M
9c0e749e99
wait for runners-ci node to be ready before continuing workflow
2026-04-24 13:28:01 +10:00
E M
7994d88996
reorder wait command flags
2026-04-24 12:35:40 +10:00
E M
8527868c45
Show storage node logs URL in workflow summary
2026-04-24 12:25:45 +10:00
E M
c85c658c48
move RELEASE_TESTS_GCP_PROJECT from secret to var for logging URL
2026-04-24 12:25:17 +10:00
E M
937f3c88c0
bump kubectl to latest
2026-04-24 12:24:56 +10:00