9 Commits

Author SHA1 Message Date
E M
37f14a6821
Move from a single zone to multiple zones to increase spot instance availability 2026-05-28 15:41:28 +10:00
E M
f72dbb9c9d
cap boot disk size to 20 GB (default is 100 GB) to avoid resource exhaustion 2026-05-28 15:41:28 +10:00
E M
fd5c29db31
set cluster creation timeout to 20 minutes
Temporary timeout so we can check whether the latest commits work without waiting too long between attempts.
2026-05-28 15:41:27 +10:00
E M
5616b50bfb
change monitoring to default service
Cluster deployment seems to be stalling because the metrics service is not starting, so return monitoring to the default service to see whether that fixes the issue.
2026-05-28 15:41:27 +10:00
E M
7d6701d444
inline node pools so they can be created in parallel
speeds up cluster creation
2026-05-28 15:41:27 +10:00
E M
bbc4b1caf3
remove unneeded setup 2026-05-28 15:41:27 +10:00
E M
bc7a277d9b
chore: reduce GKE release test cluster provisioning time and cost
- Configure runners-ci node pool inline in the cluster resource instead
  of using remove_default_node_pool=true, eliminating the
  provision-then-delete cycle that added ~5 min to terraform apply
- Remove the separate infra pool; runners-ci is now the only pool on
  the critical path of cluster creation
- Set tests-pods pool min_node_count=0 so no node is provisioned at
  apply time — nodes scale up only when test pods are scheduled
- Enable spot instances on the tests-pods pool for ~60-91% cost saving
- Add 60 min job timeout to release-tests to bound hung cluster cost
- Add Terraform plugin cache keyed on the lock file to skip provider
  re-downloads on subsequent runs (~30-60s saved)
- Install gke-gcloud-auth-plugin via setup-gcloud to fix kubectl auth

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-28 15:41:25 +10:00
E M
cf6d93f52c
chore: use zonal GKE cluster to reduce provisioning time
Switch cluster and all node pools from regional to zonal (`europe-west4-b`) to avoid the 40+ minute provisioning time of a regional (multi-zone) cluster. Adds a `zone` variable to the GKE module and cluster config, and updates the workflow's `gcloud container clusters get-credentials` call to use `--zone` instead of `--region`.
2026-05-28 15:41:25 +10:00
E M
7c74437bb7
Port Terraform cluster creation/destruction from DigitalOcean to GCP 2026-05-28 15:41:24 +10:00
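
Taken together, the commits above describe a cluster shaped roughly like the Terraform sketch below. It is an illustration under stated assumptions, not the repository's actual configuration. The resource name, cluster name, machine types, node counts, maximum autoscaling size, and the additional zones are hypothetical. Only the zone `europe-west4-b`, the pool names `runners-ci` and `tests-pods`, the 20 GB boot disk, `min_node_count = 0`, spot instances, and the 20 minute create timeout come from the commit messages. Which pools the disk cap applies to, and how the extra zones are configured, are also assumptions.

```hcl
# Hypothetical sketch: values marked "assumed" do not come from the commits.
resource "google_container_cluster" "release_tests" { # assumed resource name
  name     = "release-tests"  # assumed
  location = "europe-west4-b" # zonal cluster (cf6d93f52c)

  # One possible way to spread nodes over more zones (37f14a6821).
  # For a zonal cluster this lists additional zones only; both are assumed.
  node_locations = ["europe-west4-a", "europe-west4-c"]

  # monitoring_service is left unset so the provider default applies (5616b50bfb).

  # Inline pool: no remove_default_node_pool, so there is no
  # provision-then-delete cycle during apply (bc7a277d9b).
  node_pool {
    name       = "runners-ci"
    node_count = 1 # assumed

    node_config {
      machine_type = "e2-standard-4" # assumed
      disk_size_gb = 20              # capped boot disk (f72dbb9c9d), assumed for this pool
    }
  }

  # Scales from zero, so no node is provisioned at apply time.
  node_pool {
    name               = "tests-pods"
    initial_node_count = 0

    autoscaling {
      min_node_count = 0
      max_node_count = 3 # assumed
    }

    node_config {
      machine_type = "e2-standard-4" # assumed
      disk_size_gb = 20              # assumed for this pool
      spot         = true            # spot instances on the test pool
    }
  }

  # Temporary bound on cluster creation (fd5c29db31).
  timeouts {
    create = "20m"
  }
}
```

Inline `node_pool` blocks trade flexibility for creation speed: with the Google provider, adding or removing an inline pool generally forces the cluster to be recreated, which is usually acceptable for a short-lived test cluster but would likely be a poor fit for a long-lived one.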