8f13be1dc4
chore: reduce GKE release test cluster provisioning time and cost
- Configure runners-ci node pool inline in the cluster resource instead
  of using remove_default_node_pool=true, eliminating the
  provision-then-delete cycle that added ~5 min to terraform apply
- Remove the separate infra pool; runners-ci is now the only pool on
  the critical path of cluster creation
- Set tests-pods pool min_node_count=0 so no node is provisioned at
  apply time — nodes scale up only when test pods are scheduled
- Enable spot instances on the tests-pods pool for ~60-91% cost saving
- Add 60 min job timeout to release-tests to bound hung cluster cost
- Add Terraform plugin cache keyed on the lock file to skip provider
  re-downloads on subsequent runs (~30-60s saved)
- Install gke-gcloud-auth-plugin via setup-gcloud to fix kubectl auth

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 09:46:59 +10:00
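The Terraform plugin cache mentioned above depends on Terraform's CLI configuration file; a minimal sketch of the relevant setting (the path and the CI cache-key convention are assumptions, not taken from this repo):

```hcl
# CI runner CLI configuration (e.g. ~/.terraformrc, HCL syntax).
# Enables the provider plugin cache so later runs reuse already-downloaded
# providers instead of fetching them again. The directory must exist
# before `terraform init` runs.
plugin_cache_dir = "$HOME/.terraform.d/plugin-cache"
```

In CI, this directory would then be saved and restored with a cache key derived from `.terraform.lock.hcl`, so the cache is invalidated only when provider versions actually change.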


# Kubernetes cluster — runners-ci pool configured inline to avoid the
# remove_default_node_pool create-then-delete cycle that adds ~5 min.
resource "google_container_cluster" "this" {
  name                = local.name
  location            = var.zone
  project             = var.project
  deletion_protection = false

  release_channel {
    channel = var.kubernetes_release_channel
  }

  # Enable Workload Identity
  workload_identity_config {
    workload_pool = "${var.project}.svc.id.goog"
  }

  # Send pod stdout/stderr to Cloud Logging automatically
  logging_service    = "logging.googleapis.com/kubernetes"
  monitoring_service = "monitoring.googleapis.com/kubernetes"

  node_pool {
    name               = var.node_pool_name
    initial_node_count = var.node_pool_min

    autoscaling {
      min_node_count = var.node_pool_min
      max_node_count = var.node_pool_max
    }

    node_config {
      machine_type = var.node_pool_machine_type
      labels       = var.node_pool_labels
      oauth_scopes = [
        "https://www.googleapis.com/auth/cloud-platform",
      ]
    }
  }
}
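The tests-pods pool described in the commit message is not part of the inline cluster definition above; as a separate node-pool resource it might look roughly like this (the resource name, variables, and machine type are assumptions for illustration):

```hcl
# Scale-to-zero spot pool for test pods: no node exists at apply time, the
# autoscaler creates nodes only when test pods are scheduled, and spot
# pricing reduces per-node cost by roughly 60-91%.
resource "google_container_node_pool" "tests_pods" {
  name     = "tests-pods"
  cluster  = google_container_cluster.this.name
  location = var.zone
  project  = var.project

  initial_node_count = 0

  autoscaling {
    min_node_count = 0
    max_node_count = var.tests_pods_max # assumed variable
  }

  node_config {
    spot         = true                         # spot VMs for cost savings
    machine_type = var.tests_pods_machine_type  # assumed variable
    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform",
    ]
  }
}
```

With `initial_node_count = 0` and `min_node_count = 0`, this pool stays off the critical path of `terraform apply`; the first node comes up only when a test pod is scheduled onto the pool.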