logos-storage-nim

mirror of https://github.com/logos-storage/logos-storage-nim.git synced 2026-08-03 07:23:20 +00:00

Author	SHA1	Message	Date
E M	0965220c6d	hanging at 64% deploying again, trying zone c	2026-04-29 17:48:54 +10:00
E M	ebf0abb35c	fix encoding of logging url	2026-04-29 17:13:57 +10:00
E M	a4750824bc	move back to europe-west4-b zone due to exhausted quota	2026-04-29 17:13:44 +10:00
E M	4fd5bdca92	refactor: remove allow-tests-pods node label from GKE node pools The `allow-tests-pods` boolean label was used by the test framework to steer pods away from runner nodes via a node affinity exclusion. Pod scheduling now uses the existing `workload-type` label directly as a nodeSelector, making the boolean label redundant.	2026-04-29 16:46:24 +10:00
E M	b7b01f92e8	Logging URL filters by RUNID instead of namespace/container name	2026-04-29 15:20:44 +10:00
E M	f5150109f7	fix: avoid building in parallel Avoids "file in use" errors while building in CI	2026-04-28 20:02:00 +10:00
E M	970012aa04	remove unneeded priority request	2026-04-28 18:07:43 +10:00
E M	6c86e8a9ed	set cluster creation timeout to 20mins temporary timeout so we can see if the latest commits work without waiting too long between tries	2026-04-28 17:57:15 +10:00
E M	11cb97e97d	Try changing zones in case the cluster deployment stall is due to a zonal unavailability.	2026-04-28 17:51:01 +10:00
E M	6bc28b68d7	change monitoring to default service Cluster deployment seems to be stalling because the metrics service is not started. So returning it to default to see if that fixes the issue.	2026-04-28 17:50:43 +10:00
E M	f2b26ae5eb	inline node pools so they can be created in parallel speeds up cluster creation	2026-04-28 16:46:22 +10:00
E M	70ae988c9b	remove unneeded setup	2026-04-28 16:28:04 +10:00
E M	9f46e1ce8a	move state bucket from gh secret to variable	2026-04-24 17:20:03 +10:00
E M	93fc629706	create the terraform cache dir first	2026-04-24 17:11:54 +10:00
E M	0839bd0301	add debug output	2026-04-24 17:01:52 +10:00
E M	580a424086	change script so it doesn't non-zero exit when no pods exist	2026-04-24 15:45:47 +10:00
E M	ffde5e0fdc	fix terraform cache, should remove warning	2026-04-24 15:33:32 +10:00
E M	aebe3a4262	fix polling script	2026-04-24 15:30:56 +10:00
E M	073dc7c408	check pod phase instead	2026-04-24 15:12:39 +10:00
E M	df79cb097b	refactor polling loop	2026-04-24 15:01:45 +10:00
E M	a72e933d38	temp comment out release workflow	2026-04-24 14:41:55 +10:00
E M	df25b12356	temp comment out build workflow	2026-04-24 14:40:57 +10:00
E M	c114a54851	temp comment out build to make testing ci changes faster	2026-04-24 14:40:11 +10:00
E M	b05c345143	Keeps timing out waiting for start, so try polling loop	2026-04-24 14:37:56 +10:00
E M	9c0e749e99	wait for runners-ci node to be ready before continuing workflow	2026-04-24 13:28:01 +10:00
E M	7994d88996	reorder wait command flags	2026-04-24 12:35:40 +10:00
E M	8527868c45	Show storage node logs URL in workflow summary	2026-04-24 12:25:45 +10:00
E M	c85c658c48	move RELEASE_TESTS_GCP_PROJECT from secret to var for logging URL	2026-04-24 12:25:17 +10:00
E M	937f3c88c0	bump kubectl to latest	2026-04-24 12:24:56 +10:00
E M	627d795e67	change pod condition to wait for (create)	2026-04-24 12:01:27 +10:00
E M	4e8c781299	reusable workflow outputs can silently fail to propagate in certain conditions Now, STORAGEDOCKERIMAGE is: - logosstorage/logos-storage-nim:latest-dist-tests for workflow_dispatch on a branch - logosstorage/logos-storage-nim:v0.1.8-dist-tests for a v0.1.8 tag push	2026-04-24 10:27:59 +10:00
E M	5c22e5d7bd	wait for an existing pod before completing step	2026-04-24 10:27:12 +10:00
E M	8f13be1dc4	chore: reduce GKE release test cluster provisioning time and cost - Configure runners-ci node pool inline in the cluster resource instead of using remove_default_node_pool=true, eliminating the provision-then-delete cycle that added ~5 min to terraform apply - Remove the separate infra pool; runners-ci is now the only pool on the critical path of cluster creation - Set tests-pods pool min_node_count=0 so no node is provisioned at apply time — nodes scale up only when test pods are scheduled - Enable spot instances on the tests-pods pool for ~60-91% cost saving - Add 60 min job timeout to release-tests to bound hung cluster cost - Add Terraform plugin cache keyed on the lock file to skip provider re-downloads on subsequent runs (~30-60s saved) - Install gke-gcloud-auth-plugin via setup-gcloud to fix kubectl auth Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-24 09:46:59 +10:00
E M	00a6264030	chore: use zonal GKE cluster to reduce provisioning time Switch cluster and all node pools from regional to zonal (`europe-west4-b`) to avoid the 40+ minute provisioning time of a regional (multi-zone) cluster. Adds a `zone` variable to the GKE module and cluster config, and updates the workflow's `gcloud get-credentials` call to use `--zone` instead of `--region`.	2026-04-23 17:10:08 +10:00
E M	229cff0065	ignore claude files	2026-04-23 16:01:29 +10:00
E M	9a45095a98	rename cluster to match previous change	2026-04-23 16:01:28 +10:00
E M	5cb9cc3379	reduce length of cluster name	2026-04-23 16:01:28 +10:00
E M	1bc1336268	add permissions to gcp auth	2026-04-23 16:01:28 +10:00
E M	5da037fb11	Port terraform cluster creation/destruction from digital ocean to gcp	2026-04-23 16:01:28 +10:00
E M	661308deb1	chore: rename Codex references to Logos Storage in release tests Replace all "Codex" branding in the release test workflow and supporting files: rename the K8s cluster, Terraform state key, secret, log paths, env var (CODEXDOCKERIMAGE → STORAGEDOCKERIMAGE), and test runner image (cs-codex-dist-tests → logos-storage-dist-tests) to align with the already-updated logos-storage-nim-cs-dist-tests repo in https://github.com/logos-storage/logos-storage-nim-cs-dist-tests/pull/124. Also fix the dotnet test path to the correct Tests/LogosStorageReleaseTests directory.	2026-04-23 16:01:28 +10:00
E M	117bc74099	Update workflow success condition	2026-04-23 16:01:27 +10:00
E M	c6a9320648	export kubeconfig values so template works	2026-04-23 16:01:27 +10:00
E M	fc50479c1e	Create static kubeconfig with bearer token Replace the use of doctl as a credential manager for executing k8s calls with a freshly created bearer token (expires after 2h). Avoids passing a DO personal access token to the cs-dist-tests runner pod.	2026-04-23 16:01:27 +10:00
E M	fdb47887d2	restore using latest cs-dist-tests image	2026-04-23 16:01:27 +10:00
E M	2b13ed9c07	WIP update cs-dist-tests docker image tag The change to the cs-dist-tests image name was to test if installing doctl to the image would fix the release tests not being able to authenticate into the cluster. This is mainly due to the kubeconfig being generated and stored in a DO secret, as opposed to a static kubeconfig for a permanent cluster as before. IMPORTANT: The image tag should be changed back to 'latest'!	2026-04-23 16:01:27 +10:00
E M	122ad42038	wait for pod to start before streaming logs	2026-04-23 16:01:26 +10:00
E M	df880b959b	prefix repository secret names with RELEASE_TESTS_	2026-04-23 16:01:26 +10:00
E M	fc9923a0a4	Allow workflows to be tested manually by branch	2026-04-23 16:01:26 +10:00
E M	90d96a1a01	update workflow inputs, k8s namespace - remove RUNNER_IMAGE because the cs-dist-tests image is dumb -- it clones the cs-dist-tests repo, checks out the branch in BRANCH and then runs the release tests. So instead, always use the :latest image (which is built when there are commits to master) - add the BRANCH workflow input so you can test cs-dist-tests changes in the runner if needed - remove COMMAND arg, it's always going to be 'dotnet test Tests/CodexReleaseTests' - remove NAMESPACE env variable and just use 'default'. The cluster is ephemeral and so all resources deployed are for the release tests, no namespaces needed.	2026-04-23 16:01:26 +10:00
E M	271d2aec51	Add workflow input for cs-dist-tests docker image By default, the logosstorage/logos-storage-nim-cs-dist-tests:latest image will be used for the test runner in the release tests. However, if developers want to run the release tests and test changes to the runner (eg changes in the logos-storage-nim-cs-dist-tests repo), they can push their changes to a branch and manually run the `docker-runner` workflow in the logos-storage-nim-cs-dist-tests repo. This will create an image like logosstorage/logos-storage-nim-cs-dist-tests:sha-c0465a5. This image can then be used as a release tests workflow input for 'cs-dist-tests runner image'	2026-04-23 16:01:26 +10:00

960 Commits
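
Commit 4e8c781299 describes how STORAGEDOCKERIMAGE resolves: a `latest-dist-tests` tag for a workflow_dispatch run on a branch, and a `<tag>-dist-tests` tag for a tag push such as v0.1.8. The rule can be sketched as below. This is only an illustration of the mapping stated in that commit message, not the workflow's actual implementation; the function name `storage_docker_image` and its `ref_type`/`ref_name` parameters are hypothetical, and anything that is not a tag push is assumed to fall back to the `latest` variant.

```python
# Hypothetical helper illustrating the tag rule from commit 4e8c781299.
# The repository name and the "-dist-tests" suffix come from the commit
# message; the function itself and its parameters are assumptions.

IMAGE_REPO = "logosstorage/logos-storage-nim"


def storage_docker_image(ref_type: str, ref_name: str) -> str:
    """Return the image reference for a given git ref.

    ref_type: "tag" for a tag push; any other value (for example "branch")
              is assumed to use the latest image.
    ref_name: the short ref name, for example "v0.1.8" or "master".
    """
    if ref_type == "tag":
        # Tag push: pin the image to the pushed version.
        return f"{IMAGE_REPO}:{ref_name}-dist-tests"
    # Manual run on a branch: use the most recent dist-tests image.
    return f"{IMAGE_REPO}:latest-dist-tests"


if __name__ == "__main__":
    print(storage_docker_image("branch", "master"))
    print(storage_docker_image("tag", "v0.1.8"))
```

Computing the value directly from the ref, rather than passing it between jobs, is one way to sidestep the propagation problem with reusable workflow outputs that the commit message mentions.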