logos-storage-nim

mirror of https://github.com/logos-storage/logos-storage-nim.git synced 2026-05-12 06:19:33 +00:00

Author	SHA1	Message	Date
E M	805ae86268	don't wait for pvc disks to be deleted, delete all at end in case runner crashes	2026-05-01 12:45:19 +10:00
E M	c0b1af52ca	increase memory of the runner pod; was seeing exit code 137 (OOM)	2026-05-01 11:37:11 +10:00
E M	a9cb218d59	Keep 1 node alive so autoscaler doesn't scale to 0. Should help speed up startup, avoiding errors like "pod couldn't be scheduled"	2026-05-01 11:36:53 +10:00
E M	358c03c05e	wait for pvcs to be deleted before destroying the cluster	2026-05-01 11:08:00 +10:00
E M	c470b3e102	use on-demand VMs instead of spot instances. Attempting to fix a lot of errors in the console relating to spot instances being unschedulable.	2026-04-30 18:35:46 +10:00
E M	f140476d65	do not generate test summary if previous steps were skipped/cancelled	2026-04-30 18:34:55 +10:00
E M	0d74299a57	fix error in "print storage logs url" step	2026-04-30 18:34:26 +10:00
E M	185aa06514	delete terraform state lock. When the workflow is cancelled, either manually or automatically by a long-running step (timeout), the terraform state lock had to be manually deleted, or else the next workflow run would never succeed. This change ensures that the state lock file is always deleted after each run.	2026-04-30 18:15:57 +10:00
E M	94afc921e0	generate test summary to show in workflow summary	2026-04-30 18:09:35 +10:00
E M	2872a2800f	Move from a single zone to multiple zones to increase spot instance availability	2026-04-30 17:59:36 +10:00
E M	c889c0283f	Reduce nodes in pool from 10 to 5. Reduces resource contention. 2 parallel tests x 10 containers => 2-3 nodes needed, 5 gives room	2026-04-30 17:58:56 +10:00
E M	54075c576c	put cluster name in an env var	2026-04-30 15:32:40 +10:00
E M	4dc2fb9a79	avoid sleeping a full 60s to wait for job completion. Instead, wait for a job condition using kubectl wait	2026-04-30 15:23:01 +10:00
E M	abaad73465	try to ensure the log stream survives long silences	2026-04-30 12:57:05 +10:00
E M	b78c2d5301	add starttime param to logging URL	2026-04-29 21:26:57 +10:00
E M	5da74edda0	cap boot drive size to 20gb (default is 100gb) to avoid resource exhaustion	2026-04-29 21:26:47 +10:00
E M	9e732b16b9	Add a "Delete PVCs before cluster teardown" step to the workflow to prevent future PVC leaks	2026-04-29 20:37:18 +10:00
E M	c177225677	try zone a one more time	2026-04-29 18:35:08 +10:00
E M	0965220c6d	hanging at 64% deploying again, trying zone c	2026-04-29 17:48:54 +10:00
E M	ebf0abb35c	fix encoding of logging url	2026-04-29 17:13:57 +10:00
E M	a4750824bc	move back to europe-west4-b zone due to exhausted quota	2026-04-29 17:13:44 +10:00
E M	4fd5bdca92	refactor: remove allow-tests-pods node label from GKE node pools. The `allow-tests-pods` boolean label was used by the test framework to steer pods away from runner nodes via a node affinity exclusion. Pod scheduling now uses the existing `workload-type` label directly as a nodeSelector, making the boolean label redundant.	2026-04-29 16:46:24 +10:00
E M	b7b01f92e8	Logging URL filters by RUNID instead of namespace/container name	2026-04-29 15:20:44 +10:00
E M	f5150109f7	fix: avoid building in parallel. Avoids "file in use" errors while building in CI	2026-04-28 20:02:00 +10:00
E M	970012aa04	remove unneeded priority request	2026-04-28 18:07:43 +10:00
E M	6c86e8a9ed	set cluster creation timeout to 20mins. Temporary timeout so we can see if the latest commits work without waiting too long between tries	2026-04-28 17:57:15 +10:00
E M	11cb97e97d	Try changing zones in case the cluster deployment stall is due to a zonal unavailability.	2026-04-28 17:51:01 +10:00
E M	6bc28b68d7	change monitoring to default service. Cluster deployment seems to be stalling because the metrics service is not started, so returning it to default to see if that fixes the issue.	2026-04-28 17:50:43 +10:00
E M	f2b26ae5eb	inline node pools so they can be created in parallel; speeds up cluster creation	2026-04-28 16:46:22 +10:00
E M	70ae988c9b	remove unneeded setup	2026-04-28 16:28:04 +10:00
E M	9f46e1ce8a	move state bucket from gh secret to variable	2026-04-24 17:20:03 +10:00
E M	93fc629706	create the terraform cache dir first	2026-04-24 17:11:54 +10:00
E M	0839bd0301	add debug output	2026-04-24 17:01:52 +10:00
E M	580a424086	change script so it doesn't non-zero exit when no pods exist	2026-04-24 15:45:47 +10:00
E M	ffde5e0fdc	fix terraform cache, should remove warning	2026-04-24 15:33:32 +10:00
E M	aebe3a4262	fix polling script	2026-04-24 15:30:56 +10:00
E M	073dc7c408	check pod phase instead	2026-04-24 15:12:39 +10:00
E M	df79cb097b	refactor polling loop	2026-04-24 15:01:45 +10:00
E M	a72e933d38	temp comment out release workflow	2026-04-24 14:41:55 +10:00
E M	df25b12356	temp comment out build workflow	2026-04-24 14:40:57 +10:00
E M	c114a54851	temp comment out build to make testing ci changes faster	2026-04-24 14:40:11 +10:00
E M	b05c345143	Keeps timing out waiting for start, so try polling loop	2026-04-24 14:37:56 +10:00
E M	9c0e749e99	wait for runners-ci node to be ready before continuing workflow	2026-04-24 13:28:01 +10:00
E M	7994d88996	reorder wait command flags	2026-04-24 12:35:40 +10:00
E M	8527868c45	Show storage node logs URL in workflow summary	2026-04-24 12:25:45 +10:00
E M	c85c658c48	move RELEASE_TESTS_GCP_PROJECT from secret to var for logging URL	2026-04-24 12:25:17 +10:00
E M	937f3c88c0	bump kubectl to latest	2026-04-24 12:24:56 +10:00
E M	627d795e67	change pod condition to wait for (create)	2026-04-24 12:01:27 +10:00
E M	4e8c781299	reusable workflow outputs can silently fail to propagate in certain conditions. Now, STORAGEDOCKERIMAGE is: logosstorage/logos-storage-nim:latest-dist-tests for workflow_dispatch on a branch; logosstorage/logos-storage-nim:v0.1.8-dist-tests for a v0.1.8 tag push	2026-04-24 10:27:59 +10:00
E M	5c22e5d7bd	wait for an existing pod before completing step	2026-04-24 10:27:12 +10:00

978 Commits