52 Commits

Author SHA1 Message Date
E M
5f2b537fcd
refactor: replace scheduling affinity with explicit node pool label selection
Replace the indirect `SetSchedulingAffinity(notIn: "false")` / `allow-tests-pods` mechanism with `ScheduleInPoolsWithLabel(key, value)` and `AddToleration(key, value, effect)` in ContainerRecipeFactory. This reads more clearly at the API level: `SetSchedulingAffinity(notIn: "false")` was a double negative that was hard to reason about, and it did not make clear that the intent was to schedule on pools labeled `allow-tests-pods=true`.

Previously, pods were steered to the spot node pool via a node affinity exclusion on a boolean label (`allow-tests-pods NotIn ["false"]`), and spot taint toleration was added implicitly by using the `system-node-critical` priority class. The priority class was removed earlier because it caused a ResourceQuota admission error in GCP, which silently broke spot node scheduling.

The new API is explicit: recipes call `ScheduleInPoolsWithLabel` to set a nodeSelector label that targets the intended pool, and `AddToleration` to declare any taints the pool carries. Tolerations are set at the recipe level so that a recipe can be moved back to DigitalOcean if needed, simply by removing the toleration it no longer needs. All four recipes (storage, prometheus, discord bot, rewarder bot) now call both.

Cleanup applied alongside:
- `PodToleration` converted to a record for structural equality and simpler deduplication
- `ExposedPorts`, `InternalPorts`, `EnvVars`, `Volumes` on `ContainerRecipe` changed to
  `IReadOnlyList<T>` for consistent immutable typing
- `SetCriticalPriority` property renamed to `IsCriticalPriority`
- `GetPriorityClassName` is now declared to return `string?` instead of returning `null!`
- `Reset()` extracted in `ContainerRecipeFactory` to consolidate post-create state reset
- Fixed bug: `nodePoolLabels` and `tolerations` were passed by reference and then cleared,
  leaving the recipe with empty collections; now snapshotted before clearing
- `SchedulingAffinity.cs` deleted (no remaining callers)
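
The snapshot fix and the record-based deduplication described above can be illustrated with a minimal sketch. It is not the repository's code: `RecipeFactorySketch`, `Recipe`, the shape of `PodToleration`, and the taint key `example-spot-taint` are hypothetical stand-ins, and only the method names `ScheduleInPoolsWithLabel`, `AddToleration`, and `Reset` and the label `allow-tests-pods=true` come from this commit message.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins; the real types in the repository may differ.
public record PodToleration(string Key, string Value, string Effect);

public record Recipe(
    IReadOnlyDictionary<string, string> NodePoolLabels,
    IReadOnlyList<PodToleration> Tolerations);

public class RecipeFactorySketch
{
    private readonly Dictionary<string, string> nodePoolLabels = new();
    private readonly List<PodToleration> tolerations = new();

    public void ScheduleInPoolsWithLabel(string key, string value)
    {
        nodePoolLabels[key] = value;
    }

    public void AddToleration(string key, string value, string effect)
    {
        var toleration = new PodToleration(key, value, effect);
        // Records compare by value, so Contains detects structural duplicates.
        if (!tolerations.Contains(toleration)) tolerations.Add(toleration);
    }

    public Recipe Create()
    {
        // Snapshot before clearing: the recipe receives independent copies,
        // so Reset() below cannot empty the collections it holds.
        var recipe = new Recipe(
            new Dictionary<string, string>(nodePoolLabels),
            tolerations.ToList());
        Reset();
        return recipe;
    }

    private void Reset()
    {
        nodePoolLabels.Clear();
        tolerations.Clear();
    }
}

public static class Program
{
    public static void Main()
    {
        var factory = new RecipeFactorySketch();
        factory.ScheduleInPoolsWithLabel("allow-tests-pods", "true");
        // "example-spot-taint" is an assumed taint key, for illustration only.
        factory.AddToleration("example-spot-taint", "true", "NoSchedule");
        factory.AddToleration("example-spot-taint", "true", "NoSchedule");

        var recipe = factory.Create();
        Console.WriteLine(recipe.NodePoolLabels.Count); // prints 1
        Console.WriteLine(recipe.Tolerations.Count);    // prints 1
    }
}
```

If `Create` handed the factory's own collections to the recipe instead of copies, both counts would be 0 after `Reset()`, which is the failure mode the bullet above describes.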
2026-04-29 16:45:55 +10:00
E M
5b17395380
fix: "pod IP unknown" failures while pod still scheduling 2026-04-28 21:56:25 +10:00
Eric
13d453d5ed
chore: Docker updates to support release tests in logos-storage-nim, and remove Codex references (#124)
* ci(docker): build dist-tests images

* Update to .NET 10, Kubernetes client 18.0.13

Kubernetes client 18.0.13 is compatible with Kubernetes 1.34.x. The Kubernetes version (v1.34.1) is selected automatically by kubeadm in Docker Desktop. See https://github.com/kubernetes-client/csharp#version-compatibility for a compatibility table.

* Updates to support Kubernetes upgrade

* bump openapi.yaml to match openapi.yaml in the logos-storage-nim docker image

* bump doc to .net 10

* bump docker to .net 10

* Build image with latest tag always

Always build an image tagged `latest` (in addition to the commit SHA tag) when there's a push to master

* docker image tag as "latest" only when pushing to master

* Update docker image to install doctl

* Remove doctl install

The kubeconfig is now generated with a plain bearer token instead of using doctl as a credential manager

* Rename and remove all instances of Codex

* Further remove CodexNetDeployer as it is no longer needed

---------

Co-authored-by: Adam Uhlíř <adam@uhlir.dev>
2026-04-17 15:03:22 +10:00
ThatBen
8817ce56ae
increases pod deployment timeout 2025-07-09 10:22:41 +02:00
Ben
fe75609ecb
Fix cid formatting in bot. Fix autoclient folder-uploader crash. Fix several openapi alignments. 2025-02-26 11:59:50 +01:00
ThatBen
92a5a1e361
fixes crashwatcher stop 2025-02-21 15:21:42 +01:00
ThatBen
e45ed0c21e
pushes container concepts out of codexAccess 2025-01-15 15:00:25 +01:00
Ben
e45b8bde54
adds container names to container log filenames 2024-12-09 15:57:10 +01:00
Ben
5ca135646a
Container crash detection during start-up 2024-06-19 10:39:14 +02:00
benbierens
aa416d50b3
ensuring enough mounted disk space 2024-06-08 10:36:23 +02:00
benbierens
3a61fc89c6
Adds WaitForCleanup test attribute to allow tests to wait for resources to be cleaned up 2024-06-06 15:09:52 +02:00
benbierens
e187bfc941
Changes time.retry to fixed timelength instead of fixed number of retries 2024-05-02 08:41:20 +02:00
benbierens
015d8da21d
Better logging for Time.WaitUntil. 2024-04-14 09:22:55 +02:00
benbierens
d847c4f3ec
Adds names to kube wait functions 2024-04-14 08:56:22 +02:00
gmega
e3b16fd742
add ability to stop single containers 2024-04-13 17:12:14 +03:00
benbierens
69666d3fee
Setting up future-containers 2024-04-09 09:30:45 +02:00
Ben
c5fb066c75
Allows for non-blocking stop of containers 2024-03-13 10:57:26 +01:00
Ben
e42f1ddbd7
Adds support for command overrides to container recipes. 2024-03-13 10:01:37 +01:00
benbierens
5dc918287c
Merge branch 'master' into feature/public-testnet-deploying 2023-12-11 08:30:25 +01:00
Slava
4c46a708ab
Change affinity label (#85)
https://github.com/codex-storage/infra-codex/issues/100
2023-11-27 19:06:04 +02:00
benbierens
3761e236a3
Sets node critical priority for codex and geth nodes 2023-11-23 14:50:54 +01:00
benbierens
ff52e8e841
Defines public IP fetching service 2023-11-23 14:15:37 +01:00
benbierens
485e3cf02e
Merge branch 'master' into feature/public-testnet-deploying 2023-11-14 10:50:41 +01:00
benbierens
b47b596062
Fetches used external ports in order to guarantee no collisions. 2023-11-14 10:49:14 +01:00
benbierens
7de0e5a1c4
Sets up tests-runners as avoided scheduling affinity. 2023-11-14 10:16:00 +01:00
benbierens
5a7608460b
retry by setting nodeport explicitly 2023-11-13 13:58:34 +01:00
benbierens
ed56d9edcc
Cleanup of kubernetesWorkflow assembly. 2023-11-12 10:07:23 +01:00
benbierens
096282ae1a
Fixes debug/peer serialization. Adds retry for pod-finding. 2023-11-10 15:23:16 +01:00
benbierens
ec6c987ef9
determine runner location with environment variables 2023-11-07 15:51:28 +01:00
benbierens
6e60a8614c
Removes runnerlocation field from StartResult. 2023-11-07 13:06:49 +01:00
benbierens
eecdcf308d
Rewrites runner location set 2023-11-07 12:48:18 +01:00
benbierens
ad88560061
Creates internal service ports for external ports from container recipe. 2023-11-06 16:38:32 +01:00
benbierens
8d0b3feff7
logs runner location 2023-11-06 16:12:18 +01:00
benbierens
6672427565
Restores marketplace test 2023-11-06 16:10:19 +01:00
benbierens
dc9f3ab090
removes dependency on static pod name and address info 2023-11-06 14:33:47 +01:00
benbierens
db0a21bc60
Merge branch 'automated-teststarter' into deployment-json-rework 2023-11-03 14:57:32 +01:00
benbierens
19d466d5d6
better service names 2023-11-03 14:53:27 +01:00
benbierens
bc51fc2e30
Fixes faulty persistent volume claim creation 2023-11-02 11:32:24 +01:00
benbierens
fcb5a527a9
Fixes pod-readback labeling issue. 2023-10-31 15:04:59 +01:00
benbierens
a6f7bc2393
Removes knownPods class. 2023-10-31 14:48:16 +01:00
benbierens
ac07327d77
Upgrades volume support for use with deploy-and-run container 2023-10-31 14:20:50 +01:00
benbierens
b5e5570145
Creates volume mount for kubeconfig file 2023-10-31 11:38:54 +01:00
benbierens
9b1ab3185f
Idiot developer forgets to use variable. 2023-10-25 14:42:53 +02:00
benbierens
3914d58a6a
Sets up support for UDP ports and applies them to discovery ports. 2023-10-25 14:23:07 +02:00
benbierens
2fae9505d6
attempt to fix contract deployment in testnet mode 2023-10-23 15:28:20 +02:00
benbierens
0fd6a6f06e
Fixes port tag mismatch 2023-10-19 15:48:49 +02:00
benbierens
2fea475237
multiple service ports 2023-10-19 14:03:36 +02:00
benbierens
1ca943b189
Fixes partial rename in k8sController. 2023-10-04 10:37:18 +02:00
benbierens
562f886e30
Bumps k8s operation timeout for continuous test runner. 2023-10-04 09:26:11 +02:00
benbierens
10697f1047
Updates container location support 2023-09-25 08:47:19 +02:00