106 Commits

Author SHA1 Message Date
Eric
d6a9b9239a
force pod spread (#130)
Added required pod anti-affinity (kubernetes.io/hostname, pod-uuid Exists) to K8sController.cs so all pods within a test namespace are forced onto distinct nodes (Pending if unsatisfiable) to cover the worst-case pod counts per namespace.

The anti-affinity TopologyKey of "kubernetes.io/hostname" specifies that the topology domain is the node. Combined with the "Exists" operator, the rule means: "this Pod should not run on this node if the node is already running one or more pods that have a "pod-uuid" label key" (they all do).
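
For illustration, the rule described above corresponds roughly to the pod spec fragment below. This is a sketch of the kind of manifest the rule produces, not the code in K8sController.cs; the pod name, label value, container name, and image are placeholders.

```yaml
# Sketch only: metadata and container values are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: example-test-pod
  labels:
    pod-uuid: "00000000-0000-0000-0000-000000000000"  # placeholder value; only the key matters here
spec:
  affinity:
    podAntiAffinity:
      # "required" makes this a hard rule: the pod stays Pending if no node satisfies it.
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: pod-uuid
                operator: Exists   # matches any pod carrying the key, whatever its value
          topologyKey: kubernetes.io/hostname  # one topology domain per node
  containers:
    - name: example
      image: example/image:latest
```

Because the term lists no namespaces, the scheduler only considers pods in the pod's own namespace, which matches the per-namespace spreading described above.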
2026-06-16 22:17:38 +10:00
Eric
e8b2ad014b
feat: remove usage of PVC disks (#128)
* fix: delete PVCs after stopping containers

* Didn't work, instead try to delete all PVCs just before the namespace is deleted, after all pods destroyed.

* Didn't work, force kill pods, then delete pvcs

Force kill pods, wait for them to be killed. Then remove the pvc finaliser that protects the pvc from deletion. Finally, delete the pvc. The finaliser deletion step is there in case the force kill pod times out.

* try without waiting for pods to be killed

* prevent double delete race

* remove unneeded method, improve log output in pvc deletion

* Use emptyDir ephemeral volumes instead of PVCs

* fix dist tests workflow summary

After kubeconfig was replaced with an in-cluster service account, k8sClient was returning null and thus no test summaries were being written to ConfigMaps. This change returns a default Kubeconfig for the k8sClient when one is not passed in an environment variable.

* remove PVC volume deletion since PVCs are no longer created
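
As a rough sketch of what replacing a PVC with an emptyDir ephemeral volume looks like in a pod spec (the volume name, mount path, size limit, and image are placeholders, not values taken from this repository):

```yaml
# Sketch only: all names and sizes are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: example-test-pod
spec:
  containers:
    - name: example
      image: example/image:latest
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      # emptyDir lives and dies with the pod, so there is no PVC to clean up afterwards.
      emptyDir:
        sizeLimit: 1Gi
```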
2026-06-15 15:50:16 +10:00
E M
8895dc24bb
bump k8s sdk version to support kubectl 1.35 2026-05-28 11:46:36 -03:00
E M
3c58ee3777
refactor: replace scheduling affinity with explicit node pool label selection
Replace the indirect `SetSchedulingAffinity(notIn: "false")` / `allow-tests-pods` mechanism with `ScheduleInPoolsWithLabel(key, value)` and `AddToleration(key, value, effect)` in ContainerRecipeFactory. This is much more readable from an API perspective. `SetSchedulingAffinity(notIn: "false")` was a double-negative (hard to reason about) and it was not clear that this was meant to schedule on pools with labels `allow-tests-pods=true`.

Previously, pods were steered to the spot node pool via a node affinity exclusion on a boolean label (`allow-tests-pods NotIn ["false"]`), and spot taint toleration was added implicitly by using the `system-node-critical` priority class. The priority class was removed earlier because it caused a ResourceQuota admission error in GCP, which silently broke spot node scheduling.

The new API is explicit: recipes call `ScheduleInPoolsWithLabel` to set a nodeSelector label that targets the intended pool, and `AddToleration` to declare any taints the pool carries. Tolerations are set at the recipe level so that a recipe can be moved back to DigitalOcean if needed by removing the toleration it no longer needs. All four recipes (storage, prometheus, discord bot, rewarder bot) now call both.
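
In pod spec terms, the two calls would translate to something like the fragment below. This is a sketch under assumptions: the `allow-tests-pods: "true"` label is taken from the description above, while the taint key, value, and effect are hypothetical placeholders for whatever the target pool actually carries.

```yaml
# Sketch only: the taint key/value/effect are hypothetical.
spec:
  # Set via ScheduleInPoolsWithLabel(key, value): only nodes with this label are eligible.
  nodeSelector:
    allow-tests-pods: "true"
  # Set via AddToleration(key, value, effect): lets the pod land on tainted nodes.
  tolerations:
    - key: example-spot-taint
      operator: Equal
      value: "true"
      effect: NoSchedule
```

A toleration only permits scheduling onto tainted nodes; the nodeSelector is what steers the pod to the pool, which is why recipes need both.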

Cleanup applied alongside:
- `PodToleration` converted to a record for structural equality and simpler deduplication
- `ExposedPorts`, `InternalPorts`, `EnvVars`, `Volumes` on `ContainerRecipe` changed to
  `IReadOnlyList<T>` for consistent immutable typing
- `SetCriticalPriority` property renamed to `IsCriticalPriority`
- `GetPriorityClassName` returns `string?` instead of `null!`
- `Reset()` extracted in `ContainerRecipeFactory` to consolidate post-create state reset
- Fixed bug: `nodePoolLabels` and `tolerations` were passed by reference and then cleared,
  leaving the recipe with empty collections; now snapshotted before clearing
- `SchedulingAffinity.cs` deleted (no remaining callers)
2026-05-28 11:46:36 -03:00
E M
05e7914fc2
fix: "pod IP unknown" failures while pod still scheduling 2026-05-28 11:46:36 -03:00
Eric
13d453d5ed
chore: Docker updates to support release tests in logos-storage-nim, and remove Codex references (#124)
* ci(docker): build dist-tests images

* Update to .net 10, kubernetes client 18.0.13

Kubernetes client 18.0.13 is compatible with Kubernetes 1.34.x. The Kubernetes version is selected automatically by kubeadm in docker desktop (v1.34.1). See https://github.com/kubernetes-client/csharp#version-compatibility for a compatibility table.

* Updates to support Kubernetes upgrade

* bump openapi.yaml to match openapi.yaml in the logos-storage-nim docker image

* bump doc to .net 10

* bump docker to .net 10

* Build image with latest tag always

Always build an image with a latest tag (as well as a sha commit hash) when there's a push to master

* docker image tag as "latest" only when pushing to master

* Update docker image to install doctl

* Remove doctl install

kubeconfig is now created and uses a plain bearer token instead of using doctl as a credential mgr

* Rename and remove all instances of Codex

* Further remove CodexNetDeployer as it is no longer needed

---------

Co-authored-by: Adam Uhlíř <adam@uhlir.dev>
2026-04-17 15:03:22 +10:00
Ben
e03f5982d3
Requires new contracts image with configurable marketplace config 2025-08-25 11:11:56 +02:00
ThatBen
8817ce56ae
increases pod deployment timeout 2025-07-09 10:22:41 +02:00
ThatBen
c1fa309271
Applies long timesets for cluster runs 2025-06-20 08:40:48 +02:00
ThatBen
c98cf1ffc4
fixes naming for parameterised fixtures 2025-06-05 15:27:06 +02:00
Ben
6d02285d9a
Merge branch 'master' into feature/proofs-and-frees
# Conflicts:
#	Framework/KubernetesWorkflow/LogHandler.cs
#	Framework/Utils/EthAddress.cs
#	ProjectPlugins/CodexContractsPlugin/Marketplace/Marketplace.cs
#	Tests/CodexReleaseTests/MarketTests/FailTest.cs
#	Tests/CodexReleaseTests/MarketTests/FinishTest.cs
#	Tests/CodexReleaseTests/Utils/MarketplaceAutoBootstrapDistTest.cs
#	Tools/AutoClient/Modes/FolderStore/FileSaver.cs
2025-06-04 12:31:19 +02:00
ThatBen
9e9b147b68
wip log downloading 2025-05-20 14:16:33 +02:00
ThatBen
0e66e8e94a
Merge branch 'master' into feature/extended-marketplace-testing
# Conflicts:
#	Tests/CodexReleaseTests/Parallelism.cs
2025-05-20 09:00:33 +02:00
Ben
5e62c3520c
Prevents downloading of crash log in retry loop 2025-05-14 11:45:15 +02:00
ThatBen
76ea98e783
reverse disable pod ip 2025-04-29 12:03:13 +02:00
ThatBen
8d50c008b8
disables pod ips for cluster testing, fix when not at offsite 2025-04-28 15:44:46 +02:00
ThatBen
3bb9a29054
sets up multiple successfulcontract tests 2025-04-24 12:53:08 +02:00
Ben
a676e0463d
Sets up asserting of balances in case of failed contract. 2025-03-12 14:06:17 +01:00
Ben
fe75609ecb
Fix cid formatting in bot. Fix autoclient folder-uploader crash. Fix several openapi alignments. 2025-02-26 11:59:50 +01:00
ThatBen
92a5a1e361
fixes crashwatcher stop 2025-02-21 15:21:42 +01:00
ThatBen
0c961b8348
updates marketplace contract, fixes type conversions 2025-02-21 14:17:13 +01:00
ThatBen
c73fa186fc
wip 2025-01-16 13:24:57 +01:00
ThatBen
4a151880d4
Extracts codexClient assembly 2025-01-16 11:31:50 +01:00
ThatBen
ec644eed4a
Moving downloadedlog to logging module 2025-01-16 10:15:02 +01:00
ThatBen
e45ed0c21e
pushes container concepts out of codexAccess 2025-01-15 15:00:25 +01:00
Ben
e45b8bde54
adds container names to container log filenames 2024-12-09 15:57:10 +01:00
Ben
3e2cad3c17
Fixes issue where occasionally a CID contains 'ERR' and fails the log assert. 2024-12-04 15:57:49 +01:00
benbierens
04f087efe4
Bump to dotnet 8 2024-10-03 14:02:28 +02:00
Ben
ffb5eb294a
Fix kubernetes port name issue 2024-09-30 15:23:14 +02:00
Ben
c9fedac592
Adds call to get availabilities 2024-09-23 10:52:12 +02:00
Ben
ecd0e70261
fixes serialization issue of containerAdditionals 2024-08-21 10:45:17 +02:00
benbierens
d16b8cb011
Fixes identity issue for runningpod/runningcontainer and log saving for stopped containers 2024-08-01 10:39:06 +02:00
benbierens
9d9f65c5a3
Fixes missing name and null events 2024-07-29 11:02:24 +02:00
benbierens
87271f4f37
Sets up starting event and bootstrap event 2024-07-29 10:16:37 +02:00
Ben
5ca135646a
Container crash detection during start-up 2024-06-19 10:39:14 +02:00
benbierens
54f053cfcc
Improves crash detection 2024-06-14 09:05:56 +02:00
Ben
4fc9835f43
Attempt to log disk space before and after uploads and downloads 2024-06-12 10:48:52 +02:00
benbierens
aa416d50b3
ensuring enough mounted disk space 2024-06-08 10:36:23 +02:00
benbierens
3a61fc89c6
Adds WaitForCleanup test attribute to allow tests to wait for resources to be cleaned up 2024-06-06 15:09:52 +02:00
Ben
0ec43a9325
Shuts up json-serialization codex logging 2024-05-07 11:04:32 +02:00
benbierens
e187bfc941
Changes time.retry to a fixed time length instead of a fixed number of retries 2024-05-02 08:41:20 +02:00
benbierens
015d8da21d
Better logging for Time.WaitUntil. 2024-04-14 09:22:55 +02:00
benbierens
d847c4f3ec
Adds names to kube wait functions 2024-04-14 08:56:22 +02:00
gmega
e3b16fd742
add ability to stop single containers 2024-04-13 17:12:14 +03:00
benbierens
69666d3fee
Setting up future-containers 2024-04-09 09:30:45 +02:00
Ben
c5fb066c75
Allows for non-blocking stop of containers 2024-03-13 10:57:26 +01:00
Ben
e42f1ddbd7
Adds support for command overrides to container recipes. 2024-03-13 10:01:37 +01:00
benbierens
5dc918287c
Merge branch 'master' into feature/public-testnet-deploying 2023-12-11 08:30:25 +01:00
Slava
4c46a708ab
Change affinity label (#85)
https://github.com/codex-storage/infra-codex/issues/100
2023-11-27 19:06:04 +02:00
benbierens
3761e236a3
Sets node critical priority for codex and geth nodes 2023-11-23 14:50:54 +01:00