66 Commits

Author SHA1 Message Date
gmega 4cbb401d12 feat: add optional data removal with adjusted quotas 2025-06-09 19:59:00 -03:00
gmega 67ca362ee7 misc: minor refactor, add simple network perf test deploy 2025-04-16 12:56:34 -03:00
gmega b2491c26f9 fix: fix workflow expressions 2025-02-27 18:49:48 -03:00
gmega 81cda58a9d feat: add download speed plot, dedup experiment datasets 2025-02-27 18:47:36 -03:00
gmega a366f04e7c feat: allow re-running failed experiments from previous workflow runs 2025-02-25 12:14:15 -03:00
gmega 5a9543259b feat: add support for region k8s annotations 2025-02-24 14:16:59 -03:00
gmega 8dbc3faed8 feat: add tunable parallelism 2025-02-23 11:33:49 -03:00
gmega 73219922f6 feat: add Codex chart values for cluster experiments 2025-02-20 12:16:05 -03:00
gmega 48e71a315a feat: add support for setting the node tag in benchmark workflow 2025-02-20 12:14:49 -03:00
gmega 688091c965 feat: allow use of custom runner and node tags for Codex 2025-02-20 11:59:24 -03:00
gmega a8c19364b7 fix: minikube env param in workflow 2025-02-20 10:21:45 -03:00
gmega 0d08814929 feat: generalize benchmark workflow to run Codex in addition to Deluge 2025-02-17 10:44:00 -03:00
gmega 38434f4590 fix container label for codex experiment runner 2025-02-14 15:59:59 -03:00
gmega e8441b7bea fix: respect logger increments even when stream returns less data than expected 2025-02-14 15:59:28 -03:00
gmega f7adf878eb feat: add memory parameter to Deluge values file 2025-02-14 14:30:56 -03:00
gmega 205f926f89 feat: add stable bootstrap node 2025-02-14 14:30:18 -03:00
gmega f336df8da7 fix: adjust Codex logging cooldown, insert polling backoff on download completion, define default Codex experiment 2025-02-14 12:14:52 -03:00
gmega 68ee1bad87 feat: add working Codex helm chart 2025-02-14 11:00:17 -03:00
gmega 74ee71889e feat: add Codex node and initial integration tests 2025-02-04 19:18:58 -03:00
gmega 99992d2e7e fix: enable cleanup on failure by default 2025-02-03 15:46:26 -03:00
gmega 61f2172304 feat: add workflow for the final experiment 2025-01-30 11:48:09 -03:00
gmega 94893c0f93 fix: conditional expression for cleanup 2025-01-29 20:35:26 -03:00
gmega a29c010e7a feat: allow keeping pods around on failure, add optional log parsing at end of experiment run 2025-01-29 08:47:01 -03:00
gmega 7ed29ddb4c fix: add RAM settings on deluge node 2025-01-28 20:33:13 -03:00
gmega 1b83f8047c feat: update RBAC for codex workflows 2025-01-28 18:20:47 -03:00
gmega ee67a92726 feat: grant codex runner permissions to launch subworkflows 2025-01-27 18:07:56 -03:00
gmega ba1b93d77c feat: add structured experiment iteration logs 2025-01-27 17:26:09 -03:00
gmega 90dda4f932 fix: add -C so tars do not include parent folders 2025-01-24 19:19:54 -03:00
gmega 4d4d06e7a9 feat: add log parsing workflow with upload to hetzner storage bucket 2025-01-24 18:28:28 -03:00
gmega fdac384ad8 fix: add autoscaler eviction annotations to prevent pods from being relocated mid-experiment 2025-01-23 12:12:42 -03:00
gmega a9b9fd8332 fix: quotation so argo does not screw up the value array 2025-01-23 08:06:43 -03:00
gmega 8096c9f4e0 feat: add ordering to parameter matrix expander 2025-01-22 17:12:46 -03:00
gmega d70b87d2bb fix: production values for Argo workflows and RBAC 2025-01-22 10:31:08 -03:00
gmega aeb2f044c8 chore: remove leftover values from chart 2025-01-20 20:01:48 -03:00
gmega 882392bef2 fix: add missing parameters to cleanup hook 2025-01-20 18:41:11 -03:00
gmega 6ae5b1620f chore: add missing EOL 2025-01-20 17:59:07 -03:00
gmega 7e07eda3c2 feat: allow running workflows from locally loaded images under Minikube 2025-01-20 17:57:21 -03:00
gmega 5a203fad18 chore: eliminate 5GB experiment for now 2025-01-20 15:29:27 -03:00
gmega ab100c4841 feat: runnable experiment with working test runner and agents 2025-01-20 15:24:03 -03:00
gmega 94556d7a53 working deployment of agents on minikube 2025-01-20 11:39:43 -03:00
gmega 60fd274b18 feat: add node affinity/anti-affinity and storage class knobs to run this on a cluster 2025-01-15 11:52:32 -03:00
gmega fc0630224f fix: remove redundant group suffix from node ID 2025-01-10 16:31:12 -03:00
gmega b505e7a3e1 fix: fix README link, add missing precommit config, bump ruff 2025-01-09 16:48:44 -03:00
gmega bfabd1c4c8 feat: label components with /component label, use /name to refer to benchmark pods; add README 2025-01-09 09:27:21 -03:00
gmega a4fe12e620 feat: add new Helm chart parameters to workflow 2025-01-08 16:43:01 -03:00
gmega 4d1eef9d53 feat: standardize labelling in Helm chart to facilitate log consumption 2025-01-08 15:10:10 -03:00
gmega d417f55ffd add config sketch for setting up vector on minikube 2025-01-07 18:59:19 -03:00
gmega 59f3a9a584 fix: remove useless sync point which was causing issues 2024-12-20 18:00:32 -03:00
gmega 470e9a989e feat: add standard labels to chart resources to facilitate log querying 2024-12-20 14:09:54 -03:00
gmega f3a66d9637 fix: workaround for broken Argo exit hooks 2024-12-20 07:51:58 -03:00