Chrysostomos Nanakos 0c8e28fa46
feat(k8s): add Vector logging infrastructure for benchmarks
Add Vector agent/aggregator deployment for collecting logs from Codex
benchmark experiments in K8s. Includes PVC for log storage, S3 secret
template and RBAC.

Vector collects logs from benchmark pods and writes JSONL files for
post-processing by the log-parsing workflow.

Signed-off-by: Chrysostomos Nanakos <chris@include.gr>
2025-10-21 13:13:49 +03:00

# Benchmark Cluster Infrastructure
This directory contains Kubernetes manifests for the benchmark cluster's logging infrastructure.
## Overview
### Vector Log Collection
Vector is deployed as a distributed logging system to collect logs from Codex benchmark experiments:
**Vector Agent (DaemonSet):**
- Collects logs from benchmark pods (filtered by label: `app.kubernetes.io/name=codex-benchmarks`)
- Forwards compressed logs to the Vector Aggregator on port 6000
**Vector Aggregator:**
- Receives logs from all Vector agents
- Writes consolidated logs to PVC as JSONL files: `/vector-logs/benchmarks-YYYY-MM-DD.jsonl`
- The JSONL files are read by the log-parsing Argo workflow during post-experiment processing
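
The daily file naming above can be reproduced in a small shell helper, which is useful when a script needs to locate the file for a given day. This is a minimal sketch: `benchmark_log_path` and `count_events` are hypothetical helper names, and it assumes the date in the file name is rendered in UTC and that every line of the file holds exactly one log event.

```bash
# Hypothetical helper: build the aggregator's JSONL path for a given day.
# Assumption: the date in the file name is rendered in UTC.
benchmark_log_path() {
  day="${1:-$(date -u +%Y-%m-%d)}"
  printf '/vector-logs/benchmarks-%s.jsonl\n' "$day"
}

# Hypothetical helper: count events in a JSONL file (one JSON object per line).
count_events() {
  wc -l < "$1" | tr -d ' '
}
```

From a pod that mounts the volume at `/vector-logs`, `count_events "$(benchmark_log_path 2025-10-21)"` would print the number of events recorded for that day.
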
**Persistent Volume:**
- Stores collected JSONL logs
- Mounted by the Vector Aggregator and the log-parsing workflow
- Allows logs to persist between workflow runs
**S3 Secret:**
- Credentials for uploading processed logs to S3-compatible storage
- Used by log-parsing workflow's tar-and-upload step
**RBAC:**
- Service account and cluster role for Vector to access Kubernetes API
- Required for reading pod logs cluster-wide
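
Whether the RBAC rules grant the expected access can be checked with `kubectl auth can-i` and impersonation. The sketch below is illustrative only: `check_vector_rbac` is a hypothetical helper, the service account name `vector` is a placeholder (use the name defined in the RBAC manifest), and the caller needs permission to impersonate service accounts.

```bash
# Hypothetical helper: ask the API server whether a service account may
# list and watch pods in every namespace.
# Assumptions: namespace defaults to "argo"; the service account name
# "vector" is a placeholder, not taken from the manifests.
check_vector_rbac() {
  subject="system:serviceaccount:${1:-argo}:${2:-vector}"
  for verb in list watch; do
    # can-i exits non-zero on "no", so keep going and report the answer
    answer="$(kubectl auth can-i "$verb" pods --all-namespaces --as="$subject" || true)"
    echo "$verb pods: $answer"
  done
}
```

Calling `check_vector_rbac argo <service-account>` prints one `yes` or `no` line per verb.
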
## Installation Order
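### 1. Create namespace (if it does not exist)
```bash
# Idempotent: succeeds whether or not the namespace already exists
kubectl create namespace argo --dry-run=client -o yaml | kubectl apply -f -
```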
### 2. Apply Vector components
```bash
kubectl apply -f vector/vector-pvc.yaml -n argo
kubectl apply -f vector/vector-agent-configmap.yaml -n argo
kubectl apply -f vector/vector-configmap.yaml -n argo
kubectl apply -f vector/vector-aggregator-configmap.yaml -n argo
kubectl apply -f vector/vector-deployment.yaml -n argo
kubectl apply -f vector/vector-aggregator-deployment.yaml -n argo
```
### 3. Configure S3 access
```bash
# Edit s3-secret.yaml with your credentials first
kubectl apply -f s3-secret.yaml -n argo
```
### 4. Configure Vector RBAC
```bash
# Vector pods from step 2 may be denied Kubernetes API access until this is applied
kubectl apply -f vector/vector-aggregator-rbac.yaml -n argo
```
## Verification
### Check Vector Agent status
```bash
kubectl get daemonset -n argo | grep vector
kubectl get pods -n argo -l app.kubernetes.io/name=vector
```
### Check Vector Aggregator status
```bash
kubectl get deployment -n argo | grep vector-aggregator
```
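
In scripts it can be more convenient to wait for the rollout to finish than to inspect the listing by eye. A minimal sketch, assuming the Deployment is named `vector-aggregator` (the name used in the troubleshooting commands in this README); `wait_for_aggregator` is a hypothetical helper and the 120-second default timeout is arbitrary.

```bash
# Hypothetical helper: block until the aggregator Deployment is fully rolled
# out, or fail once the timeout expires.
# Assumption: the Deployment is named "vector-aggregator".
wait_for_aggregator() {
  kubectl rollout status deployment/vector-aggregator \
    -n "${1:-argo}" --timeout="${2:-120s}"
}
```

`wait_for_aggregator argo 300s` returns a non-zero exit code if the rollout has not completed within five minutes.
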
### Check PVC status
```bash
kubectl get pvc -n argo vector-logs-pvc
```
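
The PVC check can also be turned into a pass/fail test by reading the claim's phase directly. This is a sketch rather than part of the manifests: `pvc_is_bound` is a hypothetical helper, and it defaults to the `vector-logs-pvc` claim and `argo` namespace used above.

```bash
# Hypothetical helper: succeed only if the PVC has reached the "Bound" phase.
pvc_is_bound() {
  phase="$(kubectl get pvc "${1:-vector-logs-pvc}" -n "${2:-argo}" \
    -o jsonpath='{.status.phase}')"
  echo "phase: $phase"
  [ "$phase" = "Bound" ]
}
```

A claim that stays `Pending` is not always an error: with a storage class that uses `WaitForFirstConsumer` binding, the claim binds only once a pod that mounts it is scheduled.
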
## Troubleshooting
### Vector Agent not collecting logs
- Verify pod labels: `kubectl get pods -n codex-benchmarks --show-labels`
- Check agent logs: `kubectl logs -n argo -l app.kubernetes.io/name=vector`
- Ensure RBAC is applied: `kubectl get clusterrole vector-agent`
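
If it is unclear which namespace the benchmark pods run in, the label the agent filters on can be searched cluster-wide. A minimal sketch: `count_benchmark_pods` is a hypothetical helper, and the default selector is the label quoted in the overview.

```bash
# Hypothetical helper: count pods in any namespace that carry the label
# the Vector agent filters on.
count_benchmark_pods() {
  kubectl get pods --all-namespaces \
    -l "${1:-app.kubernetes.io/name=codex-benchmarks}" \
    --no-headers 2>/dev/null | wc -l | tr -d ' '
}
```

A result of `0` while an experiment is running suggests the pods are missing the label, so the agent has nothing to collect.
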
### Logs not appearing in PVC
- Check aggregator logs for errors: `kubectl logs -n argo deployment/vector-aggregator | grep -i error`
- Verify PVC is mounted: `kubectl describe pod -n argo <aggregator-pod>`
### S3 upload failures
- Verify secret exists: `kubectl get secret -n argo s3-codex-benchmarks`
- Check that the credentials are real values, not the placeholders from `s3-secret.yaml`
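
To confirm which keys the secret holds without printing their values, the key names can be listed with a Go template. A minimal sketch: `list_secret_keys` is a hypothetical helper, and the key names it prints depend on the `s3-secret.yaml` template, which is not reproduced here.

```bash
# Hypothetical helper: print the key names stored in a secret, one per line,
# without revealing the values.
list_secret_keys() {
  kubectl get secret "${1:-s3-codex-benchmarks}" -n "${2:-argo}" \
    -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}'
}
```

Comparing the output of `list_secret_keys` with the keys the tar-and-upload step expects can reveal a missing or misnamed entry.
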