10 Commits

SHA1 Message Date
f597247f56 Rename vm.vultrlabs.dev → victoriametrics.vultrlabs.dev 2026-04-09 19:33:58 +00:00
bf6d62b9a8 Add VictoriaMetrics for historical metrics (Mar 13+)
- Single-node VM deployment with 200Gi NVMe, 2y retention
- Traefik IngressRoute at vm.vultrlabs.dev (TLS + basic auth)
- Backfill script: pulls vLLM/DCGM metrics from Mimir, writes to VM
- Retain StorageClass so historical data survives PVC deletion
- README with deployment + Grafana mixed-datasource instructions
2026-04-09 19:29:18 +00:00
7ade5ecac8 Clean slate: 1h block sizes, remove backfill artifacts
- Changed all namespace block sizes to 1h (was 2h/12h/24h in manifests,
  30d+ in the live cluster due to backfill-era bufferPast hacks)
- Deleted entire backfill/ directory (scripts, pods, runbooks)
- Removed stale 05-m3coordinator.yaml (had backfill namespaces)
- Added 05-m3coordinator-deployment.yaml to kustomization
- Fixed init job health check (/health instead of /api/v1/services/m3db/health)
- Updated .env.example (removed Mimir credentials)
- Added 'Why Backfill Doesn't Work' section to README
2026-04-09 19:00:08 +00:00
1af29e8f09 tweaks with backfill and grafana 2026-04-01 15:21:10 +00:00
a6c59d6a65 Replace LB with Traefik ingress for TLS + basic auth
- Remove m3coordinator LoadBalancer service (was using deprecated AutoSSL)
- Add Traefik ingress controller with Let's Encrypt ACME
- Add basic auth middleware for external access
- Update test scripts with auth support and fixed protobuf encoding
- Add multi-tenancy documentation (label-based isolation)
- Update README with Traefik deployment instructions
2026-04-01 05:19:14 +00:00
5eb58d1864 Update README with working m3db.vultrlabs.dev endpoint 2026-04-01 02:44:07 +00:00
5f4cd46bc3 Add backend-protocol annotation to m3coordinator-lb
LB now properly speaks HTTP to the coordinator backend
2026-04-01 02:43:41 +00:00
d35cd2d7d4 Update test scripts to accept full URL instead of LB_IP
- test-metrics.sh and test-metrics.py now take a full URL with port
- Supports both HTTP and HTTPS endpoints
- Updated README with new usage examples
2026-04-01 02:38:47 +00:00
a8469f79d7 Fix m3dbnode port conflict, update README, fix test script
- Remove duplicate db.metrics section (port 7203 conflict)
- Fix coordinator health endpoint (/health not /api/v1/services/m3db/health)
- Update README: remove NodePort references, always use LoadBalancer
- Add bootstrap instructions (workaround for init job chicken-and-egg)
- Fix test-metrics.sh: correct health endpoint and JSON parsing
2026-03-31 15:49:59 +00:00
ac13c30905 init commit 2026-03-31 08:28:16 -04:00