# VictoriaMetrics — Historical Metrics Store
VictoriaMetrics instance for querying historical vLLM + DCGM metrics (March 13, 2026 onward) that couldn't be backfilled into M3DB.
## Why VictoriaMetrics Instead of M3DB?
M3DB doesn't support backfill. Period. See the [main README](../README.md#why-backfill-doesnt-work) for the full story.
VictoriaMetrics has a first-class `/api/v1/import` endpoint that accepts data with any timestamp — no `bufferPast` gates, no block size hacks, no special namespaces. You just send the data and it works.
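For example, a single sample with an arbitrary historical timestamp can be pushed with curl (a sketch assuming a local port-forward to port 8428; `demo_backfill_metric` is a made-up placeholder name):

```shell
# Push one sample at a historical timestamp (milliseconds).
# /api/v1/import expects VictoriaMetrics' JSON-lines format.
curl -X POST 'http://localhost:8428/api/v1/import' -d \
  '{"metric":{"__name__":"demo_backfill_metric","job":"backfill"},"values":[42],"timestamps":[1773360000000]}'

# Confirm it is queryable at that point in time
curl -s 'http://localhost:8428/api/v1/query?query=demo_backfill_metric&time=1773360000'
```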
## Architecture
```
┌───────────────────────────────────────────────────────┐
│                   Vultr VKE Cluster                   │
│                                                       │
│  Mimir ──import──▶ VictoriaMetrics (1 pod, 200Gi NVMe)│
│                    ↓ PromQL queries                   │
│                    Traefik (TLS + basic auth)         │
│                    ↓                                  │
│                    victoriametrics.vultrlabs.dev      │
└───────────────────────────────────────────────────────┘

Grafana queries both:
- M3DB (m3db.vultrlabs.dev)                       → real-time data (1h blocks, going forward)
- VictoriaMetrics (victoriametrics.vultrlabs.dev) → historical data (Mar 13 → present)
```
## Quick Start
### 1. Deploy VictoriaMetrics
```bash
# Apply manifests
kubectl apply -k .
# Wait for pod to be running
kubectl -n victoriametrics get pods -w
# Verify it's healthy
kubectl -n victoriametrics port-forward svc/victoriametrics 8428:8428 &
curl http://localhost:8428/health
```
### 2. Configure DNS
Get the Traefik LoadBalancer IP and point `victoriametrics.vultrlabs.dev` at it:
```bash
kubectl -n traefik get svc traefik
```
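Once the record is in place, it's worth confirming the hostname actually resolves to the LoadBalancer's external IP before moving on; a quick check:

```shell
# Compare the resolved address with the service's external IP
dig +short victoriametrics.vultrlabs.dev
kubectl -n traefik get svc traefik -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```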
### 3. Set Up Basic Auth
Generate htpasswd and update the secret in `04-basic-auth-middleware.yaml`:
```bash
htpasswd -nb vultr_vm <your-password>
# Copy output, base64 encode it:
echo -n '<htpasswd-output>' | base64
# Update the secret and apply
kubectl apply -f 04-basic-auth-middleware.yaml
```
### 4. Run Backfill
```bash
# Create the secret with Mimir credentials
kubectl create secret generic backfill-credentials \
  --from-literal=mimir-password='YOUR_MIMIR_PASSWORD' -n victoriametrics
# Upload the backfill script as a configmap
kubectl create configmap backfill-script \
  --from-file=backfill.py=backfill.py -n victoriametrics
# Run the backfill pod
kubectl apply -f backfill-pod.yaml
# Watch progress
kubectl logs -f backfill -n victoriametrics
# Cleanup when done
kubectl delete pod backfill -n victoriametrics
kubectl delete configmap backfill-script -n victoriametrics
kubectl delete secret backfill-credentials -n victoriametrics
```
### 5. Verify
```bash
# In-cluster
kubectl -n victoriametrics exec deploy/victoriametrics -- \
  curl -s 'http://localhost:8428/api/v1/query?query=vllm:prompt_tokens_total' | python3 -m json.tool
# External (with auth)
curl -u vultr_vm:<password> "https://victoriametrics.vultrlabs.dev/api/v1/query?query=up"
```
## Grafana Configuration
Add VictoriaMetrics as a **Prometheus** datasource:
- **URL:** `https://victoriametrics.vultrlabs.dev` (with basic auth)
- **In-cluster URL:** `http://victoriametrics.victoriametrics.svc.cluster.local:8428`
### Mixed Queries (M3DB + VictoriaMetrics)
Use Grafana's built-in **Mixed** datasource to query both backends from one panel:
1. Create two Prometheus datasources:
   - `M3DB` → `https://m3db.vultrlabs.dev`
   - `VictoriaMetrics` → `https://victoriametrics.vultrlabs.dev`
2. In a panel, select the built-in **-- Mixed --** datasource
3. Add one query per backend; Grafana runs each query against its own datasource and merges the results in the panel
Alternatively, use dashboard variables to let users toggle between datasources for different time ranges.
## Metrics Stored
| Metric | Description |
|--------|-------------|
| `vllm:prompt_tokens_total` | vLLM prompt token count |
| `vllm:generation_tokens_total` | vLLM generation token count |
| `DCGM_FI_DEV_GPU_UTIL` | GPU utilization (DCGM) |
All metrics are tagged with `tenant=serverless-inference-cluster` and `cluster=serverless-inference-cluster`.
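Since every backfilled series carries those labels, queries can filter on them explicitly. A sketch, assuming a local port-forward to 8428:

```shell
# Instant query scoped to the backfilled tenant;
# --data-urlencode handles the braces and quotes in the selector
curl -s 'http://localhost:8428/api/v1/query' \
  --data-urlencode 'query=vllm:prompt_tokens_total{tenant="serverless-inference-cluster"}'
```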
## VictoriaMetrics API Reference
| Endpoint | Purpose |
|----------|---------|
| `/api/v1/import` | Import data (JSON lines; `/api/v1/import/prometheus` for Prometheus text format) |
| `/api/v1/export` | Export data (JSON lines) |
| `/api/v1/query` | PromQL instant query |
| `/api/v1/query_range` | PromQL range query |
| `/health` | Health check |
| `/metrics` | Internal metrics |
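As a quick example of the export side: `/api/v1/export` streams one JSON line per series and accepts `match[]` selectors, so a single metric can be dumped and re-imported elsewhere (the target URL below is hypothetical):

```shell
# Dump one metric to JSON lines (the same format /api/v1/import accepts)
curl -s 'http://localhost:8428/api/v1/export' \
  --data-urlencode 'match[]=vllm:prompt_tokens_total' > prompt_tokens.jsonl

# Re-import into another VictoriaMetrics instance
curl -X POST 'http://other-vm:8428/api/v1/import' --data-binary @prompt_tokens.jsonl
```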
## Storage
- **Size:** 200Gi NVMe (Vultr Block Storage)
- **StorageClass:** `vultr-block-storage-vm` (Retain policy — data survives PVC deletion)
- **Retention:** 2 years
- **Volume expansion:** `kubectl edit pvc victoriametrics-data -n victoriametrics`
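The expansion edit can also be done non-interactively; a sketch (the 400Gi figure is just an example, and the StorageClass must have `allowVolumeExpansion: true`):

```shell
# Grow the volume in place; the filesystem is resized automatically
kubectl -n victoriametrics patch pvc victoriametrics-data \
  -p '{"spec":{"resources":{"requests":{"storage":"400Gi"}}}}'
```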
## Useful Commands
```bash
# Check VM health
kubectl -n victoriametrics exec deploy/victoriametrics -- curl -s http://localhost:8428/health
# Check storage stats
kubectl -n victoriametrics exec deploy/victoriametrics -- \
  curl -s 'http://localhost:8428/api/v1/query?query=vm_rows' | python3 -m json.tool
# Query historical data (first 24h of the backfilled range)
curl -u vultr_vm:<password> \
  "https://victoriametrics.vultrlabs.dev/api/v1/query_range?query=vllm:prompt_tokens_total&start=1773360000&end=1773446400&step=60"
# Restart VM (if needed)
kubectl rollout restart deployment/victoriametrics -n victoriametrics
# Scale to 0 (preserve data, stop the pod)
kubectl scale deployment/victoriametrics --replicas=0 -n victoriametrics
```
## Re-running Backfill
If you need to import additional time ranges or new metrics:
1. Edit `backfill.py` — update `START_TS`, `END_TS`, or `METRICS`
2. Recreate the configmap and pod (see step 4 above)
3. VictoriaMetrics is idempotent for imports — duplicate data points are merged, not duplicated
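A quick way to sanity-check the idempotence claim (a sketch assuming a local port-forward; `dedup_check` is a placeholder metric name):

```shell
# Import the same sample twice...
for i in 1 2; do
  curl -X POST 'http://localhost:8428/api/v1/import' -d \
    '{"metric":{"__name__":"dedup_check"},"values":[1],"timestamps":[1773360000000]}'
done

# ...then count how many samples exist around that timestamp
curl -s 'http://localhost:8428/api/v1/query?query=count_over_time(dedup_check[1m])&time=1773360060'
```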
To convert timestamps:
```bash
# Date → Unix timestamp
date -u -d '2026-03-13 00:00:00' +%s # 1773360000
# Unix timestamp → date
date -u -d @1773360000
```