# VictoriaMetrics — Historical Metrics Store
VictoriaMetrics instance for querying historical vLLM + DCGM metrics (March 13, 2026 onward) that couldn't be backfilled into M3DB.
## Why VictoriaMetrics Instead of M3DB?
M3DB doesn't support backfill. Period. See the main README for the full story.
VictoriaMetrics has a first-class `/api/v1/import` endpoint that accepts data with any timestamp — no `bufferPast` gates, no block size hacks, no special namespaces. You just send the data and it works.
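For illustration, this is the shape of VictoriaMetrics' native JSON-line import format (one JSON object per line, timestamps in milliseconds). The metric name and labels match this setup; the sample value is made up:

```python
import json

def vm_import_line(name, labels, samples):
    """Build one JSON line for VictoriaMetrics' /api/v1/import endpoint.

    `samples` is a list of (unix_ms_timestamp, value) pairs; VM accepts
    any timestamp here, which is what makes backfill possible.
    """
    return json.dumps({
        "metric": {"__name__": name, **labels},
        "values": [v for _, v in samples],
        "timestamps": [ts for ts, _ in samples],
    })

line = vm_import_line(
    "vllm:prompt_tokens_total",
    {"tenant": "serverless-inference-cluster"},
    [(1773360000000, 12345.0)],  # 2026-03-13T00:00:00Z, in milliseconds
)
# POST this line (newline-delimited for batches) to
#   http://victoriametrics:8428/api/v1/import
```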
## Architecture

```
┌──────────────────────────────────────────────────┐
│ Vultr VKE Cluster                                │
│                                                  │
│  Mimir ──import──▶ VictoriaMetrics               │
│                    (1 pod, 200Gi NVMe)           │
│                      ↓ PromQL queries            │
│                    Traefik (TLS + basic auth)    │
│                      ↓                           │
│                    victoriametrics.vultrlabs.dev │
└──────────────────────────────────────────────────┘
```
Grafana queries both:
- M3DB (`m3db.vultrlabs.dev`) → real-time data (1h blocks, going forward)
- VictoriaMetrics (`victoriametrics.vultrlabs.dev`) → historical data (Mar 13–present)
## Quick Start

### 1. Deploy VictoriaMetrics

```bash
# Apply manifests
kubectl apply -k .

# Wait for pod to be running
kubectl -n victoriametrics get pods -w

# Verify it's healthy
kubectl -n victoriametrics port-forward svc/victoriametrics 8428:8428 &
curl http://localhost:8428/health
```
### 2. Configure DNS

Get the Traefik LoadBalancer IP and point `victoriametrics.vultrlabs.dev` at it:

```bash
kubectl -n traefik get svc traefik
```
### 3. Set Up Basic Auth

Generate htpasswd and update the secret in `04-basic-auth-middleware.yaml`:

```bash
htpasswd -nb vultr_vm <your-password>

# Copy output, base64 encode it:
echo -n '<htpasswd-output>' | base64

# Update the secret and apply
kubectl apply -f 04-basic-auth-middleware.yaml
```
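If you'd rather script the encoding step, the same base64 transform looks like this in Python. The hash below is a placeholder, not a real credential, and the exact Secret key it goes under depends on your middleware manifest:

```python
import base64

# Output of `htpasswd -nb vultr_vm <password>` looks like "vultr_vm:$apr1$...".
# Kubernetes Secret `data` fields must be base64-encoded, which is what the
# `echo -n '<htpasswd-output>' | base64` step above does.
htpasswd_line = "vultr_vm:$apr1$examplehash"  # placeholder, not a real hash
encoded = base64.b64encode(htpasswd_line.encode()).decode()
# Paste `encoded` into the Secret referenced by 04-basic-auth-middleware.yaml.
```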
### 4. Run Backfill

```bash
# Create the secret with Mimir credentials
kubectl create secret generic backfill-credentials \
  --from-literal=mimir-password='YOUR_MIMIR_PASSWORD' -n victoriametrics

# Upload the backfill script as a configmap
kubectl create configmap backfill-script \
  --from-file=backfill.py=backfill.py -n victoriametrics

# Run the backfill pod
kubectl apply -f backfill-pod.yaml

# Watch progress
kubectl logs -f backfill -n victoriametrics

# Cleanup when done
kubectl delete pod backfill -n victoriametrics
kubectl delete configmap backfill-script -n victoriametrics
kubectl delete secret backfill-credentials -n victoriametrics
```
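The contents of `backfill.py` aren't reproduced here, but the core transform it performs (reshaping a Mimir `/api/v1/query_range` matrix response into VictoriaMetrics import lines) can be sketched roughly like this; the sample data is illustrative and the real script may differ:

```python
import json

def mimir_result_to_vm_lines(prom_response):
    """Convert a Prometheus /api/v1/query_range response (resultType
    "matrix") into newline-delimited JSON for VM's /api/v1/import."""
    lines = []
    for series in prom_response["data"]["result"]:
        lines.append(json.dumps({
            "metric": series["metric"],
            # Prometheus returns [unix_seconds, "value"] pairs;
            # VM wants millisecond timestamps and float values.
            "timestamps": [int(ts * 1000) for ts, _ in series["values"]],
            "values": [float(v) for _, v in series["values"]],
        }))
    return "\n".join(lines)

sample = {"data": {"result": [{
    "metric": {"__name__": "vllm:prompt_tokens_total",
               "tenant": "serverless-inference-cluster"},
    "values": [[1773360000, "42"], [1773360060, "48"]],
}]}}
lines = mimir_result_to_vm_lines(sample)
# POST `lines` to http://victoriametrics:8428/api/v1/import
```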
### 5. Verify

```bash
# In-cluster
kubectl -n victoriametrics exec deploy/victoriametrics -- \
  curl -s 'http://localhost:8428/api/v1/query?query=vllm:prompt_tokens_total' | python3 -m json.tool

# External (with auth)
curl -u vultr_vm:<password> "https://victoriametrics.vultrlabs.dev/api/v1/query?query=up"
```
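To check programmatically that a query actually matched series (not just that the API returned 200), a small helper like this works; it isn't part of any script in this repo:

```python
import json

def has_data(query_response_text):
    """Return True if an /api/v1/query response succeeded AND matched
    at least one series (an empty result still has status "success")."""
    body = json.loads(query_response_text)
    return body.get("status") == "success" and len(body["data"]["result"]) > 0

# A response shaped like the instant-query output above:
ok = has_data('{"status":"success","data":{"resultType":"vector","result":'
              '[{"metric":{"__name__":"up"},"value":[1773360000,"1"]}]}}')
```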
## Grafana Configuration

Add VictoriaMetrics as a Prometheus datasource:

- URL: `https://victoriametrics.vultrlabs.dev` (with basic auth)
- In-cluster URL: `http://victoriametrics.victoriametrics.svc.cluster.local:8428`
## Mixed Queries (M3DB + VictoriaMetrics)

Use a Mixed datasource in Grafana to query both:

1. Create two Prometheus datasources:
   - M3DB → `https://m3db.vultrlabs.dev`
   - VictoriaMetrics → `https://victoriametrics.vultrlabs.dev`
2. Create a Mixed datasource that includes both
3. In dashboards, use the mixed datasource — Grafana sends the query to both backends and merges results
Alternatively, use dashboard variables to let users toggle between datasources for different time ranges.
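If you provision Grafana from files rather than the UI, the two datasources can be declared like this. This is a sketch assuming file-based provisioning and the basic-auth user from step 3; datasource names are arbitrary:

```yaml
apiVersion: 1
datasources:
  - name: M3DB
    type: prometheus
    access: proxy
    url: https://m3db.vultrlabs.dev
    basicAuth: true
    basicAuthUser: vultr_vm
    # basicAuthPassword belongs under secureJsonData, e.g. via an env var
  - name: VictoriaMetrics
    type: prometheus
    access: proxy
    url: https://victoriametrics.vultrlabs.dev
    basicAuth: true
    basicAuthUser: vultr_vm
```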
## Metrics Stored

| Metric | Description |
|---|---|
| `vllm:prompt_tokens_total` | vLLM prompt token count |
| `vllm:generation_tokens_total` | vLLM generation token count |
| `DCGM_FI_DEV_GPU_UTIL` | GPU utilization (DCGM) |

All metrics are tagged with `tenant=serverless-inference-cluster` and `cluster=serverless-inference-cluster`.
## VictoriaMetrics API Reference

| Endpoint | Purpose |
|---|---|
| `/api/v1/import` | Import data (Prometheus format) |
| `/api/v1/export` | Export data |
| `/api/v1/query` | PromQL instant query |
| `/api/v1/query_range` | PromQL range query |
| `/health` | Health check |
| `/metrics` | Internal metrics |
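As a concrete use of the export endpoint, this builds a URL that selects one of the stored metrics over a one-day window; the end timestamp is illustrative:

```python
from urllib.parse import urlencode

# /api/v1/export streams raw samples as JSON lines; match[] selects series.
params = urlencode({
    "match[]": 'vllm:prompt_tokens_total{tenant="serverless-inference-cluster"}',
    "start": 1773360000,  # 2026-03-13T00:00:00Z
    "end": 1773446400,    # one day later (illustrative)
})
url = f"http://localhost:8428/api/v1/export?{params}"
```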
## Storage

- Size: 200Gi NVMe (Vultr Block Storage)
- StorageClass: `vultr-block-storage-vm` (Retain policy — data survives PVC deletion)
- Retention: 2 years
- Volume expansion: `kubectl edit pvc victoriametrics-data -n victoriametrics`
## Useful Commands

```bash
# Check VM health
kubectl -n victoriametrics exec deploy/victoriametrics -- curl -s http://localhost:8428/health

# Check storage stats
kubectl -n victoriametrics exec deploy/victoriametrics -- \
  curl -s 'http://localhost:8428/api/v1/query?query=vm_rows' | python3 -m json.tool

# Query historical data (start = 2026-03-13 00:00:00 UTC)
curl -u vultr_vm:<password> \
  "https://victoriametrics.vultrlabs.dev/api/v1/query_range?query=vllm:prompt_tokens_total&start=1773360000&end=$(date -u +%s)&step=60"

# Restart VM (if needed)
kubectl rollout restart deployment/victoriametrics -n victoriametrics

# Scale to 0 (preserve data, stop the pod)
kubectl scale deployment/victoriametrics --replicas=0 -n victoriametrics
```
## Re-running Backfill

If you need to import additional time ranges or new metrics:

- Edit `backfill.py` — update `START_TS`, `END_TS`, or `METRICS`
- Recreate the configmap and pod (see step 4 above)
- VictoriaMetrics is idempotent for imports — duplicate data points are merged, not duplicated
To convert timestamps:

```bash
# Date → Unix timestamp
date -u -d '2026-03-13 00:00:00' +%s   # 1773360000

# Unix timestamp → date
date -u -d @1773360000
```
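The same conversions in Python, which may be handy if you're editing `backfill.py` anyway:

```python
from datetime import datetime, timezone

# Date -> Unix timestamp (equivalent of `date -u -d '2026-03-13 00:00:00' +%s`)
start_ts = int(datetime(2026, 3, 13, tzinfo=timezone.utc).timestamp())
# start_ts == 1773360000

# Unix timestamp -> date (equivalent of `date -u -d @1773360000`)
dt = datetime.fromtimestamp(1773360000, tz=timezone.utc)
# dt.isoformat() == '2026-03-13T00:00:00+00:00'
```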