2026-04-09 19:29:18 +00:00
# VictoriaMetrics — Historical Metrics Store
VictoriaMetrics instance for querying historical vLLM + DCGM metrics (March 13, 2026 onward) that couldn't be backfilled into M3DB.
## Why VictoriaMetrics Instead of M3DB?
M3DB doesn't support backfill. Period. See the [main README](../README.md#why-backfill-doesnt-work) for the full story.
VictoriaMetrics has a first-class `/api/v1/import` endpoint that accepts data with any timestamp — no `bufferPast` gates, no block size hacks, no special namespaces. You just send the data and it works.
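As a sketch of what that looks like in practice, the JSON-line payload `/api/v1/import` expects can be assembled in a few lines of Python (the `to_import_line` helper and the sample values are illustrative, not part of this repo):

```python
import json

def to_import_line(name, labels, samples):
    """Build one JSON line for VictoriaMetrics /api/v1/import.

    samples is a list of (unix_seconds, value) pairs; the endpoint
    expects timestamps in milliseconds.
    """
    return json.dumps({
        "metric": {"__name__": name, **labels},
        "values": [value for _, value in samples],
        "timestamps": [int(ts * 1000) for ts, _ in samples],
    })

# A sample far in the past (March 13, 2026) imports just as well as fresh data.
line = to_import_line(
    "vllm:prompt_tokens_total",
    {"tenant": "serverless-inference-cluster"},
    [(1773360000, 42.0)],
)
```

Each line describes one series; POST them newline-separated to the import endpoint (e.g. `curl --data-binary @lines.jsonl http://localhost:8428/api/v1/import`).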
## Architecture
```
┌─────────────────────────────────────────────────┐
│ Vultr VKE Cluster                               │
│                                                 │
│  Mimir ──import──▶ VictoriaMetrics              │
│                    (1 pod, 200Gi NVMe)          │
│                      ↓ PromQL queries           │
│                    Traefik (TLS + basic auth)   │
│                      ↓                          │
│                    victoriametrics.vultrlabs.dev│
└─────────────────────────────────────────────────┘

Grafana queries both:
- M3DB (m3db.vultrlabs.dev) → real-time data (1h blocks, going forward)
- VictoriaMetrics (victoriametrics.vultrlabs.dev) → historical data (Mar 13 – present)
```
## Quick Start
### 1. Deploy VictoriaMetrics
```bash
# Apply manifests
kubectl apply -k .
# Wait for pod to be running
kubectl -n victoriametrics get pods -w
# Verify it's healthy
kubectl -n victoriametrics port-forward svc/victoriametrics 8428:8428 &
curl http://localhost:8428/health
```
### 2. Configure DNS
Get the Traefik LoadBalancer IP and point `victoriametrics.vultrlabs.dev` at it:
```bash
kubectl -n traefik get svc traefik
```
### 3. Set Up Basic Auth
Generate htpasswd and update the secret in `04-basic-auth-middleware.yaml`:
```bash
htpasswd -nb vultr_vm <your-password>
# Copy output, base64 encode it:
echo -n '<htpasswd-output>' | base64
# Update the secret and apply
kubectl apply -f 04-basic-auth-middleware.yaml
```
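The base64 step mirrors what Kubernetes expects in a Secret's `data:` field; a quick Python equivalent (the hash below is a placeholder, not real `htpasswd` output):

```python
import base64

# Placeholder htpasswd line; substitute the real `htpasswd -nb` output.
htpasswd_line = "vultr_vm:$apr1$examplehash"

# Kubernetes Secret data values are base64-encoded, same as `echo -n ... | base64`.
encoded = base64.b64encode(htpasswd_line.encode()).decode()
```

Note the `-n` in the shell version matters for the same reason `.encode()` gets no trailing newline here: a stray newline in the encoded value silently breaks auth.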
### 4. Run Backfill
```bash
# Create the secret with Mimir credentials
kubectl create secret generic backfill-credentials \
--from-literal=mimir-password='YOUR_MIMIR_PASSWORD' -n victoriametrics
# Upload the backfill script as a configmap
kubectl create configmap backfill-script \
--from-file=backfill.py=backfill.py -n victoriametrics
# Run the backfill pod
kubectl apply -f backfill-pod.yaml
# Watch progress
kubectl logs -f backfill -n victoriametrics
# Cleanup when done
kubectl delete pod backfill -n victoriametrics
kubectl delete configmap backfill-script -n victoriametrics
kubectl delete secret backfill-credentials -n victoriametrics
```
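The internals of `backfill.py` aren't shown here, but the core of any such script is a windowing loop: split the historical range into chunks small enough for one export/import round trip each. A minimal sketch with illustrative constants (the real script defines its own `START_TS`/`END_TS`):

```python
START_TS = 1773360000          # 2026-03-13 00:00:00 UTC
END_TS = START_TS + 3 * 86400  # a 3-day example range
WINDOW = 86400                 # one day per request keeps payloads manageable

def windows(start, end, step):
    """Yield (start, end) pairs covering [start, end) in step-sized chunks."""
    t = start
    while t < end:
        yield t, min(t + step, end)
        t += step

# Each chunk would be queried from Mimir and POSTed to /api/v1/import.
chunks = list(windows(START_TS, END_TS, WINDOW))
```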
### 5. Verify
```bash
# In-cluster
kubectl -n victoriametrics exec deploy/victoriametrics -- \
curl -s 'http://localhost:8428/api/v1/query?query=vllm:prompt_tokens_total' | python3 -m json.tool
# External (with auth)
curl -u vultr_vm:<password> "https://victoriametrics.vultrlabs.dev/api/v1/query?query=up"
```
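The external check can also be scripted. Here is a minimal urllib equivalent that builds the same authenticated request (`<password>` is the placeholder from above; uncomment the last line to actually send it):

```python
import base64
import urllib.parse
import urllib.request

base = "https://victoriametrics.vultrlabs.dev/api/v1/query"
params = urllib.parse.urlencode({"query": "up"})
req = urllib.request.Request(f"{base}?{params}")

# Same credentials as curl's -u flag, expressed as an HTTP Basic auth header.
token = base64.b64encode(b"vultr_vm:<password>").decode()
req.add_header("Authorization", f"Basic {token}")

# response = urllib.request.urlopen(req)  # returns Prometheus-compatible JSON
```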
## Grafana Configuration
Add VictoriaMetrics as a **Prometheus** datasource:
- **URL:** `https://victoriametrics.vultrlabs.dev` (with basic auth)
- **In-cluster URL:** `http://victoriametrics.victoriametrics.svc.cluster.local:8428`
### Mixed Queries (M3DB + VictoriaMetrics)
Use Grafana's built-in **Mixed** datasource to query both:
1. Create two Prometheus datasources:
   - `M3DB` → `https://m3db.vultrlabs.dev`
   - `VictoriaMetrics` → `https://victoriametrics.vultrlabs.dev`
2. In a dashboard panel, select the built-in **Mixed** datasource and add one query per backend
3. Grafana sends each query to its backend and merges the results in the panel
Alternatively, use dashboard variables to let users toggle between datasources for different time ranges.
## Metrics Stored
| Metric | Description |
|--------|-------------|
| `vllm:prompt_tokens_total` | vLLM prompt token count |
| `vllm:generation_tokens_total` | vLLM generation token count |
| `DCGM_FI_DEV_GPU_UTIL` | GPU utilization (DCGM) |
All metrics are tagged with `tenant=serverless-inference-cluster` and `cluster=serverless-inference-cluster`.
## VictoriaMetrics API Reference
| Endpoint | Purpose |
|----------|---------|
| `/api/v1/import` | Import data (Prometheus format) |
| `/api/v1/export` | Export data |
| `/api/v1/query` | PromQL instant query |
| `/api/v1/query_range` | PromQL range query |
| `/health` | Health check |
| `/metrics` | Internal metrics |
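For example, `/api/v1/export` takes a `match[]` series selector plus `start`/`end` in unix seconds. A sketch that builds such a URL against the in-cluster service (the timestamps are illustrative):

```python
import urllib.parse

base = "http://victoriametrics.victoriametrics.svc.cluster.local:8428/api/v1/export"
params = urllib.parse.urlencode({
    # match[] accepts any Prometheus series selector
    "match[]": 'vllm:prompt_tokens_total{tenant="serverless-inference-cluster"}',
    "start": 1773360000,  # 2026-03-13 00:00:00 UTC
    "end": 1773446400,    # one day later
})
url = f"{base}?{params}"
```

The export endpoint streams the same JSON-line format that `/api/v1/import` accepts, which makes export → import copies between instances straightforward.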
## Storage
- **Size:** 200Gi NVMe (Vultr Block Storage)
- **StorageClass:** `vultr-block-storage-vm` (Retain policy — data survives PVC deletion)
- **Retention:** 2 years
- **Volume expansion:** `kubectl edit pvc victoriametrics-data -n victoriametrics`
## Useful Commands
```bash
# Check VM health
kubectl -n victoriametrics exec deploy/victoriametrics -- curl -s http://localhost:8428/health
# Check storage stats
kubectl -n victoriametrics exec deploy/victoriametrics -- \
curl -s 'http://localhost:8428/api/v1/query?query=vm_rows' | python3 -m json.tool
# Query historical data
curl -u vultr_vm:<password> \
  "https://victoriametrics.vultrlabs.dev/api/v1/query_range?query=vllm:prompt_tokens_total&start=1773360000&end=1773446400&step=60"
# Restart VM (if needed)
kubectl rollout restart deployment/victoriametrics -n victoriametrics
# Scale to 0 (preserve data, stop the pod)
kubectl scale deployment/victoriametrics --replicas=0 -n victoriametrics
```
## Re-running Backfill
If you need to import additional time ranges or new metrics:
1. Edit `backfill.py` — update `START_TS`, `END_TS`, or `METRICS`
2. Recreate the configmap and pod (see step 4 above)
3. VictoriaMetrics is idempotent for imports — duplicate data points are merged, not duplicated
To convert timestamps:
```bash
# Date → Unix timestamp
date -u -d '2026-03-13 00:00:00' +%s # 1773360000
# Unix timestamp → date
date -u -d @1773360000
```
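The same conversions in Python, handy if you're editing `backfill.py` anyway:

```python
from datetime import datetime, timezone

# Date → Unix timestamp
start_ts = int(datetime(2026, 3, 13, tzinfo=timezone.utc).timestamp())  # 1773360000

# Unix timestamp → date
start_date = datetime.fromtimestamp(1773360000, tz=timezone.utc)
```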