VictoriaMetrics — Historical Metrics Store

VictoriaMetrics instance for querying historical vLLM + DCGM metrics (March 13, 2026 onward) that couldn't be backfilled into M3DB.

Why VictoriaMetrics Instead of M3DB?

M3DB doesn't support backfill. Period. See the main README for the full story.

VictoriaMetrics has a first-class /api/v1/import endpoint that accepts data with any timestamp — no bufferPast gates, no block size hacks, no special namespaces. You just send the data and it works.
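For illustration, here is a minimal sketch of building a payload for that endpoint in VictoriaMetrics' JSON-line import format (one JSON object per series, timestamps in milliseconds). The metric name and label values below are examples, not data taken from this cluster:

```python
import json

def build_import_line(name, labels, samples):
    """Build one line of VictoriaMetrics' /api/v1/import JSON-line format.

    samples is a list of (unix_ts_seconds, value) pairs; the import API
    expects millisecond timestamps.
    """
    metric = {"__name__": name, **labels}
    return json.dumps({
        "metric": metric,
        "values": [v for _, v in samples],
        "timestamps": [int(ts * 1000) for ts, _ in samples],
    })

# A sample dated March 13, 2026 -- any timestamp is accepted on import.
line = build_import_line(
    "vllm:prompt_tokens_total",
    {"cluster": "serverless-inference-cluster"},
    [(1773360000, 12345.0)],
)
# POST it with, e.g.:
#   curl -X POST --data-binary "$line" http://localhost:8428/api/v1/import
```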

Architecture

                   ┌─────────────────────────────────────────────────┐
                   │               Vultr VKE Cluster                 │
                   │                                                 │
Mimir ──import───▶ │ VictoriaMetrics (1 pod, 200Gi NVMe)             │
                   │   ↓ PromQL queries                              │
                   │   Traefik (TLS + basic auth)                    │
                   │   ↓                                             │
                   │   victoriametrics.vultrlabs.dev                 │
                   └─────────────────────────────────────────────────┘

Grafana queries both:
  - M3DB (m3db.vultrlabs.dev) → real-time data (1h blocks, going forward)
  - VictoriaMetrics (victoriametrics.vultrlabs.dev) → historical data (Mar 13 → present)

Quick Start

1. Deploy VictoriaMetrics

# Apply manifests
kubectl apply -k .

# Wait for pod to be running
kubectl -n victoriametrics get pods -w

# Verify it's healthy
kubectl -n victoriametrics port-forward svc/victoriametrics 8428:8428 &
curl http://localhost:8428/health

2. Configure DNS

Get the Traefik LoadBalancer IP and point victoriametrics.vultrlabs.dev at it:

kubectl -n traefik get svc traefik

3. Set Up Basic Auth

Generate htpasswd and update the secret in 04-basic-auth-middleware.yaml:

htpasswd -nb vultr_vm <your-password>
# Copy output, base64 encode it:
echo -n '<htpasswd-output>' | base64
# Update the secret and apply
kubectl apply -f 04-basic-auth-middleware.yaml
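For reference, the usual Traefik basic-auth pattern pairs a Secret (with the htpasswd output under a `users` key) with a Middleware that references it. A sketch, assuming 04-basic-auth-middleware.yaml follows this shape; the resource names here are illustrative:

```yaml
# Sketch only -- match names to what 04-basic-auth-middleware.yaml defines.
apiVersion: v1
kind: Secret
metadata:
  name: victoriametrics-basic-auth
  namespace: victoriametrics
data:
  users: <base64-of-htpasswd-output>
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: basic-auth
  namespace: victoriametrics
spec:
  basicAuth:
    secret: victoriametrics-basic-auth
```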

4. Run Backfill

# Create the secret with Mimir credentials
kubectl create secret generic backfill-credentials \
  --from-literal=mimir-password='YOUR_MIMIR_PASSWORD' -n victoriametrics

# Upload the backfill script as a configmap
kubectl create configmap backfill-script \
  --from-file=backfill.py=backfill.py -n victoriametrics

# Run the backfill pod
kubectl apply -f backfill-pod.yaml

# Watch progress
kubectl logs -f backfill -n victoriametrics

# Cleanup when done
kubectl delete pod backfill -n victoriametrics
kubectl delete configmap backfill-script -n victoriametrics
kubectl delete secret backfill-credentials -n victoriametrics

5. Verify

# In-cluster
kubectl -n victoriametrics exec deploy/victoriametrics -- \
  curl -s 'http://localhost:8428/api/v1/query?query=vllm:prompt_tokens_total' | python3 -m json.tool

# External (with auth)
curl -u vultr_vm:<password> "https://victoriametrics.vultrlabs.dev/api/v1/query?query=up"

Grafana Configuration

Add VictoriaMetrics as a Prometheus datasource:

  • URL: https://victoriametrics.vultrlabs.dev (with basic auth)
  • In-cluster URL: http://victoriametrics.victoriametrics.svc.cluster.local:8428
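If your Grafana datasources are provisioned from files rather than the UI, the equivalent entry looks roughly like this (datasource name and credential handling are illustrative, adjust to your Grafana setup):

```yaml
# Illustrative Grafana datasource provisioning entry.
apiVersion: 1
datasources:
  - name: VictoriaMetrics (historical)
    type: prometheus
    url: https://victoriametrics.vultrlabs.dev
    access: proxy
    basicAuth: true
    basicAuthUser: vultr_vm
    secureJsonData:
      basicAuthPassword: <password>
```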

Mixed Queries (M3DB + VictoriaMetrics)

Use a Mixed datasource in Grafana to query both:

  1. Create two Prometheus datasources:

    • M3DB → https://m3db.vultrlabs.dev
    • VictoriaMetrics → https://victoriametrics.vultrlabs.dev
  2. Create a Mixed datasource that includes both

  3. In dashboards, use the mixed datasource — Grafana sends the query to both backends and merges results

Alternatively, use dashboard variables to let users toggle between datasources for different time ranges.

Metrics Stored

Metric                         Description
vllm:prompt_tokens_total       vLLM prompt token count
vllm:generation_tokens_total   vLLM generation token count
DCGM_FI_DEV_GPU_UTIL           GPU utilization (DCGM)

All metrics are tagged with tenant=serverless-inference-cluster and cluster=serverless-inference-cluster.
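Those tags can be used as PromQL label selectors when querying. A small sketch of building such a query URL with the standard library (the in-cluster base URL matches the verify step above; labels are as documented here):

```python
from urllib.parse import urlencode

BASE = "http://localhost:8428/api/v1/query"

def promql_url(metric, **labels):
    """Build an instant-query URL with a PromQL label selector."""
    selector = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    query = f"{metric}{{{selector}}}" if selector else metric
    return f"{BASE}?{urlencode({'query': query})}"

url = promql_url("vllm:prompt_tokens_total",
                 tenant="serverless-inference-cluster")
# Fetch with curl or urllib.request.urlopen(url).
```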

VictoriaMetrics API Reference

Endpoint             Purpose
/api/v1/import       Import data (Prometheus format)
/api/v1/export       Export data
/api/v1/query        PromQL instant query
/api/v1/query_range  PromQL range query
/health              Health check
/metrics             Internal metrics

Storage

  • Size: 200Gi NVMe (Vultr Block Storage)
  • StorageClass: vultr-block-storage-vm (Retain policy — data survives PVC deletion)
  • Retention: 2 years
  • Volume expansion: kubectl edit pvc victoriametrics-data -n victoriametrics

Useful Commands

# Check VM health
kubectl -n victoriametrics exec deploy/victoriametrics -- curl -s http://localhost:8428/health

# Check storage stats
kubectl -n victoriametrics exec deploy/victoriametrics -- \
  curl -s 'http://localhost:8428/api/v1/query?query=vm_rows' | python3 -m json.tool

# Query historical data (end must be later than start; here, now)
curl -u vultr_vm:<password> \
  "https://victoriametrics.vultrlabs.dev/api/v1/query_range?query=vllm:prompt_tokens_total&start=1773360000&end=$(date -u +%s)&step=60"

# Restart VM (if needed)
kubectl rollout restart deployment/victoriametrics -n victoriametrics

# Scale to 0 (preserve data, stop the pod)
kubectl scale deployment/victoriametrics --replicas=0 -n victoriametrics

Re-running Backfill

If you need to import additional time ranges or new metrics:

  1. Edit backfill.py — update START_TS, END_TS, or METRICS
  2. Recreate the configmap and pod (see step 4 above)
  3. Imports are effectively idempotent — re-sending the same samples merges them rather than storing duplicates
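The overall shape of such a backfill is a loop over bounded time windows: export a chunk from the source, convert it, import it, advance. A sketch of the chunking logic (backfill.py's actual variable names, chunk size, and HTTP calls may differ):

```python
from datetime import timedelta

CHUNK = int(timedelta(hours=6).total_seconds())  # one request per 6h window

def time_chunks(start_ts, end_ts, chunk=CHUNK):
    """Yield (start, end) unix-timestamp windows covering [start_ts, end_ts)."""
    t = start_ts
    while t < end_ts:
        yield (t, min(t + chunk, end_ts))
        t += chunk

# For each window: query the source (e.g. Mimir's query_range API),
# convert to VictoriaMetrics' import format, POST to /api/v1/import.
windows = list(time_chunks(1773360000, 1773360000 + 86400))
```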

To convert timestamps:

# Date → Unix timestamp
date -u -d '2026-03-13 00:00:00' +%s    # 1773360000

# Unix timestamp → date
date -u -d @1773360000
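The same conversions in Python, which can be handy inside backfill.py itself:

```python
from datetime import datetime, timezone

# Date -> Unix timestamp
ts = int(datetime(2026, 3, 13, tzinfo=timezone.utc).timestamp())  # 1773360000

# Unix timestamp -> date
dt = datetime.fromtimestamp(1773360000, tz=timezone.utc)  # 2026-03-13 00:00 UTC
```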