M3DB Backfill Runbook (Revised)
Context
Backfilling ~3 weeks of vLLM + DCGM metrics from Mimir to M3DB.
Blocker discovered: bufferPast is immutable on existing namespaces, so the downsample pipeline keeps rejecting historical writes to them.
Solution: Create new backfill namespaces with bufferPast=504h (21 days).
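As a quick sanity check on the durations used in this runbook (504h for bufferPast, plus the 168h, 2160h, and 8760h retention periods and the resolutionNanos values in Step 1), the following sketch converts them with plain bash arithmetic. The helper names are hypothetical and exist only for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert an hour count to whole days.
hours_to_days() {
  echo $(( $1 / 24 ))
}

# Hypothetical helper: convert a nanosecond count to whole seconds.
nanos_to_seconds() {
  echo $(( $1 / 1000000000 ))
}

echo "bufferPast 504h                = $(hours_to_days 504) days"
echo "default_backfill retention 168h = $(hours_to_days 168) days"
echo "agg_10s_backfill retention 2160h = $(hours_to_days 2160) days"
echo "agg_1m_backfill retention 8760h  = $(hours_to_days 8760) days"
echo "resolutionNanos 10000000000    = $(nanos_to_seconds 10000000000) s"
echo "resolutionNanos 60000000000    = $(nanos_to_seconds 60000000000) s"
```

One thing the numbers make visible: default_backfill keeps 7 days of data while the backfill window spans 21, so it may be worth confirming whether the oldest raw samples are expected to survive in that namespace.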
Step 1 — Create Backfill Namespaces
COORD="http://m3coordinator.m3db.svc.cluster.local:7201"
# default_backfill: 7d retention, 21d bufferPast
curl -sSf -X POST "${COORD}/api/v1/services/m3db/namespace" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "default_backfill",
    "options": {
      "retentionOptions": {
        "retentionPeriodDuration": "168h",
        "blockSizeDuration": "2h",
        "bufferFutureDuration": "10m",
        "bufferPastDuration": "504h"
      }
    }
  }'
# agg_10s_backfill: 90d retention, 10s resolution, 21d bufferPast
curl -sSf -X POST "${COORD}/api/v1/services/m3db/namespace" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "agg_10s_backfill",
    "options": {
      "retentionOptions": {
        "retentionPeriodDuration": "2160h",
        "blockSizeDuration": "24h",
        "bufferFutureDuration": "10m",
        "bufferPastDuration": "504h"
      },
      "aggregationOptions": {
        "aggregations": [{
          "aggregated": true,
          "attributes": {
            "resolutionNanos": "10000000000",
            "downsampleOptions": {"all": true}
          }
        }]
      }
    }
  }'
# agg_1m_backfill: 1y retention, 1m resolution, 21d bufferPast
curl -sSf -X POST "${COORD}/api/v1/services/m3db/namespace" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "agg_1m_backfill",
    "options": {
      "retentionOptions": {
        "retentionPeriodDuration": "8760h",
        "blockSizeDuration": "24h",
        "bufferFutureDuration": "10m",
        "bufferPastDuration": "504h"
      },
      "aggregationOptions": {
        "aggregations": [{
          "aggregated": true,
          "attributes": {
            "resolutionNanos": "60000000000",
            "downsampleOptions": {"all": true}
          }
        }]
      }
    }
  }'
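Before moving on, it can help to confirm that all three namespaces were registered. The sketch below reads a namespace listing on stdin and prints any expected name that is missing. It assumes a GET on the same namespace endpoint returns a JSON listing keyed by namespace name; the function name missing_namespaces is hypothetical, and the check is a coarse substring match rather than real JSON parsing.

```shell
#!/usr/bin/env bash
# Sketch: report which expected namespaces are absent from a namespace listing.
# Reads the JSON listing on stdin; the names to check are passed as arguments.
# Matches the quoted name as a substring, so treat the result as a coarse check.
missing_namespaces() {
  local listing ns
  listing="$(cat)"
  for ns in "$@"; do
    if ! grep -q "\"${ns}\"" <<<"${listing}"; then
      echo "${ns}"
    fi
  done
}

# Example usage against the coordinator (COORD as defined in Step 1):
#   curl -sSf "${COORD}/api/v1/services/m3db/namespace" \
#     | missing_namespaces default_backfill agg_10s_backfill agg_1m_backfill
```

An empty output means every expected name appeared in the listing.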
Step 2 — Update Coordinator ConfigMap
Add new namespaces to m3coordinator-config:
clusters:
  - namespaces:
      - namespace: default
        type: unaggregated
        retention: 168h
      - namespace: default_backfill
        type: unaggregated
        retention: 168h
      - namespace: agg_10s_30d
        type: aggregated
        retention: 2160h
        resolution: 10s
      - namespace: agg_10s_backfill
        type: aggregated
        retention: 2160h
        resolution: 10s
      - namespace: agg_1m_1y
        type: aggregated
        retention: 8760h
        resolution: 1m
      - namespace: agg_1m_backfill
        type: aggregated
        retention: 8760h
        resolution: 1m
Also add downsample rules for the backfill namespaces.
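To double-check the edited ConfigMap before restarting anything, a small filter can list the namespace names it declares. This is a sketch only: the function name list_config_namespaces is hypothetical, and it assumes the coordinator config appears as indented YAML lines (as it does when kubectl renders the value as a block scalar).

```shell
#!/usr/bin/env bash
# Sketch: print the value of every "- namespace: NAME" list entry read from stdin.
# The parent key "- namespaces:" does not match, because the pattern requires
# a colon directly after the word "namespace".
list_config_namespaces() {
  sed -n 's/^[[:space:]]*-[[:space:]]*namespace:[[:space:]]*\([^[:space:]#]*\).*$/\1/p'
}

# Example usage:
#   kubectl get configmap m3coordinator-config -n m3db -o yaml | list_config_namespaces
```

The output should contain the three original namespaces and the three backfill ones.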
Step 3 — Restart Coordinators
kubectl rollout restart deployment/m3coordinator -n m3db
kubectl rollout status deployment/m3coordinator -n m3db --timeout=120s
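After the rollout reports success, it may still take a moment before the coordinator answers requests. A minimal retry helper, sketched below, can poll until a command succeeds. The name retry_until_ok is hypothetical, and the usage line assumes the coordinator serves a /health endpoint on port 7201.

```shell
#!/usr/bin/env bash
# Sketch: retry a command until it succeeds or the attempts run out.
# Usage: retry_until_ok ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until_ok() {
  local attempts="$1" delay="$2" i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    if "$@"; then
      return 0
    fi
    # Sleep only between attempts, not after the final failure.
    if (( i < attempts )); then
      sleep "${delay}"
    fi
  done
  return 1
}

# Example usage (COORD as defined in Step 1):
#   retry_until_ok 12 5 curl -sSf -o /dev/null "${COORD}/health"
```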
Step 4 — Run Backfill
Write directly to the default_backfill namespace using the __namespace__ label:
# In the protobuf write request, add label:
# __namespace__ = "default_backfill"
Or use the coordinator endpoint:
POST http://m3coordinator:7201/api/v1/prom/remote/write?namespace=default_backfill
Backfill time range: 2026-03-11T00:00:00Z to 2026-04-01T00:00:00Z
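Replaying the whole range in one request is rarely practical, so one option is to walk it in fixed-size windows. The sketch below splits the range above into one-day windows, which gives 21 windows for this range. It assumes GNU date (for the -d flag); the function name split_range and the one-day step are illustrative choices, not part of the original procedure.

```shell
#!/usr/bin/env bash
# Sketch: print "START END" pairs covering [start, end) in fixed-size steps.
# Arguments: START END STEP_SECONDS, with timestamps in RFC 3339 UTC form.
# Requires GNU date for the -d option.
split_range() {
  local start_ts end_ts step="$3" t next
  start_ts="$(date -u -d "$1" +%s)"
  end_ts="$(date -u -d "$2" +%s)"
  t="${start_ts}"
  while (( t < end_ts )); do
    next=$(( t + step ))
    # Clamp the last window so it never runs past the end of the range.
    if (( next > end_ts )); then
      next="${end_ts}"
    fi
    printf '%s %s\n' \
      "$(date -u -d "@${t}" +%Y-%m-%dT%H:%M:%SZ)" \
      "$(date -u -d "@${next}" +%Y-%m-%dT%H:%M:%SZ)"
    t="${next}"
  done
}

# One-day (86400 s) windows over the backfill range from this runbook.
split_range "2026-03-11T00:00:00Z" "2026-04-01T00:00:00Z" 86400
```

Each output line can then drive one export-and-write pass, which also makes it easy to resume after a failure.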
Step 5 — Verify
curl -sS "http://m3coordinator:7201/api/v1/query" \
--data-urlencode 'query=vllm:prompt_tokens_total' \
--data-urlencode 'time=2026-03-20T12:00:00Z'
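A successful HTTP response does not by itself show that backfilled samples exist at that timestamp. The sketch below checks the response body for a success status and a non-empty result array. It assumes the usual Prometheus-style response shape; the function name query_has_data is hypothetical, and the check is a coarse text match rather than a JSON parser.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if a Prometheus-style query response on stdin reports
# status "success" and its result array is not empty.
query_has_data() {
  local body
  # Strip all whitespace so the patterns below match regardless of formatting.
  body="$(tr -d '[:space:]')"
  grep -q '"status":"success"' <<<"${body}" || return 1
  if grep -q '"result":\[\]' <<<"${body}"; then
    return 1
  fi
  return 0
}

# Example usage:
#   curl -sS "http://m3coordinator:7201/api/v1/query" \
#     --data-urlencode 'query=vllm:prompt_tokens_total' \
#     --data-urlencode 'time=2026-03-20T12:00:00Z' \
#     | query_has_data && echo "backfilled data present"
```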
Step 6 — Revert bufferPast (After Backfill)
# After the backfill completes, shrink bufferPast back to 10m.
# (Only retentionPeriod is mutable, so this requires recreating the namespace.)
# OR: leave it as-is, since it's a backfill-only namespace.
Performance Notes
- M3DB has been fast so far
- New namespaces won't impact existing query performance
- Queries can fan out to both old and new namespaces in parallel
- After backfill, consider consolidating (optional)