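# M3DB Backfill Runbook (Revised)

## Context

Backfilling ~3 weeks of vLLM + DCGM metrics from Mimir to M3DB.

**Blocker discovered:** `bufferPast` is immutable on existing namespaces, so the downsample pipeline rejects historical writes.

**Solution:** Create new backfill namespaces with `bufferPast=504h` (21 days).

---

## Step 1 — Create Backfill Namespaces

```bash
COORD="http://m3coordinator.m3db.svc.cluster.local:7201"

# aggregationOptions belongs inside "options", next to retentionOptions.
# indexOptions.enabled is needed for label-based (PromQL) queries through the coordinator.

# default_backfill: 7d retention, 21d bufferPast
curl -sSf -X POST "${COORD}/api/v1/services/m3db/namespace" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "default_backfill",
    "options": {
      "bootstrapEnabled": true, "flushEnabled": true, "writesToCommitLog": true,
      "cleanupEnabled": true, "snapshotEnabled": true,
      "retentionOptions": {
        "retentionPeriodDuration": "168h",
        "blockSizeDuration": "2h",
        "bufferFutureDuration": "10m",
        "bufferPastDuration": "504h"
      },
      "indexOptions": {"enabled": true, "blockSizeDuration": "2h"}
    }
  }'

# agg_10s_backfill: 90d retention, 10s resolution, 21d bufferPast
curl -sSf -X POST "${COORD}/api/v1/services/m3db/namespace" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "agg_10s_backfill",
    "options": {
      "bootstrapEnabled": true, "flushEnabled": true, "writesToCommitLog": true,
      "cleanupEnabled": true, "snapshotEnabled": true,
      "retentionOptions": {
        "retentionPeriodDuration": "2160h",
        "blockSizeDuration": "24h",
        "bufferFutureDuration": "10m",
        "bufferPastDuration": "504h"
      },
      "indexOptions": {"enabled": true, "blockSizeDuration": "24h"},
      "aggregationOptions": {
        "aggregations": [{
          "aggregated": true,
          "attributes": {
            "resolutionNanos": "10000000000",
            "downsampleOptions": {"all": true}
          }
        }]
      }
    }
  }'

# agg_1m_backfill: 1y retention, 1m resolution, 21d bufferPast
curl -sSf -X POST "${COORD}/api/v1/services/m3db/namespace" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "agg_1m_backfill",
    "options": {
      "bootstrapEnabled": true, "flushEnabled": true, "writesToCommitLog": true,
      "cleanupEnabled": true, "snapshotEnabled": true,
      "retentionOptions": {
        "retentionPeriodDuration": "8760h",
        "blockSizeDuration": "24h",
        "bufferFutureDuration": "10m",
        "bufferPastDuration": "504h"
      },
      "indexOptions": {"enabled": true, "blockSizeDuration": "24h"},
      "aggregationOptions": {
        "aggregations": [{
          "aggregated": true,
          "attributes": {
            "resolutionNanos": "60000000000",
            "downsampleOptions": {"all": true}
          }
        }]
      }
    }
  }'
```

**Caveats:**

- `bufferPast` is measured back from the time of the write, not from the sample timestamp. With `504h`, a sample is accepted only if it is at most 21 days old when it is written. Any write made after `2026-04-01T00:00:00Z` therefore rejects the earliest `2026-03-11` samples.
- `default_backfill` retains only 168h (7 days). Raw samples older than 7 days fall outside its retention. Unless that retention is raised, only the aggregated backfill namespaces can hold the full 21-day range.
- Check each POST response. M3DB's retention validation may reject a `bufferPastDuration` that is not smaller than `blockSizeDuration`. If it does, the mechanism M3DB documents for writes outside the buffer window is `coldWritesEnabled: true` on the namespace.

---

## Step 2 — Update Coordinator ConfigMap

Add the new namespaces to `m3coordinator-config`:

```yaml
clusters:
  - namespaces:
      - namespace: default
        type: unaggregated
        retention: 168h
      - namespace: default_backfill
        type: unaggregated
        retention: 168h
      - namespace: agg_10s_30d
        type: aggregated
        retention: 2160h
        resolution: 10s
      - namespace: agg_10s_backfill
        type: aggregated
        retention: 2160h
        resolution: 10s
      - namespace: agg_1m_1y
        type: aggregated
        retention: 8760h
        resolution: 1m
      - namespace: agg_1m_backfill
        type: aggregated
        retention: 8760h
        resolution: 1m
```

Also add downsample rules that target the backfill namespaces.

After the restart in Step 3, check the coordinator logs. Confirm that this coordinator version accepts two things: a second `unaggregated` namespace, and aggregated namespaces that share a retention/resolution pair (`agg_10s_30d` and `agg_10s_backfill`, `agg_1m_1y` and `agg_1m_backfill`).

---

## Step 3 — Restart Coordinators

```bash
kubectl rollout restart deployment/m3coordinator -n m3db
kubectl rollout status deployment/m3coordinator -n m3db --timeout=120s
```

---

## Step 4 — Run Backfill

Write directly to the `default_backfill` namespace using the `__namespace__` label:

```python
# In the Prometheus remote-write protobuf request, add this label to every series:
# __namespace__ = "default_backfill"
```

Or use the coordinator endpoint:

```
POST http://m3coordinator:7201/api/v1/prom/remote/write?namespace=default_backfill
```

To our knowledge, neither the `__namespace__` label nor the `namespace` query parameter is a documented M3 coordinator feature. Write one test sample and query it back before the full run. The documented way to target an aggregated namespace is the `M3-Metrics-Type: aggregated` and `M3-Storage-Policy: <resolution>:<retention>` request headers. That routing cannot tell apart two namespaces with the same resolution and retention.

Backfill time range: `2026-03-11T00:00:00Z` to `2026-04-01T00:00:00Z`

---

## Step 5 — Verify

```bash
curl -sS "http://m3coordinator:7201/api/v1/query" \
  --data-urlencode 'query=vllm:prompt_tokens_total' \
  --data-urlencode 'time=2026-03-20T12:00:00Z'
```

---

## Step 6 — Revert bufferPast (After Backfill)

```bash
# After the backfill completes, bufferPast could be shrunk back to 10m, but only
# retentionPeriod is mutable, so that requires deleting and recreating the namespace.
# Alternative: leave bufferPast as-is, since this is a backfill-only namespace.
```

---

## Performance Notes

- M3DB has been fast so far.
- The new namespaces should not affect query performance on the existing ones.
- Queries can fan out to both old and new namespaces in parallel.
- After the backfill, consider consolidating the namespaces (optional).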
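The three `curl` calls in Step 1 differ only in name, retention, block size and resolution, so they can also be generated from one function. Below is a minimal standard-library Python sketch. `build_namespace_payload` and `create_namespace` are hypothetical helper names. The option set follows the M3DB namespace API as we understand it: durability flags, `indexOptions`, and `aggregationOptions` nested under `options`. Treat it as a starting point, not a complete list of namespace options.

```python
import json
import urllib.request
from typing import Optional

COORD = "http://m3coordinator.m3db.svc.cluster.local:7201"


def build_namespace_payload(name: str, retention: str, block_size: str,
                            buffer_past: str,
                            resolution_seconds: Optional[int] = None,
                            buffer_future: str = "10m") -> dict:
    """Build the JSON body for POST /api/v1/services/m3db/namespace (hypothetical helper)."""
    options = {
        "bootstrapEnabled": True,
        "flushEnabled": True,
        "writesToCommitLog": True,
        "cleanupEnabled": True,
        "snapshotEnabled": True,
        "retentionOptions": {
            "retentionPeriodDuration": retention,
            "blockSizeDuration": block_size,
            "bufferFutureDuration": buffer_future,
            "bufferPastDuration": buffer_past,
        },
        # Index block size matches the data block size, as in Step 1.
        "indexOptions": {"enabled": True, "blockSizeDuration": block_size},
    }
    if resolution_seconds is not None:
        # Aggregated namespace: aggregationOptions sits inside "options".
        options["aggregationOptions"] = {
            "aggregations": [{
                "aggregated": True,
                "attributes": {
                    "resolutionNanos": str(resolution_seconds * 1_000_000_000),
                    "downsampleOptions": {"all": True},
                },
            }]
        }
    return {"name": name, "options": options}


def create_namespace(payload: dict, coord: str = COORD) -> dict:
    """POST one namespace definition; urllib raises HTTPError on a non-2xx reply."""
    req = urllib.request.Request(
        f"{coord}/api/v1/services/m3db/namespace",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())
```

Building the payloads separately from posting them lets you diff the generated JSON against the existing namespaces before anything is created. `GET ${COORD}/api/v1/services/m3db/namespace` lists those. Then call, for example, `create_namespace(build_namespace_payload("agg_1m_backfill", "8760h", "24h", "504h", resolution_seconds=60))`.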
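`bufferPast` and retention are both measured back from the moment of the write. That means the Step 1 namespace settings can be checked against the Step 4 time range before the run starts. Below is a minimal sketch. `check_namespace` and `parse_duration` are hypothetical helpers. The `bufferPast < blockSize` rule it tests is an assumption about M3DB's retention validation; confirm it against the deployed version.

```python
from datetime import datetime, timedelta, timezone

# Start of the backfill range from Step 4.
BACKFILL_START = datetime(2026, 3, 11, tzinfo=timezone.utc)

_UNITS = {"h": "hours", "m": "minutes", "s": "seconds"}


def parse_duration(value: str) -> timedelta:
    """Parse simple single-unit durations such as '504h', '10m' or '10s'."""
    return timedelta(**{_UNITS[value[-1]]: int(value[:-1])})


def check_namespace(name: str, retention: str, buffer_past: str,
                    block_size: str, now: datetime) -> list:
    """List the reasons a backfill written at `now` would lose data in `name`."""
    problems = []
    oldest_age = now - BACKFILL_START  # age of the oldest sample at write time
    if oldest_age > parse_duration(buffer_past):
        problems.append(f"{name}: oldest sample is {oldest_age} old, past bufferPast {buffer_past}")
    if oldest_age > parse_duration(retention):
        problems.append(f"{name}: oldest sample is {oldest_age} old, past retention {retention}")
    # ASSUMPTION: M3DB requires bufferPast < blockSize; verify for your version.
    if parse_duration(buffer_past) >= parse_duration(block_size):
        problems.append(f"{name}: bufferPast {buffer_past} is not below blockSize {block_size}")
    return problems


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for spec in [("default_backfill", "168h", "504h", "2h"),
                 ("agg_10s_backfill", "2160h", "504h", "24h"),
                 ("agg_1m_backfill", "8760h", "504h", "24h")]:
        for problem in check_namespace(*spec, now=now):
            print(problem)
```

Passing a planned start time as `now` shows how much slack remains. For example, starting on 2026-04-02 already leaves the first day (2026-03-11) outside a 504h `bufferPast`.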
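Step 4 gives a 21-day time range but no batching. Below is a minimal sketch for splitting that range into fixed windows to export from Mimir and write in order. `backfill_windows` is a hypothetical helper. The 24h step is an assumption chosen to match the aggregated namespaces' `blockSizeDuration`, not a measured optimum.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple


def backfill_windows(start: datetime, end: datetime,
                     step: timedelta) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive half-open [window_start, window_end) slices covering [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)  # last window may be shorter
        yield cursor, window_end
        cursor = window_end


if __name__ == "__main__":
    start = datetime(2026, 3, 11, tzinfo=timezone.utc)
    end = datetime(2026, 4, 1, tzinfo=timezone.utc)
    # Oldest windows first: they are the closest to aging out of bufferPast.
    for window_start, window_end in backfill_windows(start, end, timedelta(hours=24)):
        print(window_start.isoformat(), "->", window_end.isoformat())
```

Walking the windows oldest-first matters here. The oldest windows are the first to slip outside `bufferPast` as the run goes on.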
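Step 5 checks a single timestamp. Below is a minimal sketch that spot-checks one instant per day across the backfill range, using the same `/api/v1/query` endpoint. `query_url`, `has_samples` and `spot_check` are hypothetical helpers. The sketch assumes the coordinator returns the standard Prometheus instant-query response shape (`status`, `data.result`).

```python
import json
import urllib.request
from datetime import datetime, timedelta
from urllib.parse import urlencode

BASE = "http://m3coordinator:7201"


def query_url(base: str, promql: str, at: datetime) -> str:
    """Build an instant-query URL for `promql` evaluated at `at` (UTC)."""
    params = {"query": promql, "time": at.strftime("%Y-%m-%dT%H:%M:%SZ")}
    return f"{base}/api/v1/query?{urlencode(params)}"


def has_samples(response: dict) -> bool:
    """True when a Prometheus-style instant-query response contains at least one series."""
    return response.get("status") == "success" and bool(
        response.get("data", {}).get("result"))


def spot_check(promql: str, start: datetime, days: int, base: str = BASE) -> list:
    """Query noon UTC of each day and return the dates that came back empty."""
    missing = []
    for offset in range(days):
        at = start + timedelta(days=offset, hours=12)
        with urllib.request.urlopen(query_url(base, promql, at), timeout=30) as resp:
            if not has_samples(json.loads(resp.read())):
                missing.append(at.date().isoformat())
    return missing
```

For example, `spot_check("vllm:prompt_tokens_total", datetime(2026, 3, 11, tzinfo=timezone.utc), days=21)` returns the dates with no data; it also needs `timezone` imported from `datetime`. An empty list means every sampled day answered.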