Clean slate: 1h block sizes, remove backfill artifacts

- Changed all namespace block sizes to 1h (was 2h/12h/24h in manifests, 30d+ in the live cluster due to backfill-era bufferPast hacks) - Deleted entire backfill/ directory (scripts, pods, runbooks) - Removed stale 05-m3coordinator.yaml (had backfill namespaces) - Added 05-m3coordinator-deployment.yaml to kustomization - Fixed init job health check (/health instead of /api/v1/services/m3db/health) - Updated .env.example (removed Mimir credentials) - Added 'Why Backfill Doesn't Work' section to README
2026-04-09 19:00:08 +00:00
parent 1af29e8f09
commit 7ade5ecac8
16 changed files with 115 additions and 1117 deletions
--- a/README.md
+++ b/README.md
@@ -44,11 +44,13 @@ Internet → Vultr LoadBalancer → Traefik (TLS + basic auth) → m3coordinator

 ## Retention Tiers

-| Namespace      | Resolution | Retention | Use Case                  |
-|----------------|-----------|-----------|---------------------------|
-| `default`      | raw       | 48h       | Real-time queries         |
-| `agg_10s_30d`  | 10s       | 30 days   | Recent dashboards         |
-| `agg_1m_1y`    | 1m        | 1 year    | Long-term trends/capacity |
+All namespaces use **1h block size** — the sweet spot for M3DB. Smaller blocks mean faster queries, faster flushes, and less memory pressure during compaction. See [Why Backfill Doesn't Work](#why-backfill-doesnt-work) for why larger blocks were a disaster.
+
+| Namespace      | Resolution | Retention | Block Size | Use Case                  |
+|----------------|-----------|-----------|------------|---------------------------|
+| `default`      | raw       | 48h       | 1h         | Real-time queries         |
+| `agg_10s_30d`  | 10s       | 30 days   | 1h         | Recent dashboards         |
+| `agg_1m_1y`    | 1m        | 1 year    | 1h         | Long-term trends/capacity |

 ## Deployment

@@ -96,13 +98,13 @@ kubectl -n m3db exec m3dbnode-0 -- curl -s -X POST http://localhost:7201/api/v1/

 # Create namespaces
 kubectl -n m3db exec m3dbnode-0 -- curl -s -X POST http://localhost:7201/api/v1/services/m3db/namespace \
-  -H "Content-Type: application/json" -d '{"name":"default","options":{"bootstrapEnabled":true,"flushEnabled":true,"writesToCommitLog":true,"cleanupEnabled":true,"snapshotEnabled":true,"repairEnabled":false,"retentionOptions":{"retentionPeriodDuration":"48h","blockSizeDuration":"2h","bufferFutureDuration":"10m","bufferPastDuration":"10m"},"indexOptions":{"enabled":true,"blockSizeDuration":"2h"}}}'
+  -H "Content-Type: application/json" -d '{"name":"default","options":{"bootstrapEnabled":true,"flushEnabled":true,"writesToCommitLog":true,"cleanupEnabled":true,"snapshotEnabled":true,"repairEnabled":false,"retentionOptions":{"retentionPeriodDuration":"48h","blockSizeDuration":"1h","bufferFutureDuration":"10m","bufferPastDuration":"10m"},"indexOptions":{"enabled":true,"blockSizeDuration":"1h"}}}'

 kubectl -n m3db exec m3dbnode-0 -- curl -s -X POST http://localhost:7201/api/v1/services/m3db/namespace \
-  -H "Content-Type: application/json" -d '{"name":"agg_10s_30d","options":{"bootstrapEnabled":true,"flushEnabled":true,"writesToCommitLog":true,"cleanupEnabled":true,"snapshotEnabled":true,"retentionOptions":{"retentionPeriodDuration":"720h","blockSizeDuration":"12h","bufferFutureDuration":"10m","bufferPastDuration":"10m"},"indexOptions":{"enabled":true,"blockSizeDuration":"12h"},"aggregationOptions":{"aggregations":[{"aggregated":true,"attributes":{"resolutionDuration":"10s"}}]}}}'
+  -H "Content-Type: application/json" -d '{"name":"agg_10s_30d","options":{"bootstrapEnabled":true,"flushEnabled":true,"writesToCommitLog":true,"cleanupEnabled":true,"snapshotEnabled":true,"retentionOptions":{"retentionPeriodDuration":"720h","blockSizeDuration":"1h","bufferFutureDuration":"10m","bufferPastDuration":"10m"},"indexOptions":{"enabled":true,"blockSizeDuration":"1h"},"aggregationOptions":{"aggregations":[{"aggregated":true,"attributes":{"resolutionDuration":"10s"}}]}}}'

 kubectl -n m3db exec m3dbnode-0 -- curl -s -X POST http://localhost:7201/api/v1/services/m3db/namespace \
-  -H "Content-Type: application/json" -d '{"name":"agg_1m_1y","options":{"bootstrapEnabled":true,"flushEnabled":true,"writesToCommitLog":true,"cleanupEnabled":true,"snapshotEnabled":true,"retentionOptions":{"retentionPeriodDuration":"8760h","blockSizeDuration":"24h","bufferFutureDuration":"10m","bufferPastDuration":"10m"},"indexOptions":{"enabled":true,"blockSizeDuration":"24h"},"aggregationOptions":{"aggregations":[{"aggregated":true,"attributes":{"resolutionDuration":"1m"}}]}}}'
+  -H "Content-Type: application/json" -d '{"name":"agg_1m_1y","options":{"bootstrapEnabled":true,"flushEnabled":true,"writesToCommitLog":true,"cleanupEnabled":true,"snapshotEnabled":true,"retentionOptions":{"retentionPeriodDuration":"8760h","blockSizeDuration":"1h","bufferFutureDuration":"10m","bufferPastDuration":"10m"},"indexOptions":{"enabled":true,"blockSizeDuration":"1h"},"aggregationOptions":{"aggregations":[{"aggregated":true,"attributes":{"resolutionDuration":"1m"}}]}}}'

 # Wait for bootstrapping to complete (check shard state = AVAILABLE)
 kubectl -n m3db exec m3dbnode-0 -- curl -s http://localhost:9002/health
@@ -250,6 +252,40 @@ remote_write:
 - **Shards**: The init job creates 64 shards across 3 nodes. For higher cardinality, increase to 128 or 256.
 - **Volume expansion**: The StorageClass has `allowVolumeExpansion: true` — you can resize PVCs online via `kubectl edit pvc`.

+## Why Backfill Doesn't Work
+
+**TL;DR: M3DB is not designed for historical data import. Don't try it.**
+
+M3DB is a time-series database optimized for real-time ingestion and sequential writes. Backfilling — writing data with timestamps in the past — fights the fundamental architecture at every turn:
+
+### The Problems
+
+1. **`bufferPast` is a hard gate.** M3DB rejects writes whose timestamps fall outside the `bufferPast` window (default: 10m). To write data from 3 weeks ago, you need `bufferPast=504h` (21 days). This setting is **immutable** on existing namespaces — you have to create entirely new namespaces just for backfill, doubling your operational complexity.
+
+2. **Massive block sizes were required.** To make the backfill namespaces work with `bufferPast=504h`, block sizes had to be enormous (30+ day blocks). This defeated the entire point of M3DB's time-partitioned storage — blocks that large cause extreme memory pressure, slow compaction, and bloated index lookups.
+
+3. **Downsample pipeline ignores historical data.** M3DB's downsample coordinator only processes new writes in real-time. Backfilled data written to `default_backfill` namespaces never gets downsampled into aggregated namespaces, so your long-term retention tiers have gaps.
+
+4. **No transaction boundaries.** Each backfill write is an individual operation. Writing 12M+ samples means 12M+ individual writes with no batching semantics. If one fails, there's no rollback, no retry from a checkpoint — you get partial data with no easy way to detect or fix gaps.
+
+5. **Compaction and flush chaos.** M3DB expects data to flow sequentially through commitlog → flush → compact. Backfill dumps data out of order, causing the background compaction to thrash, consuming CPU and I/O for blocks that may never be queried again.
+
+### What We Tried
+
+- Created `default_backfill`, `agg_10s_backfill`, `agg_1m_backfill` namespaces with `bufferPast=504h`
+- Increased block sizes to 24h–30d to accommodate the large bufferPast
+- Wrote 12M+ samples from Mimir to M3DB over multiple runs
+- Result: Data landed, but the operational cost was catastrophic — huge blocks, no downsampling, and the cluster was unstable
+
+### What To Do Instead
+
+- **Start fresh.** Configure M3DB with sane block sizes (1h) from day one and let it accumulate data naturally via Prometheus remote_write.
+- **Accept the gap.** Historical data lives in Mimir (or wherever it was before). Query Mimir for old data, M3DB for new data.
+- **Dual-write during migration.** Write to both systems simultaneously until M3DB's retention catches up.
+- **If you absolutely need old data in M3DB**, accept that you're doing a one-time migration and build tooling around the constraints — but know that it's a project, not a script.
+
+---
+
 ## Useful Commands

 ```bash