Clean slate: 1h block sizes, remove backfill artifacts

- Changed all namespace block sizes to 1h (was 2h/12h/24h in manifests, 30d+ in the live cluster due to backfill-era bufferPast hacks)
- Deleted entire backfill/ directory (scripts, pods, runbooks)
- Removed stale 05-m3coordinator.yaml (had backfill namespaces)
- Added 05-m3coordinator-deployment.yaml to kustomization
- Fixed init job health check (/health instead of /api/v1/services/m3db/health)
- Updated .env.example (removed Mimir credentials)
- Added 'Why Backfill Doesn't Work' section to README
52
README.md
@@ -44,11 +44,13 @@ Internet → Vultr LoadBalancer → Traefik (TLS + basic auth) → m3coordinator
 
 ## Retention Tiers
 
-| Namespace | Resolution | Retention | Use Case |
-|----------------|-----------|-----------|---------------------------|
-| `default` | raw | 48h | Real-time queries |
-| `agg_10s_30d` | 10s | 30 days | Recent dashboards |
-| `agg_1m_1y` | 1m | 1 year | Long-term trends/capacity |
+All namespaces use **1h block size**, the sweet spot for M3DB. Smaller blocks mean faster queries, faster flushes, and less memory pressure during compaction. See [Why Backfill Doesn't Work](#why-backfill-doesnt-work) for why larger blocks were a disaster.
+
+| Namespace | Resolution | Retention | Block Size | Use Case |
+|----------------|-----------|-----------|------------|---------------------------|
+| `default` | raw | 48h | 1h | Real-time queries |
+| `agg_10s_30d` | 10s | 30 days | 1h | Recent dashboards |
+| `agg_1m_1y` | 1m | 1 year | 1h | Long-term trends/capacity |
 
 ## Deployment
 
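
As a rough illustration of what the block-size change in this hunk means, the sketch below counts how many blocks each namespace holds once its retention window is full. The retention values (48h, 720h, 8760h) and the old block sizes (2h/12h/24h) are the ones quoted in this commit; the helper names `blocks_at_full_retention` and `report` are hypothetical and are not part of M3DB or of this repository.

```bash
# Sketch only: block counts at full retention, before and after this commit.
# blocks_at_full_retention and report are hypothetical helpers.

# $1 = retention in hours, $2 = block size in hours
blocks_at_full_retention() {
  # Round up so a partially filled block still counts as one block.
  echo $(( ($1 + $2 - 1) / $2 ))
}

# $1 = namespace, $2 = retention in hours, $3 = old block size in hours
report() {
  printf '%-12s old=%s new=%s\n' "$1" \
    "$(blocks_at_full_retention "$2" "$3")" \
    "$(blocks_at_full_retention "$2" 1)"
}

report default 48 2
report agg_10s_30d 720 12
report agg_1m_1y 8760 24
```

With these inputs the counts go from 24, 60, and 365 blocks to 48, 720, and 8760 blocks. The sketch only does the arithmetic; it says nothing about per-block overhead and should not be read as a sizing recommendation.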
@@ -96,13 +98,13 @@ kubectl -n m3db exec m3dbnode-0 -- curl -s -X POST http://localhost:7201/api/v1/
 
 # Create namespaces
 kubectl -n m3db exec m3dbnode-0 -- curl -s -X POST http://localhost:7201/api/v1/services/m3db/namespace \
-  -H "Content-Type: application/json" -d '{"name":"default","options":{"bootstrapEnabled":true,"flushEnabled":true,"writesToCommitLog":true,"cleanupEnabled":true,"snapshotEnabled":true,"repairEnabled":false,"retentionOptions":{"retentionPeriodDuration":"48h","blockSizeDuration":"2h","bufferFutureDuration":"10m","bufferPastDuration":"10m"},"indexOptions":{"enabled":true,"blockSizeDuration":"2h"}}}'
+  -H "Content-Type: application/json" -d '{"name":"default","options":{"bootstrapEnabled":true,"flushEnabled":true,"writesToCommitLog":true,"cleanupEnabled":true,"snapshotEnabled":true,"repairEnabled":false,"retentionOptions":{"retentionPeriodDuration":"48h","blockSizeDuration":"1h","bufferFutureDuration":"10m","bufferPastDuration":"10m"},"indexOptions":{"enabled":true,"blockSizeDuration":"1h"}}}'
 
 kubectl -n m3db exec m3dbnode-0 -- curl -s -X POST http://localhost:7201/api/v1/services/m3db/namespace \
-  -H "Content-Type: application/json" -d '{"name":"agg_10s_30d","options":{"bootstrapEnabled":true,"flushEnabled":true,"writesToCommitLog":true,"cleanupEnabled":true,"snapshotEnabled":true,"retentionOptions":{"retentionPeriodDuration":"720h","blockSizeDuration":"12h","bufferFutureDuration":"10m","bufferPastDuration":"10m"},"indexOptions":{"enabled":true,"blockSizeDuration":"12h"},"aggregationOptions":{"aggregations":[{"aggregated":true,"attributes":{"resolutionDuration":"10s"}}]}}}'
+  -H "Content-Type: application/json" -d '{"name":"agg_10s_30d","options":{"bootstrapEnabled":true,"flushEnabled":true,"writesToCommitLog":true,"cleanupEnabled":true,"snapshotEnabled":true,"retentionOptions":{"retentionPeriodDuration":"720h","blockSizeDuration":"1h","bufferFutureDuration":"10m","bufferPastDuration":"10m"},"indexOptions":{"enabled":true,"blockSizeDuration":"1h"},"aggregationOptions":{"aggregations":[{"aggregated":true,"attributes":{"resolutionDuration":"10s"}}]}}}'
 
 kubectl -n m3db exec m3dbnode-0 -- curl -s -X POST http://localhost:7201/api/v1/services/m3db/namespace \
-  -H "Content-Type: application/json" -d '{"name":"agg_1m_1y","options":{"bootstrapEnabled":true,"flushEnabled":true,"writesToCommitLog":true,"cleanupEnabled":true,"snapshotEnabled":true,"retentionOptions":{"retentionPeriodDuration":"8760h","blockSizeDuration":"24h","bufferFutureDuration":"10m","bufferPastDuration":"10m"},"indexOptions":{"enabled":true,"blockSizeDuration":"24h"},"aggregationOptions":{"aggregations":[{"aggregated":true,"attributes":{"resolutionDuration":"1m"}}]}}}'
+  -H "Content-Type: application/json" -d '{"name":"agg_1m_1y","options":{"bootstrapEnabled":true,"flushEnabled":true,"writesToCommitLog":true,"cleanupEnabled":true,"snapshotEnabled":true,"retentionOptions":{"retentionPeriodDuration":"8760h","blockSizeDuration":"1h","bufferFutureDuration":"10m","bufferPastDuration":"10m"},"indexOptions":{"enabled":true,"blockSizeDuration":"1h"},"aggregationOptions":{"aggregations":[{"aggregated":true,"attributes":{"resolutionDuration":"1m"}}]}}}'
 
 # Wait for bootstrapping to complete (check shard state = AVAILABLE)
 kubectl -n m3db exec m3dbnode-0 -- curl -s http://localhost:9002/health
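
The three payloads in this hunk differ only in name, retention, and resolution, so one way to keep the two `blockSizeDuration` fields from drifting apart is to generate the JSON from a single variable. The following is a minimal sketch, assuming a hypothetical helper `namespace_payload` that only prints JSON and never contacts the cluster. Field names are copied from the commands above; the sketch deliberately omits `"repairEnabled":false`, which only the `default` payload sets.

```bash
# Sketch only: namespace_payload is a hypothetical helper, not part of the repo.
# It prints a namespace-creation payload using one shared block size.
BLOCK_SIZE="1h"

# $1 = namespace name, $2 = retention (e.g. 48h), $3 = resolution ("" if unaggregated)
namespace_payload() {
  extra=""
  if [ -n "$3" ]; then
    # Aggregated namespaces carry an aggregationOptions section.
    extra=',"aggregationOptions":{"aggregations":[{"aggregated":true,"attributes":{"resolutionDuration":"'"$3"'"}}]}'
  fi
  printf '{"name":"%s","options":{"bootstrapEnabled":true,"flushEnabled":true,"writesToCommitLog":true,"cleanupEnabled":true,"snapshotEnabled":true,"retentionOptions":{"retentionPeriodDuration":"%s","blockSizeDuration":"%s","bufferFutureDuration":"10m","bufferPastDuration":"10m"},"indexOptions":{"enabled":true,"blockSizeDuration":"%s"}%s}}\n' \
    "$1" "$2" "$BLOCK_SIZE" "$BLOCK_SIZE" "$extra"
}

namespace_payload default 48h ""
namespace_payload agg_10s_30d 720h 10s
namespace_payload agg_1m_1y 8760h 1m
```

Each printed line could then be passed to the existing `curl ... -d` call in place of the hand-written JSON, for example `-d "$(namespace_payload agg_10s_30d 720h 10s)"`.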
@@ -250,6 +252,40 @@ remote_write:
 - **Shards**: The init job creates 64 shards across 3 nodes. For higher cardinality, increase to 128 or 256.
 - **Volume expansion**: The StorageClass has `allowVolumeExpansion: true`, so you can resize PVCs online via `kubectl edit pvc`.
 
+## Why Backfill Doesn't Work
+
+**TL;DR: M3DB is not designed for historical data import. Don't try it.**
+
+M3DB is a time-series database optimized for real-time ingestion and sequential writes. Backfilling (writing data with timestamps in the past) fights that architecture at every turn:
+
+### The Problems
+
+1. **`bufferPast` is a hard gate.** M3DB rejects writes whose timestamps are older than the `bufferPast` window (default: 10m). To write data from 3 weeks ago, you need `bufferPast=504h` (21 days). This setting is **immutable** on existing namespaces, so you have to create entirely new namespaces just for backfill, doubling your operational complexity.
+
+2. **Massive block sizes were required.** To make the backfill namespaces work with `bufferPast=504h`, block sizes had to be enormous (30+ day blocks). This defeated the entire point of M3DB's time-partitioned storage: blocks that large cause extreme memory pressure, slow compaction, and bloated index lookups.
+
+3. **The downsample pipeline ignores historical data.** M3DB's downsample coordinator only processes new writes in real time. Backfilled data written to the `default_backfill` namespace never gets downsampled into the aggregated namespaces, so your long-term retention tiers have gaps.
+
+4. **No transaction boundaries.** Each backfill write is an individual operation. Writing 12M+ samples means 12M+ individual writes with no batching semantics. If one fails, there's no rollback and no retry from a checkpoint, so you get partial data with no easy way to detect or fix gaps.
+
+5. **Compaction and flush chaos.** M3DB expects data to flow sequentially through commitlog → flush → compact. Backfill dumps data out of order, causing background compaction to thrash and consume CPU and I/O for blocks that may never be queried again.
+
+### What We Tried
+
+- Created `default_backfill`, `agg_10s_backfill`, and `agg_1m_backfill` namespaces with `bufferPast=504h`
+- Increased block sizes to 24h–30d to accommodate the large `bufferPast`
+- Wrote 12M+ samples from Mimir to M3DB over multiple runs
+- Result: data landed, but the operational cost was catastrophic: huge blocks, no downsampling, and an unstable cluster
+
+### What To Do Instead
+
+- **Start fresh.** Configure M3DB with sane block sizes (1h) from day one and let it accumulate data naturally via Prometheus `remote_write`.
+- **Accept the gap.** Historical data lives in Mimir (or wherever it was before). Query Mimir for old data and M3DB for new data.
+- **Dual-write during migration.** Write to both systems simultaneously until M3DB's retention catches up.
+- **If you absolutely need old data in M3DB**, accept that you're doing a one-time migration and build tooling around the constraints, but know that it's a project, not a script.
+
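
To make item 1 of "The Problems" concrete, the sketch below works out the `bufferPast` a given backfill age would need and applies a simplified acceptance check. It assumes the gate is nothing more than `timestamp >= now - bufferPast`; it ignores `bufferFuture` and any other write rule M3DB applies, and the helper names and the fixed "current" timestamp are hypothetical.

```bash
# Sketch only: hypothetical helpers illustrating the bufferPast window.
# This is not M3DB code and ignores bufferFuture entirely.

# $1 = age of the oldest sample in days; prints the bufferPast needed, in hours
required_buffer_past_hours() {
  echo $(( $1 * 24 ))
}

# $1 = current time, $2 = sample timestamp, $3 = bufferPast (all in seconds)
# Succeeds only if the sample is no older than now - bufferPast.
write_accepted() {
  [ "$2" -ge $(( $1 - $3 )) ]
}

now=1700000000                               # arbitrary "current" Unix time
three_weeks_ago=$(( now - 21 * 24 * 3600 ))  # timestamp of a 21-day-old sample

echo "bufferPast needed for 21 days: $(required_buffer_past_hours 21)h"

if write_accepted "$now" "$three_weeks_ago" $(( 10 * 60 )); then
  echo "10m window: accepted"
else
  echo "10m window: rejected"
fi

if write_accepted "$now" "$three_weeks_ago" $(( 504 * 3600 )); then
  echo "504h window: accepted"
else
  echo "504h window: rejected"
fi
```

Under these assumptions the 21-day-old sample is rejected with the default 10m window and only passes once the window is widened to 504h, which is the figure quoted in item 1.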
+---
+
 ## Useful Commands
 
 ```bash