# M3DB on Vultr Kubernetes Engine

Drop-in Mimir replacement using M3DB for long-term Prometheus metrics storage, deployed on Vultr VKE with Vultr Block Storage CSI.

## Architecture

```
                      ┌─────────────────────────────────────────────────────┐
                      │                  Vultr VKE Cluster                  │
                      │                                                     │
External Prometheus  ─┼──remote_write──▶ Vultr LoadBalancer (m3coordinator-lb)
External Grafana     ─┼──PromQL query──▶ │   (managed, provisioned by CCM)
                      │                  │
In-cluster Prometheus ┼──remote_write──▶ M3 Coordinator (Deployment, 2 replicas)
In-cluster Grafana    ┼──PromQL query──▶ │
                      │                  │
                      │          ┌───────┴───────┐
                      │          │  M3DB Nodes   │ (StatefulSet, 3 replicas)
                      │          │  Vultr Block  │ (100Gi NVMe per node)
                      │          │   Storage     │
                      │          └───────┬───────┘
                      │                  │
                      │       etcd cluster (StatefulSet, 3 replicas)
                      └─────────────────────────────────────────────────────┘
```

## Retention Tiers

| Namespace      | Resolution | Retention | Use Case                   |
|----------------|------------|-----------|----------------------------|
| `default`      | raw        | 48h       | Real-time queries          |
| `agg_10s_30d`  | 10s        | 30 days   | Recent dashboards          |
| `agg_1m_1y`    | 1m         | 1 year    | Long-term trends/capacity  |

## Deployment

```bash
# 1. Apply everything
kubectl apply -k .

# 2. Wait for all pods to be Running
kubectl -n m3db get pods -w

# 3. Bootstrap the cluster (placement + namespaces)
# The init job waits for coordinator health, which requires m3db to be bootstrapped.
# Bootstrap directly via m3dbnode's embedded coordinator:
kubectl -n m3db exec m3dbnode-0 -- curl -s -X POST http://localhost:7201/api/v1/services/m3db/placement/init \
  -H "Content-Type: application/json" -d '{
    "num_shards": 64,
    "replication_factor": 3,
    "instances": [
      {"id": "m3dbnode-0", "isolation_group": "zone-a", "zone": "embedded", "weight": 100, "endpoint": "m3dbnode-0.m3dbnode.m3db.svc.cluster.local:9000", "hostname": "m3dbnode-0", "port": 9000},
      {"id": "m3dbnode-1", "isolation_group": "zone-b", "zone": "embedded", "weight": 100, "endpoint": "m3dbnode-1.m3dbnode.m3db.svc.cluster.local:9000", "hostname": "m3dbnode-1", "port": 9000},
      {"id": "m3dbnode-2", "isolation_group": "zone-c", "zone": "embedded", "weight": 100, "endpoint": "m3dbnode-2.m3dbnode.m3db.svc.cluster.local:9000", "hostname": "m3dbnode-2", "port": 9000}
    ]
  }'

kubectl -n m3db exec m3dbnode-0 -- curl -s -X POST http://localhost:7201/api/v1/services/m3db/namespace \
  -H "Content-Type: application/json" \
  -d '{"name":"default","options":{"bootstrapEnabled":true,"flushEnabled":true,"writesToCommitLog":true,"cleanupEnabled":true,"snapshotEnabled":true,"repairEnabled":false,"retentionOptions":{"retentionPeriodDuration":"48h","blockSizeDuration":"2h","bufferFutureDuration":"10m","bufferPastDuration":"10m"},"indexOptions":{"enabled":true,"blockSizeDuration":"2h"}}}'

kubectl -n m3db exec m3dbnode-0 -- curl -s -X POST http://localhost:7201/api/v1/services/m3db/namespace \
  -H "Content-Type: application/json" \
  -d '{"name":"agg_10s_30d","options":{"bootstrapEnabled":true,"flushEnabled":true,"writesToCommitLog":true,"cleanupEnabled":true,"snapshotEnabled":true,"retentionOptions":{"retentionPeriodDuration":"720h","blockSizeDuration":"12h","bufferFutureDuration":"10m","bufferPastDuration":"10m"},"indexOptions":{"enabled":true,"blockSizeDuration":"12h"},"aggregationOptions":{"aggregations":[{"aggregated":true,"attributes":{"resolutionDuration":"10s"}}]}}}'

kubectl -n m3db exec m3dbnode-0 -- curl -s -X POST http://localhost:7201/api/v1/services/m3db/namespace \
  -H "Content-Type: application/json" \
  -d '{"name":"agg_1m_1y","options":{"bootstrapEnabled":true,"flushEnabled":true,"writesToCommitLog":true,"cleanupEnabled":true,"snapshotEnabled":true,"retentionOptions":{"retentionPeriodDuration":"8760h","blockSizeDuration":"24h","bufferFutureDuration":"10m","bufferPastDuration":"10m"},"indexOptions":{"enabled":true,"blockSizeDuration":"24h"},"aggregationOptions":{"aggregations":[{"aggregated":true,"attributes":{"resolutionDuration":"1m"}}]}}}'

# 4. Wait for bootstrapping to complete (check shard state = AVAILABLE)
kubectl -n m3db exec m3dbnode-0 -- curl -s http://localhost:9002/health

# 5. Get the LoadBalancer IP
kubectl -n m3db get svc m3coordinator-lb
```
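Step 4 checks the node health endpoint, which reports bootstrap status. To block until every shard in the placement actually reports `AVAILABLE`, a small poll loop against the placement API works. This is a minimal sketch, assuming `jq` is installed locally (as in the Useful Commands section) and that the placement JSON reports shard states as `AVAILABLE`/`INITIALIZING`:

```bash
# Poll the placement until no shard remains in a non-AVAILABLE state.
# Sketch only; adjust the sleep interval and add a retry limit to taste.
while true; do
  not_ready=$(kubectl -n m3db exec m3dbnode-0 -- curl -s \
    http://localhost:7201/api/v1/services/m3db/placement \
    | jq '[.placement.instances[].shards[] | select(.state != "AVAILABLE")] | length')
  if [ "$not_ready" = "0" ]; then
    echo "all shards AVAILABLE"
    break
  fi
  echo "waiting: $not_ready shard(s) not yet AVAILABLE"
  sleep 10
done
```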
## Testing

**Quick connectivity test:**

```bash
./test-metrics.sh
# Example: ./test-metrics.sh http://m3db.vultrlabs.dev:7201
```

This script verifies:

1. Coordinator health endpoint responds
2. Placement is configured with all 3 m3dbnode instances
3. All 3 namespaces are created (default, agg_10s_30d, agg_1m_1y)
4. PromQL queries work

**Full read/write test (Python):**

```bash
pip install requests python-snappy
python3 test-metrics.py
# Example: python3 test-metrics.py http://m3db.vultrlabs.dev:7201
```

Writes a test metric via Prometheus remote_write and reads it back.

## Prometheus Configuration (Replacing Mimir)

Update your Prometheus config to point at M3 Coordinator.

**In-cluster (same VKE cluster):**

```yaml
# prometheus.yml
remote_write:
  - url: "http://m3coordinator.m3db.svc.cluster.local:7201/api/v1/prom/remote/write"
    queue_config:
      capacity: 10000
      max_shards: 30
      max_samples_per_send: 5000
      batch_send_deadline: 5s

remote_read:
  - url: "http://m3coordinator.m3db.svc.cluster.local:7201/api/v1/prom/remote/read"
    read_recent: true
```

**External (cross-region/cross-cluster):**

```yaml
# prometheus.yml
remote_write:
  - url: "http://m3db.vultrlabs.dev:7201/api/v1/prom/remote/write"
    queue_config:
      capacity: 10000
      max_shards: 30
      max_samples_per_send: 5000
      batch_send_deadline: 5s

remote_read:
  - url: "http://m3db.vultrlabs.dev:7201/api/v1/prom/remote/read"
    read_recent: true
```

Get the LoadBalancer IP:

```bash
kubectl -n m3db get svc m3coordinator-lb
```

## Grafana Datasource

Add a **Prometheus** datasource in Grafana pointing to:

- **In-cluster:** `http://m3coordinator.m3db.svc.cluster.local:7201`
- **External:** `http://m3db.vultrlabs.dev:7201`

All existing PromQL dashboards will work without modification.

## Migration from Mimir

1. **Dual-write phase**: Configure Prometheus to remote_write to both Mimir and M3DB simultaneously (see the dual-write example at the end of this README).
2. **Validation**: Compare query results between Mimir and M3DB for the same time ranges.
3. **Cutover**: Once retention in M3DB covers your needs, remove the Mimir remote_write target.
4. **Cleanup**: Decommission Mimir components.

## Tuning for Vultr

- **Storage**: The `vultr-block-storage-m3db` StorageClass uses `disk_type: nvme` (NVMe SSD). Adjust `storage` in the VolumeClaimTemplates based on your cardinality and retention.
- **Node sizing**: M3DB is memory-hungry; use nodes with at least 8GB of RAM on Vultr. The manifest requests 4Gi per m3dbnode pod.
- **Shards**: The init job creates 64 shards across 3 nodes. For higher cardinality, increase to 128 or 256.
- **Volume expansion**: The StorageClass has `allowVolumeExpansion: true` — you can resize PVCs online via `kubectl edit pvc` (see the sketch after this list).
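To act on the volume-expansion note without an interactive edit, patch the PVC's storage request directly. This is a minimal sketch; the claim name `m3db-data-m3dbnode-0` is an assumption based on a typical StatefulSet volumeClaimTemplate naming pattern, so substitute the names reported by `kubectl -n m3db get pvc`:

```bash
# Grow one M3DB data volume to 200Gi (repeat per replica).
# NOTE: the PVC name is assumed; check `kubectl -n m3db get pvc` for the real names.
kubectl -n m3db patch pvc m3db-data-m3dbnode-0 \
  --type merge \
  -p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'

# Watch the CSI driver apply the resize.
kubectl -n m3db get pvc -w
```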
## Useful Commands

```bash
# Get LoadBalancer IP
kubectl -n m3db get svc m3coordinator-lb

# Check cluster health (from inside cluster)
kubectl -n m3db exec m3dbnode-0 -- curl -s http://m3coordinator.m3db.svc.cluster.local:7201/health

# Check placement (from inside cluster)
kubectl -n m3db exec m3dbnode-0 -- curl -s http://m3coordinator.m3db.svc.cluster.local:7201/api/v1/services/m3db/placement | jq

# Check m3dbnode bootstrapped status
kubectl -n m3db exec m3dbnode-0 -- curl -s http://localhost:9002/health

# Query via PromQL (external, through the LoadBalancer hostname)
curl "http://m3db.vultrlabs.dev:7201/api/v1/query?query=up"

# Delete the init job to re-run (if needed)
kubectl -n m3db delete job m3db-cluster-init
kubectl apply -f 06-init-and-pdb.yaml
```
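**Dual-write example (Mimir migration, step 1):**

During the dual-write phase described in the Migration from Mimir section, list both targets under `remote_write`. This is a minimal sketch; the Mimir push URL is a placeholder for your existing endpoint, while the M3DB URL and queue settings match the configuration shown earlier:

```yaml
# prometheus.yml: dual-write to Mimir and M3DB during migration
remote_write:
  # Existing Mimir target (placeholder URL; keep whatever endpoint you use today)
  - url: "http://mimir-nginx.mimir.svc.cluster.local/api/v1/push"
  # New M3DB target (in-cluster coordinator)
  - url: "http://m3coordinator.m3db.svc.cluster.local:7201/api/v1/prom/remote/write"
    queue_config:
      capacity: 10000
      max_shards: 30
      max_samples_per_send: 5000
      batch_send_deadline: 5s
```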