# M3DB on Vultr Kubernetes Engine
Drop-in Mimir replacement using M3DB for long-term Prometheus metrics storage, deployed on Vultr VKE with Vultr Block Storage CSI.
## Architecture
```
Prometheus ──remote_write──▶ ┌───────────────────┐ ◀──PromQL query── Grafana
                             │  M3 Coordinator   │
                             │ (Deployment, x2)  │
                             └─────────┬─────────┘
                                       │
                             ┌─────────┴─────────┐
                             │    M3DB Nodes     │  (StatefulSet, 3 replicas)
                             │   Vultr Block     │  (100Gi SSD per node)
                             │     Storage       │
                             └─────────┬─────────┘
                                       │
                              etcd cluster (StatefulSet, 3 replicas)
```
## Retention Tiers
| Namespace | Resolution | Retention | Use Case |
|----------------|-----------|-----------|---------------------------|
| `default` | raw | 48h | Real-time queries |
| `agg_10s_30d` | 10s | 30 days | Recent dashboards |
| `agg_1m_1y` | 1m | 1 year | Long-term trends/capacity |
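To get a feel for what each tier costs per series, here is a back-of-the-envelope count of datapoints stored per time series, assuming a 15s scrape interval for the raw tier (the interval is an assumption, not something the manifests pin down):

```shell
# Approximate datapoints stored per time series in each tier.
raw=$((48 * 3600 / 15))        # default: 48h of raw samples at a 15s scrape
mid=$((30 * 24 * 3600 / 10))   # agg_10s_30d: 30 days at 10s resolution
long=$((365 * 24 * 3600 / 60)) # agg_1m_1y: 1 year at 1m resolution
echo "raw=$raw mid=$mid long=$long"
```

Note that the 1-year tier holds roughly 45x more points per series than the raw tier, which is why it is downsampled to 1m.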
## Deployment
```bash
# 1. Apply all manifests (the init job will not succeed until the pods are up)
kubectl apply -k .
# 2. Wait for all pods to be Ready
kubectl -n m3db get pods -w
# 3. Once all m3dbnode and m3coordinator pods are Running, the init job
# will bootstrap the cluster (placement + namespaces).
# Monitor it:
kubectl -n m3db logs -f job/m3db-cluster-init
# 4. Verify cluster health
kubectl -n m3db port-forward svc/m3coordinator 7201:7201
curl http://localhost:7201/api/v1/services/m3db/placement
curl http://localhost:7201/api/v1/services/m3db/namespace
```
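A cluster is healthy when every instance in the placement reports all of its shards as `AVAILABLE`. The `jq` filter below checks that; the inlined JSON is a trimmed, hypothetical placement response used here so the example is self-contained — in practice, pipe the `curl .../placement` output into the same filter:

```shell
# Hypothetical, trimmed placement response (real responses carry more fields).
placement='{
  "placement": {
    "instances": {
      "m3dbnode-0": {"shards": [{"id": 0, "state": "AVAILABLE"}]},
      "m3dbnode-1": {"shards": [{"id": 0, "state": "AVAILABLE"}]}
    }
  }
}'
# Prints "true" only if every shard on every instance is AVAILABLE.
echo "$placement" | jq '[.placement.instances[].shards[].state]
                        | all(. == "AVAILABLE")'
```

Shards sit in `INITIALIZING` while a node bootstraps, so expect `false` for a while after the init job runs.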
## Prometheus Configuration (Replacing Mimir)
Update your Prometheus config to point at M3 Coordinator instead of Mimir:
```yaml
# prometheus.yml
remote_write:
  - url: "http://m3coordinator.m3db.svc.cluster.local:7201/api/v1/prom/remote/write"
    queue_config:
      capacity: 10000
      max_shards: 30
      max_samples_per_send: 5000
      batch_send_deadline: 5s

remote_read:
  - url: "http://m3coordinator.m3db.svc.cluster.local:7201/api/v1/prom/remote/read"
    read_recent: true
```
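The queue settings above put a ceiling on remote-write throughput. A quick sanity check, assuming the worst case where every shard sends a full batch once per deadline:

```shell
max_shards=30
max_samples_per_send=5000
batch_send_deadline=5  # seconds

# Theoretical ceiling in samples/sec; size these knobs so the result
# comfortably exceeds your ingest rate.
echo $(( max_shards * max_samples_per_send / batch_send_deadline ))  # 30000
```

If Prometheus logs dropped or retried remote-write samples, raise `max_shards` or `max_samples_per_send` until this ceiling clears your ingest rate.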
## Grafana Datasource
Add a **Prometheus** datasource in Grafana pointing to:
```
http://m3coordinator.m3db.svc.cluster.local:7201
```
All existing PromQL dashboards will work without modification.
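If you provision Grafana from files, the datasource can be declared like this — a sketch using Grafana's file-based provisioning format; the file path and datasource name are illustrative:

```yaml
# grafana/provisioning/datasources/m3.yaml (illustrative path)
apiVersion: 1
datasources:
  - name: M3 (Prometheus)
    type: prometheus
    access: proxy
    url: http://m3coordinator.m3db.svc.cluster.local:7201
    isDefault: true
```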
## Migration from Mimir
1. **Dual-write phase**: Configure Prometheus to remote_write to both Mimir and M3DB simultaneously.
2. **Validation**: Compare query results between Mimir and M3DB for the same time ranges.
3. **Cutover**: Once retention in M3DB covers your needs, remove the Mimir remote_write target.
4. **Cleanup**: Decommission Mimir components.
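During the dual-write phase (step 1), Prometheus simply lists both targets; the Mimir URL below is illustrative — substitute your actual Mimir push endpoint:

```yaml
# prometheus.yml — dual-write phase
remote_write:
  - url: "http://mimir-nginx.mimir.svc.cluster.local/api/v1/push"   # existing Mimir target (illustrative)
  - url: "http://m3coordinator.m3db.svc.cluster.local:7201/api/v1/prom/remote/write"
```

At cutover (step 3), delete the first entry and reload Prometheus.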
## Tuning for Vultr
- **Storage**: The `vultr-block-storage-m3db` StorageClass uses `high_perf` (NVMe SSD). Adjust `storage` in the VolumeClaimTemplates based on your cardinality and retention.
- **Node sizing**: M3DB is memory-hungry. Recommend at least 8GB RAM nodes on Vultr. The manifest requests 4Gi per m3dbnode pod.
- **Shards**: The init job creates 64 shards across 3 nodes. For higher cardinality, increase to 128 or 256.
- **Volume expansion**: The StorageClass has `allowVolumeExpansion: true` — you can resize PVCs online via `kubectl edit pvc`.
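On shard counts: assuming a replication factor of 3 (a common choice, not stated in the manifests), every one of the 3 nodes holds a replica of all 64 shards; per-node shard ownership only drops as you add nodes:

```shell
shards=64; rf=3  # rf=3 is an assumption
for nodes in 3 6; do
  # Shard replicas each node is responsible for at this cluster size.
  echo "nodes=$nodes shard_replicas_per_node=$(( shards * rf / nodes ))"
done
```

This is why increasing the shard count helps at higher cardinality: more, smaller shards rebalance more evenly when nodes are added.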
## Useful Commands
```bash
# Check placement
curl http://localhost:7201/api/v1/services/m3db/placement | jq
# Check namespace readiness
curl http://localhost:7201/api/v1/services/m3db/namespace/ready \
-d '{"name":"default"}'
# Write a test metric — note the endpoint expects a snappy-compressed
# Prometheus remote-write protobuf payload, so a bare curl POST will not
# work; use a remote-write client (or Prometheus itself) to send data:
#   curl -X POST http://localhost:7201/api/v1/prom/remote/write \
#     -H "Content-Type: application/x-protobuf" \
#     -H "Content-Encoding: snappy" \
#     --data-binary @payload.pb
# Query via PromQL
curl "http://localhost:7201/api/v1/query?query=up"
# Delete the init job to re-run (if needed)
kubectl -n m3db delete job m3db-cluster-init
kubectl apply -f 06-init-and-pdb.yaml
```