# M3DB on Vultr Kubernetes Engine
Drop-in Mimir replacement using M3DB for long-term Prometheus metrics storage, deployed on Vultr VKE with Vultr Block Storage CSI.
## Architecture
```
Prometheus ──remote_write──▶ M3 Coordinator (Deployment, 2 replicas)
Grafana  ──PromQL query───▶        │
                                   │
                           ┌───────┴───────┐
                           │  M3DB Nodes   │ (StatefulSet, 3 replicas)
                           │  Vultr Block  │ (100Gi SSD per node)
                           │  Storage      │
                           └───────┬───────┘
                                   │
                          etcd cluster (StatefulSet, 3 replicas)
```
## Retention Tiers
| Namespace      | Resolution | Retention | Use Case                  |
|----------------|------------|-----------|---------------------------|
| `default`      | raw        | 48h       | Real-time queries         |
| `agg_10s_30d`  | 10s        | 30 days   | Recent dashboards         |
| `agg_1m_1y`    | 1m         | 1 year    | Long-term trends/capacity |
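
These tiers correspond to namespace entries in the M3 coordinator configuration. A sketch of the relevant fragment (the rest of the coordinator config is elided; the names and durations mirror the table above):

```yaml
# m3coordinator config fragment — namespace/retention mapping only
clusters:
  - namespaces:
      - namespace: default
        type: unaggregated
        retention: 48h
      - namespace: agg_10s_30d
        type: aggregated
        retention: 720h       # 30 days
        resolution: 10s
      - namespace: agg_1m_1y
        type: aggregated
        retention: 8760h      # 1 year
        resolution: 1m
```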
## Deployment
```bash
# 1. Apply everything. The init job will fail and retry until the pods it
#    depends on are up; that is expected.
kubectl apply -k .

# 2. Wait for all pods to be Ready
kubectl -n m3db get pods -w
# or block until ready (the app label is assumed to match the manifests):
# kubectl -n m3db wait --for=condition=Ready pod -l app=m3dbnode --timeout=10m

# 3. Once all m3dbnode and m3coordinator pods are Running, the init job
#    bootstraps the cluster (placement + namespaces). Monitor it:
kubectl -n m3db logs -f job/m3db-cluster-init

# 4. Verify cluster health
kubectl -n m3db port-forward svc/m3coordinator 7201:7201
curl http://localhost:7201/api/v1/services/m3db/placement
curl http://localhost:7201/api/v1/services/m3db/namespace
```
## Prometheus Configuration (Replacing Mimir)
Update your Prometheus config to point at M3 Coordinator instead of Mimir:
```yaml
# prometheus.yml
remote_write:
  - url: "http://m3coordinator.m3db.svc.cluster.local:7201/api/v1/prom/remote/write"
    queue_config:
      capacity: 10000
      max_shards: 30
      max_samples_per_send: 5000
      batch_send_deadline: 5s

remote_read:
  - url: "http://m3coordinator.m3db.svc.cluster.local:7201/api/v1/prom/remote/read"
    read_recent: true
```
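
As a sanity check on the queue settings, the ceiling they imply on sustained remote_write throughput is max_shards × max_samples_per_send ÷ batch_send_deadline. A quick back-of-envelope:

```bash
# Upper bound on samples/s the queue_config above can ship
max_shards=30
max_samples_per_send=5000
batch_send_deadline_s=5
echo "$((max_shards * max_samples_per_send / batch_send_deadline_s)) samples/s"
# → 30000 samples/s
```

If your Prometheus ingests more than that, raise `max_shards` before touching the other knobs.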
## Grafana Datasource
Add a **Prometheus** datasource in Grafana pointing to:
```
http://m3coordinator.m3db.svc.cluster.local:7201
```
All existing PromQL dashboards will work without modification.
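
If you provision Grafana datasources from files rather than the UI, an equivalent sketch (the file path and datasource name are arbitrary choices, not part of this repo):

```yaml
# grafana/provisioning/datasources/m3.yaml
apiVersion: 1
datasources:
  - name: M3DB              # arbitrary display name
    type: prometheus
    access: proxy
    url: http://m3coordinator.m3db.svc.cluster.local:7201
    isDefault: true
```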
## Migration from Mimir
1. **Dual-write phase**: Configure Prometheus to remote_write to both Mimir and M3DB simultaneously.
2. **Validation**: Compare query results between Mimir and M3DB for the same time ranges.
3. **Cutover**: Once retention in M3DB covers your needs, remove the Mimir remote_write target.
4. **Cleanup**: Decommission Mimir components.
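
Step 1 amounts to listing both backends under `remote_write`. A sketch (the Mimir URL here is an assumption — use whatever endpoint your Prometheus writes to today):

```yaml
# prometheus.yml — dual-write during migration
remote_write:
  - url: "http://mimir-gateway.mimir.svc.cluster.local/api/v1/push"   # existing Mimir target (assumed)
  - url: "http://m3coordinator.m3db.svc.cluster.local:7201/api/v1/prom/remote/write"
```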
## Tuning for Vultr
- **Storage**: The `vultr-block-storage-m3db` StorageClass uses `high_perf` (NVMe SSD). Adjust `storage` in the VolumeClaimTemplates based on your cardinality and retention.
- **Node sizing**: M3DB is memory-hungry. Recommend at least 8GB RAM nodes on Vultr. The manifest requests 4Gi per m3dbnode pod.
- **Shards**: The init job creates 64 shards across 3 nodes. For higher cardinality, increase to 128 or 256.
- **Volume expansion**: The StorageClass has `allowVolumeExpansion: true` — you can resize PVCs online via `kubectl edit pvc`.
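
When sizing those volumes, a back-of-envelope disk estimate helps. The series count and bytes-per-sample below are illustrative assumptions, not measurements from this cluster:

```bash
# Rough total disk for the 1m/1y tier across the cluster
series=500000                           # assumed active series count
bytes_per_sample=2                      # conservative; M3DB compression often does better
samples_per_series=$((365 * 24 * 60))   # one sample per minute for a year
rf=3                                    # replication factor
total=$((series * bytes_per_sample * samples_per_series * rf))
echo "$((total / 1024 / 1024 / 1024)) GiB"
# → 1468 GiB
```

Divide by the node count to size each PVC, and leave headroom for index and commit-log data.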
## Useful Commands
```bash
# Check placement
curl http://localhost:7201/api/v1/services/m3db/placement | jq

# Check namespace readiness
curl http://localhost:7201/api/v1/services/m3db/namespace/ready \
  -d '{"name":"default"}'

# Probe the remote-write endpoint (a real write needs a snappy-compressed
# protobuf payload, so this only checks that the endpoint responds)
curl -X POST http://localhost:7201/api/v1/prom/remote/write \
  -H "Content-Type: application/x-protobuf"

# Query via PromQL
curl "http://localhost:7201/api/v1/query?query=up"

# Delete the init job to re-run it (if needed)
kubectl -n m3db delete job m3db-cluster-init
kubectl apply -f 06-init-and-pdb.yaml
```