# M3DB on Vultr Kubernetes Engine
Drop-in Mimir replacement using M3DB for long-term Prometheus metrics storage, deployed on Vultr VKE with Vultr Block Storage CSI.
## Architecture
```
                        ┌───────────────────────────────────────────────────────────┐
                        │                     Vultr VKE Cluster                     │
                        │                                                           │
External Prometheus ────┼──remote_write──▶ Vultr LoadBalancer (m3coordinator-lb)    │
External Grafana ───────┼──PromQL query──▶ │   (managed, provisioned by CCM)        │
                        │                  │                                        │
In-cluster Prometheus ──┼──remote_write──▶ M3 Coordinator (Deployment, 2 replicas)  │
In-cluster Grafana ─────┼──PromQL query──▶ │                                        │
                        │                  │                                        │
                        │          ┌───────┴───────┐                                │
                        │          │  M3DB Nodes   │  (StatefulSet, 3 replicas)     │
                        │          │  Vultr Block  │  (100Gi NVMe per node)         │
                        │          │  Storage      │                                │
                        │          └───────┬───────┘                                │
                        │                  │                                        │
                        │                  etcd cluster (StatefulSet, 3 replicas)   │
                        └───────────────────────────────────────────────────────────┘
```
## Retention Tiers
| Namespace      | Resolution | Retention | Use Case                   |
|----------------|------------|-----------|----------------------------|
| `default`      | raw        | 48h       | Real-time queries          |
| `agg_10s_30d`  | 10s        | 30 days   | Recent dashboards          |
| `agg_1m_1y`    | 1m         | 1 year    | Long-term trends/capacity  |
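Once the cluster is bootstrapped (see Deployment below), you can confirm that all three tiers were registered by listing the namespaces the coordinator knows about. A minimal check, assuming `jq` is installed locally:
```bash
# Expect the three namespaces from the table above.
kubectl -n m3db exec m3dbnode-0 -- \
  curl -s http://localhost:7201/api/v1/services/m3db/namespace | jq '.registry.namespaces | keys'
```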
## Deployment
```bash
# 1. Apply everything
kubectl apply -k .
# 2. Wait for all pods to be Running
kubectl -n m3db get pods -w
# 3. Bootstrap the cluster (placement + namespaces)
# The init job waits for coordinator health, which requires m3db to be bootstrapped.
# Bootstrap directly via m3dbnode's embedded coordinator:
kubectl -n m3db exec m3dbnode-0 -- curl -s -X POST http://localhost:7201/api/v1/services/m3db/placement/init \
-H "Content-Type: application/json" -d '{
"num_shards": 64,
"replication_factor": 3,
"instances": [
{"id": "m3dbnode-0", "isolation_group": "zone-a", "zone": "embedded", "weight": 100, "endpoint": "m3dbnode-0.m3dbnode.m3db.svc.cluster.local:9000", "hostname": "m3dbnode-0", "port": 9000},
{"id": "m3dbnode-1", "isolation_group": "zone-b", "zone": "embedded", "weight": 100, "endpoint": "m3dbnode-1.m3dbnode.m3db.svc.cluster.local:9000", "hostname": "m3dbnode-1", "port": 9000},
{"id": "m3dbnode-2", "isolation_group": "zone-c", "zone": "embedded", "weight": 100, "endpoint": "m3dbnode-2.m3dbnode.m3db.svc.cluster.local:9000", "hostname": "m3dbnode-2", "port": 9000}
]
}'
kubectl -n m3db exec m3dbnode-0 -- curl -s -X POST http://localhost:7201/api/v1/services/m3db/namespace \
-H "Content-Type: application/json" -d '{"name":"default","options":{"bootstrapEnabled":true,"flushEnabled":true,"writesToCommitLog":true,"cleanupEnabled":true,"snapshotEnabled":true,"repairEnabled":false,"retentionOptions":{"retentionPeriodDuration":"48h","blockSizeDuration":"2h","bufferFutureDuration":"10m","bufferPastDuration":"10m"},"indexOptions":{"enabled":true,"blockSizeDuration":"2h"}}}'
kubectl -n m3db exec m3dbnode-0 -- curl -s -X POST http://localhost:7201/api/v1/services/m3db/namespace \
-H "Content-Type: application/json" -d '{"name":"agg_10s_30d","options":{"bootstrapEnabled":true,"flushEnabled":true,"writesToCommitLog":true,"cleanupEnabled":true,"snapshotEnabled":true,"retentionOptions":{"retentionPeriodDuration":"720h","blockSizeDuration":"12h","bufferFutureDuration":"10m","bufferPastDuration":"10m"},"indexOptions":{"enabled":true,"blockSizeDuration":"12h"},"aggregationOptions":{"aggregations":[{"aggregated":true,"attributes":{"resolutionDuration":"10s"}}]}}}'
kubectl -n m3db exec m3dbnode-0 -- curl -s -X POST http://localhost:7201/api/v1/services/m3db/namespace \
-H "Content-Type: application/json" -d '{"name":"agg_1m_1y","options":{"bootstrapEnabled":true,"flushEnabled":true,"writesToCommitLog":true,"cleanupEnabled":true,"snapshotEnabled":true,"retentionOptions":{"retentionPeriodDuration":"8760h","blockSizeDuration":"24h","bufferFutureDuration":"10m","bufferPastDuration":"10m"},"indexOptions":{"enabled":true,"blockSizeDuration":"24h"},"aggregationOptions":{"aggregations":[{"aggregated":true,"attributes":{"resolutionDuration":"1m"}}]}}}'
# 4. Wait for bootstrapping to complete ("bootstrapped": true in the health output; shards become AVAILABLE in the placement)
kubectl -n m3db exec m3dbnode-0 -- curl -s http://localhost:9002/health
# 5. Get the LoadBalancer IP
kubectl -n m3db get svc m3coordinator-lb
```
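Two optional follow-ups, depending on the M3 version baked into the manifests. Newer releases (roughly v1.0+) expect each namespace to be marked ready before the coordinator serves it, and shard state is easiest to inspect via the placement API. A hedged sketch:
```bash
# Mark namespaces ready (only needed on M3 versions that gate traffic on readiness).
for ns in default agg_10s_30d agg_1m_1y; do
  kubectl -n m3db exec m3dbnode-0 -- curl -s -X POST http://localhost:7201/api/v1/services/m3db/namespace/ready \
    -H "Content-Type: application/json" -d "{\"name\": \"$ns\"}"
done

# Count shards by state; everything should end up AVAILABLE (INITIALIZING means bootstrap is still running).
kubectl -n m3db exec m3dbnode-0 -- curl -s http://localhost:7201/api/v1/services/m3db/placement | \
  jq '[.placement.instances[].shards[].state] | group_by(.) | map({state: .[0], count: length})'
```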
## Testing
**Quick connectivity test:**
```bash
./test-metrics.sh <LB_IP>
```
This script verifies:
1. Coordinator health endpoint responds
2. Placement is configured with all 3 m3dbnode instances
3. All 3 namespaces are created (default, agg_10s_30d, agg_1m_1y)
4. PromQL queries work
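If the script isn't handy, rough manual equivalents against the LoadBalancer look like this (`<LB_IP>` comes from `kubectl -n m3db get svc m3coordinator-lb`):
```bash
LB_IP=<LB_IP>

# 1. Coordinator health
curl -s "http://${LB_IP}:7201/health"

# 2. Placement contains all three m3dbnode instances
curl -s "http://${LB_IP}:7201/api/v1/services/m3db/placement" | jq '.placement.instances | keys'

# 3. All three namespaces exist
curl -s "http://${LB_IP}:7201/api/v1/services/m3db/namespace" | jq '.registry.namespaces | keys'

# 4. PromQL answers queries
curl -s "http://${LB_IP}:7201/api/v1/query?query=up"
```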
**Full read/write test (Python):**
```bash
pip install requests python-snappy
python3 test-metrics.py <LB_IP>
```
Writes a test metric via Prometheus remote_write and reads it back.
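To see the round trip from outside the cluster, you can query the series the script wrote back over a recent window. The metric name below (`m3db_write_test`) is a stand-in; substitute whatever name the script actually writes:
```bash
# Read the last 10 minutes of the test series through the coordinator's PromQL range API.
end=$(date +%s); start=$((end - 600))
curl -s "http://<LB_IP>:7201/api/v1/query_range?query=m3db_write_test&start=${start}&end=${end}&step=15s"
```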
## Prometheus Configuration (Replacing Mimir)
Update your Prometheus config to point at M3 Coordinator.
**In-cluster (same VKE cluster):**
```yaml
# prometheus.yml
remote_write:
  - url: "http://m3coordinator.m3db.svc.cluster.local:7201/api/v1/prom/remote/write"
    queue_config:
      capacity: 10000
      max_shards: 30
      max_samples_per_send: 5000
      batch_send_deadline: 5s

remote_read:
  - url: "http://m3coordinator.m3db.svc.cluster.local:7201/api/v1/prom/remote/read"
    read_recent: true
```
**External (cross-region/cross-cluster):**
```yaml
# prometheus.yml
remote_write:
  - url: "http://<LB-IP>:7201/api/v1/prom/remote/write"
    queue_config:
      capacity: 10000
      max_shards: 30
      max_samples_per_send: 5000
      batch_send_deadline: 5s

remote_read:
  - url: "http://<LB-IP>:7201/api/v1/prom/remote/read"
    read_recent: true
```
Get the LoadBalancer IP:
```bash
kubectl -n m3db get svc m3coordinator-lb
```
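Once Prometheus is pointed at the coordinator, it's worth confirming that remote_write is keeping up. One way, assuming you can reach the Prometheus HTTP API (the URL below is a placeholder, and metric names vary slightly across Prometheus versions), is to watch its own remote-storage metrics:
```bash
PROM=http://<your-prometheus>:9090

# Failed samples should stay at or near zero once the queue settles.
curl -s -G "${PROM}/api/v1/query" --data-urlencode 'query=rate(prometheus_remote_storage_samples_failed_total[5m])'

# Shard count should sit well below the max_shards configured above.
curl -s -G "${PROM}/api/v1/query" --data-urlencode 'query=prometheus_remote_storage_shards'
```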
## Grafana Datasource
Add a **Prometheus** datasource in Grafana pointing to:
- **In-cluster:** `http://m3coordinator.m3db.svc.cluster.local:7201`
- **External:** `http://<LB-IP>:7201`
All existing PromQL dashboards will work without modification.
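If you provision datasources through Grafana's HTTP API instead of the UI, a minimal sketch looks like this (the Grafana host and credentials are placeholders):
```bash
# Create a Prometheus-type datasource pointing at the M3 Coordinator.
curl -s -X POST "http://admin:<password>@<grafana-host>:3000/api/datasources" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "M3DB",
    "type": "prometheus",
    "access": "proxy",
    "url": "http://m3coordinator.m3db.svc.cluster.local:7201",
    "isDefault": false
  }'
```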
## Migration from Mimir
1. **Dual-write phase**: Configure Prometheus to remote_write to both Mimir and M3DB simultaneously.
2. **Validation**: Compare query results between Mimir and M3DB for the same time ranges (see the sketch after this list).
3. **Cutover**: Once retention in M3DB covers your needs, remove the Mimir remote_write target.
4. **Cleanup**: Decommission Mimir components.
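For the validation step, one rough approach is to run the same instant query against both backends and diff the results; the Mimir URL and tenant header below are placeholders for whatever your current deployment uses:
```bash
QUERY='sum(rate(node_cpu_seconds_total{mode!="idle"}[5m]))'

# Same PromQL against M3 (via the coordinator LoadBalancer)...
curl -s -G "http://<LB-IP>:7201/api/v1/query" --data-urlencode "query=${QUERY}" | jq '.data.result'

# ...and against Mimir (path and X-Scope-OrgID depend on your setup).
curl -s -G "http://<mimir-host>/prometheus/api/v1/query" \
  -H "X-Scope-OrgID: <tenant>" \
  --data-urlencode "query=${QUERY}" | jq '.data.result'
```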
## Tuning for Vultr
- **Storage**: The `vultr-block-storage-m3db` StorageClass uses `disk_type: nvme` (NVMe SSD). Adjust `storage` in the VolumeClaimTemplates based on your cardinality and retention.
- **Node sizing**: M3DB is memory-hungry; use Vultr nodes with at least 8 GB of RAM. The manifest requests 4Gi per m3dbnode pod.
- **Shards**: The init job creates 64 shards across 3 nodes. For higher cardinality, increase to 128 or 256.
- **Volume expansion**: The StorageClass has `allowVolumeExpansion: true` — you can resize PVCs online via `kubectl edit pvc`.
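A minimal expansion sketch, assuming the PVC names follow the usual StatefulSet pattern (check `kubectl -n m3db get pvc` for the real names):
```bash
# Grow one data volume from 100Gi to 200Gi; the Vultr CSI driver resizes the block volume online.
kubectl -n m3db patch pvc m3dbnode-data-m3dbnode-0 \
  -p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'

# Watch the new capacity show up.
kubectl -n m3db get pvc m3dbnode-data-m3dbnode-0 -w
```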
## Useful Commands
```bash
# Get LoadBalancer IP
kubectl -n m3db get svc m3coordinator-lb
# Check cluster health (from inside cluster)
kubectl -n m3db exec m3dbnode-0 -- curl -s http://m3coordinator.m3db.svc.cluster.local:7201/health
# Check placement (from inside cluster)
kubectl -n m3db exec m3dbnode-0 -- curl -s http://m3coordinator.m3db.svc.cluster.local:7201/api/v1/services/m3db/placement | jq
# Check m3dbnode bootstrapped status
kubectl -n m3db exec m3dbnode-0 -- curl -s http://localhost:9002/health
# Query via PromQL (external)
curl "http://<LB-IP>:7201/api/v1/query?query=up"
# Delete the init job to re-run (if needed)
kubectl -n m3db delete job m3db-cluster-init
kubectl apply -f 06-init-and-pdb.yaml
```