## JormunDB Architecture
> **Warning:** This document is no longer entirely accurate. Ignore it or update it with accurate info.
This document explains the internal architecture of JormunDB, including design decisions, storage formats, and the arena-per-request memory management pattern.
## Table of Contents
- [Overview](#overview)
- [Why Odin?](#why-odin)
- [Memory Management](#memory-management)
- [Storage Format](#storage-format)
- [Request Flow](#request-flow)
- [Concurrency Model](#concurrency-model)
- [Error Handling](#error-handling)
- [DynamoDB Wire Protocol](#dynamodb-wire-protocol)
- [Performance Characteristics](#performance-characteristics)
- [Future Enhancements](#future-enhancements)
- [Debugging](#debugging)
- [Migration from Zig Version](#migration-from-zig-version)
## Overview
JormunDB is a DynamoDB-compatible database server that speaks the DynamoDB wire protocol. It uses RocksDB for persistent storage and is written in Odin, whose implicit context allocator keeps memory management simple.
### Key Design Goals
1. **Zero allocation ceremony** - No explicit `defer free()` or error handling for every allocation
2. **Binary storage** - Efficient TLV encoding instead of JSON
3. **API compatibility** - Drop-in replacement for DynamoDB
4. **Performance** - RocksDB-backed with efficient key encoding
## Why Odin?
The original implementation in Zig suffered from explicit allocator threading:
```zig
// Zig version - explicit allocator everywhere
fn handleRequest(allocator: std.mem.Allocator, request: []const u8) !Response {
    const parsed = try parseJson(allocator, request);
    defer parsed.deinit(allocator);

    const item = try storage.getItem(allocator, parsed.table_name, parsed.key);
    defer if (item) |i| freeItem(allocator, i);

    const response = try serializeResponse(allocator, item);
    defer allocator.free(response);

    return response; // Wait, we deferred the free!
}
```
Odin's context allocator system eliminates this:
```odin
// Odin version - implicit context allocator
handle_request :: proc(request: []byte) -> Response {
	// All allocations use context.allocator automatically
	parsed := parse_json(request)
	item := storage_get_item(parsed.table_name, parsed.key)
	response := serialize_response(item)

	return response
	// Everything freed when the arena is destroyed
}
```
## Memory Management
JormunDB uses a two-allocator strategy:
### 1. Arena Allocator (Request-Scoped)
Every HTTP request gets its own arena:
```odin
handle_connection :: proc(conn: net.TCP_Socket) {
	// Create a 4 MB backing buffer for this request's arena,
	// using the default allocator (captured before we swap it out)
	alloc := context.allocator
	backing := make([]byte, 4 * mem.Megabyte, alloc)
	defer delete(backing, alloc)

	arena: mem.Arena
	mem.arena_init(&arena, backing)

	// Set context allocator
	context.allocator = mem.arena_allocator(&arena)

	// All downstream code uses context.allocator
	request := parse_http_request(conn) // uses arena
	response := handle_request(request) // uses arena
	send_response(conn, response)       // uses arena

	// Backing buffer freed here - everything cleaned up automatically
}
```
**Benefits:**
- No individual `free()` calls needed
- No `errdefer` cleanup
- No use-after-free bugs
- No memory leaks from forgotten frees
- Predictable performance (no GC pauses)
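The arena pattern itself is small enough to sketch. The following Python bump allocator is illustrative only (JormunDB's arenas come from Odin's `core:mem`, and the class and method names here are mine): allocation is a pointer bump, and "freeing" the whole request is a single offset reset.

```python
class Arena:
    """Minimal bump allocator: alloc() advances an offset into one
    preallocated buffer; reset() reclaims everything in O(1)."""

    def __init__(self, capacity: int):
        self.buf = bytearray(capacity)
        self.offset = 0

    def alloc(self, size: int) -> memoryview:
        if self.offset + size > len(self.buf):
            raise MemoryError("arena exhausted")
        view = memoryview(self.buf)[self.offset:self.offset + size]
        self.offset += size
        return view

    def reset(self) -> None:
        # No per-allocation bookkeeping: dropping the offset frees it all.
        self.offset = 0


arena = Arena(4 * 1024 * 1024)  # 4 MB, like the per-request arena
a = arena.alloc(16)
b = arena.alloc(32)
arena.reset()                   # end of request: all memory reclaimed at once
```

This is why there are no GC pauses and no per-object frees: the cost of cleanup is constant regardless of how many allocations the request made.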
### 2. Default Allocator (Long-Lived Data)
The default allocator (typically `context.allocator` at program start) is used for:
- Table metadata
- Table locks (sync.RW_Mutex)
- Engine state
- Items returned from storage layer (copied to request arena when needed)
## Storage Format
### Binary Keys (Varint-Prefixed Segments)
All keys use varint length prefixes for space efficiency:
```
Meta key: [0x01][len][table_name]
Data key: [0x02][len][table_name][len][pk_value][len][sk_value]?
GSI key: [0x03][len][table_name][len][index_name][len][gsi_pk][len][gsi_sk]?
LSI key: [0x04][len][table_name][len][index_name][len][pk][len][lsi_sk]
```
**Example Data Key:**
```
Table: "Users"
PK:    "user:123"
SK:    "profile"

Encoded:
[0x02]     // Entity type (Data)
[0x05]     // Table name length (5)
Users      // Table name bytes
[0x08]     // PK length (8)
user:123   // PK bytes
[0x07]     // SK length (7)
profile    // SK bytes
```
### Item Encoding (TLV Format)
Items use Tag-Length-Value encoding for space efficiency:
```
Format:
[attr_count:varint]
[name_len:varint][name:bytes][type_tag:u8][value_len:varint][value:bytes]...
Type Tags:
String = 0x01 Number = 0x02 Binary = 0x03
Bool = 0x04 Null = 0x05
SS = 0x10 NS = 0x11 BS = 0x12
List = 0x20 Map = 0x21
```
**Example Item:**
```json
{
"id": {"S": "user123"},
"age": {"N": "30"}
}
```
Encoded as:
```
[0x02]     // 2 attributes
[0x02]     // name length (2)
id         // name bytes
[0x01]     // type tag (String)
[0x07]     // value length (7)
user123    // value bytes

[0x03]     // name length (3)
age        // name bytes
[0x02]     // type tag (Number)
[0x02]     // value length (2)
30         // value bytes (stored as string)
```
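A minimal encoder for the scalar types, again as an illustrative Python sketch (the function names are made up here; JormunDB's real encoder also handles sets, lists, and maps):

```python
def encode_varint(n: int) -> bytes:
    """LEB128 varint: 7 bits per byte, high bit = continuation."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


TYPE_TAGS = {"S": 0x01, "N": 0x02, "B": 0x03, "BOOL": 0x04, "NULL": 0x05}


def encode_item(item: dict) -> bytes:
    """[attr_count][name_len][name][type_tag][value_len][value]... (scalars only)."""
    out = bytearray(encode_varint(len(item)))
    for name, attr in item.items():
        ((dtype, value),) = attr.items()  # e.g. {"S": "user123"}
        raw = str(value).encode()
        out += encode_varint(len(name)) + name.encode()
        out.append(TYPE_TAGS[dtype])
        out += encode_varint(len(raw)) + raw
    return bytes(out)
```

Note that numbers are stored as their decimal string, matching DynamoDB's arbitrary-precision `N` type rather than a fixed-width binary integer.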
## Request Flow
```
1. HTTP POST / arrives
↓
2. Create arena allocator (4MB)
Set context.allocator = arena_allocator
↓
3. Parse HTTP headers
Extract X-Amz-Target → Operation
↓
4. Parse JSON body
Convert DynamoDB JSON → internal types
↓
5. Route to handler (e.g., handle_put_item)
↓
6. Storage engine operation
- Build binary key
- Encode item to TLV
- RocksDB put/get/delete
↓
7. Build response
- Serialize item to DynamoDB JSON
- Format HTTP response
↓
8. Send response
↓
9. Destroy arena
All request memory freed automatically
```
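Step 5's routing is a plain dispatch on the operation extracted from `X-Amz-Target`. A sketch (handler names and return values are illustrative, not JormunDB's actual API):

```python
def parse_target(header: str) -> str:
    """'DynamoDB_20120810.PutItem' -> 'PutItem'."""
    _, _, op = header.partition(".")
    return op


# Illustrative handler table; real handlers call into the storage engine.
HANDLERS = {
    "PutItem": lambda body: {},
    "GetItem": lambda body: {"Item": {}},
}


def route(target_header: str, body: dict) -> dict:
    op = parse_target(target_header)
    handler = HANDLERS.get(op)
    if handler is None:
        raise KeyError(f"UnknownOperationException: {op}")
    return handler(body)
```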
## Concurrency Model
### Table-Level RW Locks
Each table has a reader-writer lock:
```odin
Storage_Engine :: struct {
	db:                rocksdb.DB,
	table_locks:       map[string]^sync.RW_Mutex,
	table_locks_mutex: sync.Mutex,
}
```
**Read Operations** (GetItem, Query, Scan):
- Acquire shared lock
- Multiple readers can run concurrently
- Writers are blocked
**Write Operations** (PutItem, DeleteItem, UpdateItem):
- Acquire exclusive lock
- Only one writer at a time
- All readers are blocked
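The lock registry itself needs protection, mirroring `table_locks_mutex` above. A Python sketch of the get-or-create pattern (Python's stdlib has no reader-writer lock, so a plain `Lock` stands in for `sync.RW_Mutex` here, and the class name is mine):

```python
import threading


class TableLocks:
    """One lock per table, created lazily; the registry dict is guarded
    by its own mutex so concurrent requests for new tables are safe."""

    def __init__(self) -> None:
        self._locks: dict[str, threading.Lock] = {}
        self._registry_mu = threading.Lock()

    def for_table(self, name: str) -> threading.Lock:
        with self._registry_mu:
            # setdefault is atomic under the registry mutex:
            # the first caller creates the lock, later callers reuse it.
            return self._locks.setdefault(name, threading.Lock())


locks = TableLocks()
with locks.for_table("Users"):
    pass  # read or write the "Users" table under its lock
```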
### Thread Safety
- RocksDB handles are thread-safe (column family-based)
- Table metadata is protected by locks
- Request arenas are thread-local (no sharing)
## Error Handling
Odin uses explicit error returns, propagated with `or_return`:
```odin
// Odin error handling - callees return (value, ok)
parse_json :: proc(data: []byte) -> (item: Item, ok: bool) {
	parsed := json.parse(data) or_return
	item = json_to_item(parsed) or_return
	return item, true
}

// Usage: check the ok flag explicitly
item, ok := parse_json(request.body)
if !ok {
	return error_response(.ValidationException, "Invalid JSON")
}
```
No exceptions, no panic-recover patterns. Every error path is explicit.
## DynamoDB Wire Protocol
### Request Format
```
POST / HTTP/1.1
X-Amz-Target: DynamoDB_20120810.PutItem
Content-Type: application/x-amz-json-1.0
{
"TableName": "Users",
"Item": {
"id": {"S": "user123"},
"name": {"S": "Alice"}
}
}
```
### Response Format
```
HTTP/1.1 200 OK
Content-Type: application/x-amz-json-1.0
x-amzn-RequestId: local-request-id
{}
```
### Error Format
```json
{
"__type": "com.amazonaws.dynamodb.v20120810#ResourceNotFoundException",
"message": "Table not found"
}
```
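Building that payload is mechanical: the `__type` field is the operation version's service namespace plus the exception name. A hedged Python sketch (`error_body` is a name invented for this example):

```python
import json

SERVICE_PREFIX = "com.amazonaws.dynamodb.v20120810"


def error_body(error_type: str, message: str) -> str:
    """Build the namespaced '__type' error payload shown above."""
    return json.dumps({
        "__type": f"{SERVICE_PREFIX}#{error_type}",
        "message": message,
    })
```

AWS SDKs split `__type` on `#` to recover the exception class, so the prefix must match the API version in `X-Amz-Target`.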
## Performance Characteristics
### Time Complexity
| Operation | Complexity | Notes |
|-----------|-----------|-------|
| PutItem | O(log n) | RocksDB LSM tree insert |
| GetItem | O(log n) | RocksDB point lookup |
| DeleteItem | O(log n) | RocksDB deletion |
| Query | O(log n + m) | n = items in table, m = result set |
| Scan | O(n) | Full table scan |
### Space Complexity
- Binary keys: ~20-100 bytes (vs 50-200 bytes JSON)
- Binary items: ~30% smaller than JSON
- Varint encoding saves space on small integers
### Benchmarks (Expected)
Based on Zig version performance:
```
Operation Throughput Latency (p50)
PutItem ~5,000/sec ~0.2ms
GetItem ~7,000/sec ~0.14ms
Query (1 item) ~8,000/sec ~0.12ms
Scan (1000 items) ~20/sec ~50ms
```
## Future Enhancements
### Planned Features
1. **UpdateExpression** - SET/REMOVE/ADD/DELETE operations
2. **FilterExpression** - Post-query filtering
3. **ProjectionExpression** - Return a subset of attributes
4. **Global Secondary Indexes** - Query by non-key attributes
5. **Local Secondary Indexes** - Alternate sort keys
6. **BatchWriteItem** - Batched mutations
7. **BatchGetItem** - Batched reads
8. **Transactions** - ACID multi-item operations
### Optimization Opportunities
1. **Connection pooling** - Reuse HTTP connections
2. **Bloom filters** - Faster negative lookups
3. **Compression** - LZ4/Zstd on large items
4. **Caching layer** - Hot-item cache
5. **Parallel scan** - Segment-based scanning
## Debugging
### Enable Verbose Logging
```bash
make run VERBOSE=1
```
### Inspect RocksDB
```bash
# Use ldb tool to inspect database
ldb --db=./data scan
ldb --db=./data get <key_hex>
```
### Memory Profiling
Odin's tracking allocator can detect leaks:
```odin
when ODIN_DEBUG {
	track: mem.Tracking_Allocator
	mem.tracking_allocator_init(&track, context.allocator)
	context.allocator = mem.tracking_allocator(&track)

	defer {
		for _, leak in track.allocation_map {
			fmt.printfln("Leaked %d bytes at %v", leak.size, leak.location)
		}
	}
}
```
## Migration from Zig Version
The Zig version (ZynamoDB) used the same binary storage format, so existing RocksDB databases can be read by JormunDB without migration.
### Compatibility
- ✅ Binary key format (byte-compatible)
- ✅ Binary item format (byte-compatible)
- ✅ Table metadata (JSON, compatible)
- ✅ HTTP wire protocol (identical)
### Breaking Changes
None - JormunDB can open ZynamoDB databases directly.
---
## Contributing
When contributing to JormunDB:
1. **Use the context allocator** - All request-scoped allocations should use `context.allocator`
2. **Avoid manual frees** - Let the arena handle cleanup
3. **Long-lived data** - Use the default allocator explicitly
4. **Test thoroughly** - Run `make test` before committing
5. **Format code** - Run `make fmt` before committing
## References
- [Odin Language](https://odin-lang.org/)
- [RocksDB Wiki](https://github.com/facebook/rocksdb/wiki)
- [DynamoDB API Reference](https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/)
- [Varint Encoding](https://developers.google.com/protocol-buffers/docs/encoding#varints)