JormunDB Architecture
This document explains the internal architecture of JormunDB, including design decisions, storage formats, and the arena-per-request memory management pattern.
Overview
JormunDB is a DynamoDB-compatible database server that speaks the DynamoDB wire protocol. It uses RocksDB for persistent storage and is written in Odin, chosen for its context-allocator-based approach to memory management.
Key Design Goals
- Zero allocation ceremony - No explicit `defer free()` or error handling for every allocation
- Binary storage - Efficient TLV encoding instead of JSON
- API compatibility - Drop-in replacement for DynamoDB
- Performance - RocksDB-backed with efficient key encoding
Why Odin?
The original implementation in Zig suffered from explicit allocator threading:
```zig
// Zig version - explicit allocator everywhere
fn handleRequest(allocator: std.mem.Allocator, request: []const u8) !Response {
    const parsed = try parseJson(allocator, request);
    defer parsed.deinit(allocator);

    const item = try storage.getItem(allocator, parsed.table_name, parsed.key);
    defer if (item) |i| freeItem(allocator, i);

    const response = try serializeResponse(allocator, item);
    defer allocator.free(response);
    return response; // Wait, we deferred the free!
}
```
Odin's context allocator system eliminates this:
```odin
// Odin version - implicit context allocator
handle_request :: proc(request: []byte) -> Response {
    // All allocations use context.allocator automatically
    parsed   := parse_json(request)
    item     := storage_get_item(parsed.table_name, parsed.key)
    response := serialize_response(item)
    return response
    // Everything freed when the arena is destroyed
}
```
Memory Management
JormunDB uses a two-allocator strategy:
1. Arena Allocator (Request-Scoped)
Every HTTP request gets its own arena:
```odin
handle_connection :: proc(conn: net.TCP_Socket) {
    // Create a 4 MB arena for this request
    arena: mem.Arena
    mem.arena_init(&arena, make([]byte, mem.Megabyte * 4))
    defer mem.arena_destroy(&arena)

    // Set the context allocator
    context.allocator = mem.arena_allocator(&arena)

    // All downstream code uses context.allocator
    request  := parse_http_request(conn) // uses arena
    response := handle_request(request)  // uses arena
    send_response(conn, response)        // uses arena

    // Arena is freed here - everything cleaned up automatically
}
```
Benefits:
- No individual `free()` calls needed
- No `errdefer` cleanup
- No use-after-free bugs
- No memory leaks from forgotten frees
- Predictable performance (no GC pauses)
2. Default Allocator (Long-Lived Data)
The default heap allocator (whatever `context.allocator` is at program start) is used for:
- Table metadata
- Table locks (sync.RW_Mutex)
- Engine state
- Items returned from storage layer (copied to request arena when needed)
Storage Format
Binary Keys (Varint-Prefixed Segments)
All keys use varint length prefixes for space efficiency:
```
Meta key: [0x01][len][table_name]
Data key: [0x02][len][table_name][len][pk_value][len][sk_value]?
GSI key:  [0x03][len][table_name][len][index_name][len][gsi_pk][len][gsi_sk]?
LSI key:  [0x04][len][table_name][len][index_name][len][pk][len][lsi_sk]
```
Example Data Key:
```
Table: "Users"
PK:    "user:123"
SK:    "profile"

Encoded:
[0x02]     // Entity type (Data)
[0x05]     // Table name length (5)
Users      // Table name bytes
[0x08]     // PK length (8)
user:123   // PK bytes
[0x07]     // SK length (7)
profile    // SK bytes
```
Item Encoding (TLV Format)
Items use Tag-Length-Value encoding for space efficiency:
Format:
```
[attr_count:varint]
[name_len:varint][name:bytes][type_tag:u8][value_len:varint][value:bytes]...
```
Type Tags:
```
String = 0x01   Number = 0x02   Binary = 0x03
Bool   = 0x04   Null   = 0x05
SS     = 0x10   NS     = 0x11   BS     = 0x12
List   = 0x20   Map    = 0x21
```
Example Item:
```json
{
  "id": {"S": "user123"},
  "age": {"N": "30"}
}
```
Encoded as:
```
[0x02]     // 2 attributes
[0x02]     // name length (2)
id         // name bytes
[0x01]     // type tag (String)
[0x07]     // value length (7)
user123    // value bytes
[0x03]     // name length (3)
age        // name bytes
[0x02]     // type tag (Number)
[0x02]     // value length (2)
30         // value bytes (stored as string)
```
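A minimal encoder for this TLV layout can be sketched as follows. Python is used purely for illustration (the real encoder is Odin); the tag constants follow the table above, and the helper names and the `(tag, bytes)` attribute representation are our own invention:

```python
TAG_STRING = 0x01
TAG_NUMBER = 0x02

def encode_varint(n: int) -> bytes:
    """LEB128-style varint, assumed to match the key encoding."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def encode_item(attrs: dict[str, tuple[int, bytes]]) -> bytes:
    """attrs maps attribute name -> (type_tag, raw value bytes)."""
    out = bytearray(encode_varint(len(attrs)))    # [attr_count]
    for name, (tag, value) in attrs.items():
        nb = name.encode()
        out += encode_varint(len(nb)) + nb        # [name_len][name]
        out.append(tag)                           # [type_tag]
        out += encode_varint(len(value)) + value  # [value_len][value]
    return bytes(out)
```

Under these assumptions, `encode_item({"id": (TAG_STRING, b"user123"), "age": (TAG_NUMBER, b"30")})` yields exactly the byte sequence shown above. Note that numbers are stored as their string representation, matching DynamoDB's `{"N": "30"}` convention.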
Request Flow
```
1. HTTP POST / arrives
        ↓
2. Create arena allocator (4 MB)
   Set context.allocator = arena_allocator
        ↓
3. Parse HTTP headers
   Extract X-Amz-Target → Operation
        ↓
4. Parse JSON body
   Convert DynamoDB JSON → internal types
        ↓
5. Route to handler (e.g., handle_put_item)
        ↓
6. Storage engine operation
   - Build binary key
   - Encode item to TLV
   - RocksDB put/get/delete
        ↓
7. Build response
   - Serialize item to DynamoDB JSON
   - Format HTTP response
        ↓
8. Send response
        ↓
9. Destroy arena
   All request memory freed automatically
```
Concurrency Model
Table-Level RW Locks
Each table has a reader-writer lock:
```odin
Storage_Engine :: struct {
    db:                rocksdb.DB,
    table_locks:       map[string]^sync.RW_Mutex,
    table_locks_mutex: sync.Mutex,
}
```
Read Operations (GetItem, Query, Scan):
- Acquire shared lock
- Multiple readers can run concurrently
- Writers are blocked
Write Operations (PutItem, DeleteItem, UpdateItem):
- Acquire exclusive lock
- Only one writer at a time
- All readers are blocked
Thread Safety
- RocksDB handles are thread-safe (column family-based)
- Table metadata is protected by locks
- Request arenas are thread-local (no sharing)
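The per-table locking pattern above can be sketched as follows. This is an illustrative Python model only (the real engine uses Odin's `sync.RW_Mutex`); Python's standard library has no reader-writer lock, so a minimal one is built here from a condition variable, and the class and method names are ours:

```python
import threading

class RWLock:
    """Minimal reader-writer lock: many readers OR one writer."""
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:          # readers wait out any writer
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # wake a waiting writer

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers:  # exclusive access
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

class StorageEngine:
    """Models the table_locks map guarded by an engine-level mutex."""
    def __init__(self):
        self._table_locks: dict[str, RWLock] = {}
        self._table_locks_mutex = threading.Lock()

    def table_lock(self, table: str) -> RWLock:
        # The map itself is protected by its own mutex, as in Storage_Engine.
        with self._table_locks_mutex:
            if table not in self._table_locks:
                self._table_locks[table] = RWLock()
            return self._table_locks[table]
```

A GetItem handler would call `acquire_read`/`release_read` around the lookup, while PutItem would bracket its write with `acquire_write`/`release_write`, giving exactly the reader/writer semantics listed above.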
Error Handling
Odin uses explicit error returns, propagated with `or_return`:

```odin
// Odin error handling via multiple return values
parse_json :: proc(data: []byte) -> (item: Item, ok: bool) {
    parsed := json.parse(data) or_return
    item    = json_to_item(parsed) or_return
    return item, true
}

// Usage
item, ok := parse_json(request.body)
if !ok {
    return error_response(.ValidationException, "Invalid JSON")
}
```
No exceptions, no panic-recover patterns. Every error path is explicit.
DynamoDB Wire Protocol
Request Format
```
POST / HTTP/1.1
X-Amz-Target: DynamoDB_20120810.PutItem
Content-Type: application/x-amz-json-1.0

{
  "TableName": "Users",
  "Item": {
    "id": {"S": "user123"},
    "name": {"S": "Alice"}
  }
}
```
Response Format
```
HTTP/1.1 200 OK
Content-Type: application/x-amz-json-1.0
x-amzn-RequestId: local-request-id

{}
```
Error Format
```json
{
  "__type": "com.amazonaws.dynamodb.v20120810#ResourceNotFoundException",
  "message": "Table not found"
}
```
Performance Characteristics
Time Complexity
| Operation | Complexity | Notes |
|---|---|---|
| PutItem | O(log n) | RocksDB LSM tree insert |
| GetItem | O(log n) | RocksDB point lookup |
| DeleteItem | O(log n) | RocksDB deletion |
| Query | O(log n + m) | n = items in table, m = result set |
| Scan | O(n) | Full table scan |
Space Complexity
- Binary keys: ~20-100 bytes (vs 50-200 bytes JSON)
- Binary items: ~30% smaller than JSON
- Varint encoding saves space on small integers
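To make the varint point concrete: a 7-bit-per-byte varint (the LEB128-style encoding assumed in the key-encoding sketch earlier) stores any value below 128 in a single byte, which covers almost every real-world key-segment and attribute length. A tiny illustrative helper:

```python
def varint_len(n: int) -> int:
    """Bytes needed to store n as a 7-bit-per-byte varint."""
    size = 1
    while n >= 0x80:  # each extra 7 bits costs one more byte
        n >>= 7
        size += 1
    return size
```

So a typical 20-byte key segment pays 1 byte of length overhead, versus 4 bytes for a fixed-width u32 prefix.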
Benchmarks (Expected)
Based on Zig version performance:
| Operation | Throughput | Latency (p50) |
|---|---|---|
| PutItem | ~5,000/sec | ~0.2ms |
| GetItem | ~7,000/sec | ~0.14ms |
| Query (1 item) | ~8,000/sec | ~0.12ms |
| Scan (1000 items) | ~20/sec | ~50ms |
Future Enhancements
Planned Features
- UpdateExpression - SET/REMOVE/ADD/DELETE operations
- FilterExpression - Post-query filtering
- ProjectionExpression - Return subset of attributes
- Global Secondary Indexes - Query by non-key attributes
- Local Secondary Indexes - Alternate sort keys
- BatchWriteItem - Batch mutations
- BatchGetItem - Batch reads
- Transactions - ACID multi-item operations
Optimization Opportunities
- Connection pooling - Reuse HTTP connections
- Bloom filters - Faster negative lookups
- Compression - LZ4/Zstd on large items
- Caching layer - Hot item cache
- Parallel scan - Segment-based scanning
Debugging
Enable Verbose Logging
```sh
make run VERBOSE=1
```
Inspect RocksDB
```sh
# Use the ldb tool to inspect the database
ldb --db=./data scan
ldb --db=./data get <key_hex>
```
Memory Profiling
Odin's tracking allocator can detect leaks:
```odin
when ODIN_DEBUG {
    track: mem.Tracking_Allocator
    mem.tracking_allocator_init(&track, context.allocator)
    context.allocator = mem.tracking_allocator(&track)

    defer {
        for _, leak in track.allocation_map {
            fmt.printfln("Leaked %v bytes at %v", leak.size, leak.location)
        }
    }
}
```
Migration from Zig Version
The Zig version (ZynamoDB) used the same binary storage format, so existing RocksDB databases can be read by JormunDB without migration.
Compatibility
- ✅ Binary key format (byte-compatible)
- ✅ Binary item format (byte-compatible)
- ✅ Table metadata (JSON, compatible)
- ✅ HTTP wire protocol (identical)
Breaking Changes
None - JormunDB can open ZynamoDB databases directly.
Contributing
When contributing to JormunDB:
- Use the context allocator - All request-scoped allocations should use `context.allocator`
- Avoid manual frees - Let the arena handle it
- Long-lived data - Use the default allocator explicitly
- Test thoroughly - Run `make test` before committing
- Format code - Run `make fmt` before committing