# JormunDB Architecture

> **Warning:** This document is no longer entirely accurate. Ignore it, or update it with accurate information.

This document explains the internal architecture of JormunDB, including design decisions, storage formats, and the arena-per-request memory management pattern.

## Table of Contents

- [Overview](#overview)
- [Why Odin?](#why-odin)
- [Memory Management](#memory-management)
- [Storage Format](#storage-format)
- [Request Flow](#request-flow)
- [Concurrency Model](#concurrency-model)
- [Error Handling](#error-handling)
- [DynamoDB Wire Protocol](#dynamodb-wire-protocol)
- [Performance Characteristics](#performance-characteristics)
- [Future Enhancements](#future-enhancements)
- [Debugging](#debugging)
- [Migration from Zig Version](#migration-from-zig-version)
- [Contributing](#contributing)

## Overview

JormunDB is a DynamoDB-compatible database server that speaks the DynamoDB wire protocol. It uses RocksDB for persistent storage and is written in Odin, whose context-allocator system keeps memory management simple.

### Key Design Goals

1. **Zero allocation ceremony** - No explicit `defer free()` or error handling for every allocation
2. **Binary storage** - Efficient TLV encoding instead of JSON
3. **API compatibility** - Drop-in replacement for DynamoDB
4. **Performance** - RocksDB-backed with efficient key encoding

## Why Odin?

The original implementation in Zig suffered from explicit allocator threading:

```zig
// Zig version - explicit allocator everywhere
fn handleRequest(allocator: std.mem.Allocator, request: []const u8) !Response {
    const parsed = try parseJson(allocator, request);
    defer parsed.deinit(allocator);

    const item = try storage.getItem(allocator, parsed.table_name, parsed.key);
    defer if (item) |i| freeItem(allocator, i);

    const response = try serializeResponse(allocator, item);
    defer allocator.free(response);

    return response; // Wait, we deferred the free!
}
```

Odin's context allocator system eliminates this:

```odin
// Odin version - implicit context allocator
handle_request :: proc(request: []byte) -> Response {
    // All allocations use context.allocator automatically
    parsed := parse_json(request)
    item := storage_get_item(parsed.table_name, parsed.key)
    response := serialize_response(item)
    return response
    // Everything freed when arena is destroyed
}
```

## Memory Management

JormunDB uses a two-allocator strategy:

### 1. Arena Allocator (Request-Scoped)

Every HTTP request gets its own arena:

```odin
handle_connection :: proc(conn: net.TCP_Socket) {
    // Create arena for this request (4MB)
    arena: mem.Arena
    mem.arena_init(&arena, make([]byte, mem.Megabyte * 4))
    defer mem.arena_destroy(&arena)

    // Set context allocator
    context.allocator = mem.arena_allocator(&arena)

    // All downstream code uses context.allocator
    request := parse_http_request(conn)  // uses arena
    response := handle_request(request)  // uses arena
    send_response(conn, response)        // uses arena

    // Arena is freed here - everything cleaned up automatically
}
```

**Benefits:**

- No individual `free()` calls needed
- No `errdefer` cleanup
- No use-after-free bugs
- No memory leaks from forgotten frees
- Predictable performance (no GC pauses)

### 2. Default Allocator (Long-Lived Data)

The default allocator (typically `context.allocator` at program start) is used for:

- Table metadata
- Table locks (`sync.RW_Mutex`)
- Engine state
- Items returned from the storage layer (copied to the request arena when needed)

## Storage Format

### Binary Keys (Varint-Prefixed Segments)

All keys use varint length prefixes for space efficiency:

```
Meta key: [0x01][len][table_name]
Data key: [0x02][len][table_name][len][pk_value][len][sk_value]?
GSI key:  [0x03][len][table_name][len][index_name][len][gsi_pk][len][gsi_sk]?
LSI key:  [0x04][len][table_name][len][index_name][len][pk][len][lsi_sk]
```

**Example Data Key:**

```
Table: "Users"
PK:    "user:123"
SK:    "profile"

Encoded:
[0x02]     // Entity type (Data)
[0x05]     // Table name length (5)
Users      // Table name bytes
[0x08]     // PK length (8)
user:123   // PK bytes
[0x07]     // SK length (7)
profile    // SK bytes
```

### Item Encoding (TLV Format)

Items use Tag-Length-Value encoding for space efficiency:

```
Format: [attr_count:varint] [name_len:varint][name:bytes][type_tag:u8][value_len:varint][value:bytes]...

Type Tags:
String = 0x01
Number = 0x02
Binary = 0x03
Bool   = 0x04
Null   = 0x05
SS     = 0x10
NS     = 0x11
BS     = 0x12
List   = 0x20
Map    = 0x21
```

**Example Item:**

```json
{
  "id": {"S": "user123"},
  "age": {"N": "30"}
}
```

Encoded as:

```
[0x02]    // 2 attributes
[0x02]    // name length (2)
id        // name bytes
[0x01]    // type tag (String)
[0x07]    // value length (7)
user123   // value bytes
[0x03]    // name length (3)
age       // name bytes
[0x02]    // type tag (Number)
[0x02]    // value length (2)
30        // value bytes (stored as string)
```

## Request Flow

```
1. HTTP POST / arrives
       ↓
2. Create arena allocator (4MB)
   Set context.allocator = arena_allocator
       ↓
3. Parse HTTP headers
   Extract X-Amz-Target → Operation
       ↓
4. Parse JSON body
   Convert DynamoDB JSON → internal types
       ↓
5. Route to handler (e.g., handle_put_item)
       ↓
6. Storage engine operation
   - Build binary key
   - Encode item to TLV
   - RocksDB put/get/delete
       ↓
7. Build response
   - Serialize item to DynamoDB JSON
   - Format HTTP response
       ↓
8. Send response
       ↓
9.
   Destroy arena
   All request memory freed automatically
```

## Concurrency Model

### Table-Level RW Locks

Each table has a reader-writer lock:

```odin
Storage_Engine :: struct {
    db:                rocksdb.DB,
    table_locks:       map[string]^sync.RW_Mutex,
    table_locks_mutex: sync.Mutex,
}
```

**Read Operations** (GetItem, Query, Scan):

- Acquire shared lock
- Multiple readers can run concurrently
- Writers are blocked

**Write Operations** (PutItem, DeleteItem, UpdateItem):

- Acquire exclusive lock
- Only one writer at a time
- All readers are blocked

### Thread Safety

- RocksDB handles are thread-safe (column family-based)
- Table metadata is protected by locks
- Request arenas are thread-local (no sharing)

## Error Handling

Odin uses explicit error returns via `or_return`:

```odin
// Odin error handling
parse_json :: proc(data: []byte) -> (Item, bool) {
    parsed := json.parse(data) or_return
    item := json_to_item(parsed) or_return
    return item, true
}

// Usage
item, ok := parse_json(request.body)
if !ok {
    return error_response(.ValidationException, "Invalid JSON")
}
```

No exceptions, no panic-recover patterns. Every error path is explicit.
## DynamoDB Wire Protocol

### Request Format

```
POST / HTTP/1.1
X-Amz-Target: DynamoDB_20120810.PutItem
Content-Type: application/x-amz-json-1.0

{
  "TableName": "Users",
  "Item": {
    "id": {"S": "user123"},
    "name": {"S": "Alice"}
  }
}
```

### Response Format

```
HTTP/1.1 200 OK
Content-Type: application/x-amz-json-1.0
x-amzn-RequestId: local-request-id

{}
```

### Error Format

```json
{
  "__type": "com.amazonaws.dynamodb.v20120810#ResourceNotFoundException",
  "message": "Table not found"
}
```

## Performance Characteristics

### Time Complexity

| Operation | Complexity | Notes |
|-----------|------------|-------|
| PutItem | O(log n) | RocksDB LSM tree insert |
| GetItem | O(log n) | RocksDB point lookup |
| DeleteItem | O(log n) | RocksDB deletion |
| Query | O(log n + m) | n = items in table, m = result set |
| Scan | O(n) | Full table scan |

### Space Complexity

- Binary keys: ~20-100 bytes (vs 50-200 bytes JSON)
- Binary items: ~30% smaller than JSON
- Varint encoding saves space on small integers

### Benchmarks (Expected)

Based on Zig version performance:

```
Operation           Throughput    Latency (p50)
PutItem             ~5,000/sec    ~0.2ms
GetItem             ~7,000/sec    ~0.14ms
Query (1 item)      ~8,000/sec    ~0.12ms
Scan (1000 items)   ~20/sec       ~50ms
```

## Future Enhancements

### Planned Features

1. **UpdateExpression** - SET/REMOVE/ADD/DELETE operations
2. **FilterExpression** - Post-query filtering
3. **ProjectionExpression** - Return a subset of attributes
4. **Global Secondary Indexes** - Query by non-key attributes
5. **Local Secondary Indexes** - Alternate sort keys
6. **BatchWriteItem** - Batch mutations
7. **BatchGetItem** - Batch reads
8. **Transactions** - ACID multi-item operations

### Optimization Opportunities

1. **Connection pooling** - Reuse HTTP connections
2. **Bloom filters** - Faster negative lookups
3. **Compression** - LZ4/Zstd on large items
4. **Caching layer** - Hot item cache
5.
   **Parallel scan** - Segment-based scanning

## Debugging

### Enable Verbose Logging

```bash
make run VERBOSE=1
```

### Inspect RocksDB

```bash
# Use the ldb tool to inspect the database
ldb --db=./data scan
ldb --db=./data get <key>
```

### Memory Profiling

Odin's tracking allocator can detect leaks:

```odin
when ODIN_DEBUG {
    track: mem.Tracking_Allocator
    mem.tracking_allocator_init(&track, context.allocator)
    context.allocator = mem.tracking_allocator(&track)

    defer {
        for _, leak in track.allocation_map {
            fmt.printfln("Leaked %v bytes at %v", leak.size, leak.location)
        }
    }
}
```

## Migration from Zig Version

The Zig version (ZynamoDB) used the same binary storage format, so existing RocksDB databases can be read by JormunDB without migration.

### Compatibility

- ✅ Binary key format (byte-compatible)
- ✅ Binary item format (byte-compatible)
- ✅ Table metadata (JSON, compatible)
- ✅ HTTP wire protocol (identical)

### Breaking Changes

None - JormunDB can open ZynamoDB databases directly.

---

## Contributing

When contributing to JormunDB:

1. **Use the context allocator** - All request-scoped allocations should use `context.allocator`
2. **Avoid manual frees** - Let the arena handle it
3. **Long-lived data** - Use the default allocator explicitly
4. **Test thoroughly** - Run `make test` before committing
5. **Format code** - Run `make fmt` before committing

## References

- [Odin Language](https://odin-lang.org/)
- [RocksDB Wiki](https://github.com/facebook/rocksdb/wiki)
- [DynamoDB API Reference](https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/)
- [Varint Encoding](https://developers.google.com/protocol-buffers/docs/encoding#varints)