jormun-db/TODO.md
2026-02-16 01:40:51 -05:00

JormunDB (Odin rewrite) — TODO

This tracks what's left to stabilize and extend the project.

Now (MVP correctness + polish)

Goal: "aws cli works reliably for CreateTable/ListTables/PutItem/GetItem/DeleteItem/Scan/Query" with correct DynamoDB-ish responses.

1) HTTP + routing hardening

  • Audit request parsing boundaries:
    • Max body size enforcement (config exists, need to verify enforcement path)
    • Missing/invalid headers → correct DynamoDB error types
    • Content-Type handling (be permissive but consistent)
  • Ensure all request-scoped allocations come from the request arena (no accidental long-lived allocs)
    • Verified: handle_connection in http.odin sets context.allocator = request_alloc
    • Long-lived data (table metadata, locks) explicitly uses engine.allocator
  • Standardize error responses:
    • __type formatting — done, uses com.amazonaws.dynamodb.v20120810#ErrorType
    • message field consistency — done
    • Status code mapping per error type — DONE: centralized handle_storage_error + make_error_response now maps InternalServerError→500, everything else→400
    • Missing X-Amz-Target now returns SerializationException (matches real DynamoDB)
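
The error-response rules above can be sketched as follows. This is a language-neutral illustration in Python, not the project's Odin code: the name make_error_response comes from the notes above, but its signature here, the route helper, and the message text are assumptions.

```python
import json

# Type prefix taken from the notes above.
TYPE_PREFIX = "com.amazonaws.dynamodb.v20120810#"


def make_error_response(error_type: str, message: str) -> tuple[int, str]:
    """Return (http_status, json_body) for a DynamoDB-style error."""
    # InternalServerError is the only 500; every other error type maps to 400.
    status = 500 if error_type == "InternalServerError" else 400
    body = json.dumps({"__type": TYPE_PREFIX + error_type, "message": message})
    return status, body


def route(headers: dict[str, str]) -> tuple[int, str]:
    """Hypothetical router entry point: reject requests lacking X-Amz-Target."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "x-amz-target" not in lowered:
        return make_error_response(
            "SerializationException", "Missing X-Amz-Target header"
        )
    return 200, "{}"
```

Keeping the status mapping in one function means a new error type only needs a name, and the 400 default applies unless it is explicitly listed as a server-side failure.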

2) Storage correctness edge cases

  • Table metadata durability + validation:
    • Reject duplicate tables — done in create_table (checks existing meta key)
    • Reject invalid key schema — done in parse_key_schema (no HASH, multiple HASH, etc.)
  • Item validation against key schema:
    • Missing PK/SK errors — done in key_from_item
    • Type mismatch errors (S/N/B) — DONE: new validate_item_key_types proc checks item key attr types against AttributeDefinitions
  • Deterministic encoding tests:
    • Key codec round-trip
    • TLV item encode/decode round-trip (nested maps/lists/sets)
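
As a rough illustration of the key-type check described above, here is a minimal Python sketch. The name validate_item_key_types is from the notes; the signature, the return convention (None on success, a message on failure), and the message wording are assumptions and may not match the Odin proc.

```python
def validate_item_key_types(item, key_schema, attribute_definitions):
    """Return None if the item's key attributes are valid, else an error message."""
    declared = {
        d["AttributeName"]: d["AttributeType"] for d in attribute_definitions
    }
    for element in key_schema:
        name = element["AttributeName"]
        value = item.get(name)
        if value is None:
            return f"Missing key attribute: {name}"
        # An attribute value is a single-entry map such as {"S": "abc"}.
        if not isinstance(value, dict) or len(value) != 1:
            return f"Malformed attribute value for key: {name}"
        actual = next(iter(value))
        expected = declared.get(name)
        if expected is None:
            return f"Key attribute {name} has no attribute definition"
        if actual != expected:
            return f"Type mismatch for key {name}: expected {expected}, got {actual}"
    return None
```

For example, with a table keyed on a string `pk` and a numeric `sk`, an item carrying `{"sk": {"S": "1"}}` would be rejected with a type-mismatch message rather than being stored under a differently encoded key.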

3) Query/Scan pagination parity

  • Make pagination behavior match AWS CLI expectations:
    • Limit — done
    • ExclusiveStartKey — done (parsed via JSON object lookup with key schema type reconstruction)
    • LastEvaluatedKey generation — FIXED: now saves key of last returned item (not next unread item); only emits when more results exist
  • Add "golden" pagination tests:
    • Query w/ sort key ranges
    • Scan limit + resume loop
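
The LastEvaluatedKey rule above (return the key of the last returned item, and only when more results exist) together with the "scan limit + resume loop" test could look like the sketch below. It is Python over an in-memory sorted list, so scan_page, scan_all, and the (key, item) representation are all assumptions for illustration.

```python
import bisect


def scan_page(sorted_items, limit, exclusive_start_key=None):
    """Return (items, last_evaluated_key) for one page.

    sorted_items is a list of (key, item) pairs in storage order.
    """
    start = 0
    if exclusive_start_key is not None:
        keys = [key for key, _ in sorted_items]
        # Resume strictly after the start key, never at it.
        start = bisect.bisect_right(keys, exclusive_start_key)
    page = sorted_items[start:start + limit]
    has_more = start + limit < len(sorted_items)
    # Emit the key of the last *returned* item, and only if more items remain.
    last_key = page[-1][0] if page and has_more else None
    return [item for _, item in page], last_key


def scan_all(sorted_items, limit):
    """Resume loop: keep paging until no LastEvaluatedKey comes back."""
    collected, key = [], None
    while True:
        items, key = scan_page(sorted_items, limit, key)
        collected.extend(items)
        if key is None:
            return collected
```

A golden test can then assert that the concatenation of all pages equals a single unpaginated scan, for several values of the limit including one that divides the item count exactly.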

4) Expression parsing reliability

  • Remove brittle string-scanning for KeyConditionExpression extraction:
    • DONE: parse_key_condition_expression_string uses JSON object lookup (handles whitespace/ordering safely)
  • Add validation + better errors for malformed expressions
  • Expand operator coverage: BETWEEN and begins_with are implemented in the parser
  • Sort key condition filtering in query:
    • DONE: query() now accepts optional Sort_Key_Condition and applies it (=, <, <=, >, >=, BETWEEN, begins_with)
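
A minimal Python sketch of applying a sort key condition with the operators listed above. The function name matches_sort_key and the (op, operands) representation are hypothetical, and the sketch assumes sort key values have already been decoded into comparable values (str, bytes, or a numeric type).

```python
import operator

# Simple binary comparisons, keyed by the expression operator.
_COMPARATORS = {
    "=": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def matches_sort_key(value, op, operands):
    """Return True if `value` satisfies the sort key condition."""
    if op in _COMPARATORS:
        return _COMPARATORS[op](value, operands[0])
    if op == "BETWEEN":
        low, high = operands
        return low <= value <= high  # both bounds inclusive
    if op == "begins_with":
        return value.startswith(operands[0])  # works for str and bytes
    raise ValueError(f"unsupported sort key operator: {op}")
```

Raising on an unknown operator, rather than returning False, keeps a parser gap from silently turning into an empty query result.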

5) Service Features

  • Configuration settings (e.g. environment variables) for defining users and credentials
  • Configuration settings for setting up master and replica nodes

6) Test coverage / tooling

  • Add integration tests mirroring AWS CLI script flows:
    • create table → put → get → scan → query → delete
  • Add fuzz-ish tests for:
    • JSON parsing robustness
    • expression parsing robustness
    • TLV decode failure cases (corrupt bytes)
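
One possible shape for the corrupt-bytes fuzz test is sketched below in Python. The TLV layout here (1-byte tag, 4-byte big-endian length, payload; only strings and lists) is a toy format invented for illustration and is not JormunDB's actual encoding. The point is the invariant: on mutated input the decoder either succeeds or raises its own DecodeError, and never fails any other way.

```python
import random
import struct


class DecodeError(Exception):
    """The only exception the decoder is allowed to raise on bad input."""


TAG_STRING, TAG_LIST = 0x01, 0x02  # toy tags, not the project's real ones


def encode(value):
    if isinstance(value, str):
        tag, payload = TAG_STRING, value.encode("utf-8")
    elif isinstance(value, list):
        tag, payload = TAG_LIST, b"".join(encode(v) for v in value)
    else:
        raise TypeError(f"unsupported value: {value!r}")
    return struct.pack(">BI", tag, len(payload)) + payload


def _decode_one(buf, pos):
    if len(buf) - pos < 5:
        raise DecodeError("truncated header")
    tag, length = struct.unpack_from(">BI", buf, pos)
    pos += 5
    if length > len(buf) - pos:
        raise DecodeError("length exceeds remaining bytes")
    payload = buf[pos:pos + length]
    if tag == TAG_STRING:
        try:
            value = payload.decode("utf-8")
        except UnicodeDecodeError as exc:
            raise DecodeError("invalid UTF-8") from exc
    elif tag == TAG_LIST:
        value, inner = [], 0
        while inner < len(payload):
            element, inner = _decode_one(payload, inner)
            value.append(element)
    else:
        raise DecodeError(f"unknown tag {tag:#x}")
    return value, pos + length


def decode(buf):
    value, end = _decode_one(buf, 0)
    if end != len(buf):
        raise DecodeError("trailing bytes")
    return value


def fuzz(seed_bytes, rounds, rng):
    """Overwrite 1-3 random bytes per round; return how many inputs were rejected."""
    rejected = 0
    for _ in range(rounds):
        mutated = bytearray(seed_bytes)
        for _ in range(rng.randint(1, 3)):
            mutated[rng.randrange(len(mutated))] = rng.randrange(256)
        try:
            decode(bytes(mutated))
        except DecodeError:
            rejected += 1  # any other exception propagates and fails the test
    return rejected
```

Seeding the generator (for example `fuzz(encode(["pk", ["a", "b"]]), 1000, random.Random(0))`) keeps failures reproducible.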

7) Secondary indexes

  • Global Secondary Indexes (GSI)
  • Local Secondary Indexes (LSI)
  • Index backfill + write-path maintenance

8) Performance / ops

  • Connection reuse / keep-alive tuning
  • Bloom filters / RocksDB options tuning for common patterns
  • Optional compression policy (LZ4/Zstd knobs)
  • Parallel scan (segment scanning)

9) Replication / WAL

(There is a C++ shim stubbed out for WAL iteration and applying write batches.)

  • Implement WAL iterator: latest_sequence, wal_iter_next returning writebatch blob
  • Implement apply-writebatch on follower
  • Add a minimal replication test harness (leader generates N ops → follower applies → compare)
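
The replication harness idea (leader generates N ops → follower applies → compare) can be prototyped without RocksDB. The Python sketch below uses an in-memory list as a stand-in for the WAL; Node, batches_after, and the batch tuple format are hypothetical, and only latest_sequence echoes a name from the notes above.

```python
import random


class Node:
    def __init__(self):
        self.data = {}
        self.log = []  # write batches; the batch at index i has sequence i + 1

    def latest_sequence(self):
        return len(self.log)

    def apply_batch(self, batch):
        for op, key, value in batch:
            if op == "put":
                self.data[key] = value
            else:  # "delete"
                self.data.pop(key, None)
        self.log.append(batch)

    def batches_after(self, sequence):
        """Stand-in for a WAL iterator: yield (sequence, batch) pairs past `sequence`."""
        for index in range(sequence, len(self.log)):
            yield index + 1, self.log[index]


def replicate(leader, follower):
    """Pull every batch the follower has not applied yet, in order."""
    for sequence, batch in leader.batches_after(follower.latest_sequence()):
        follower.apply_batch(batch)
        assert follower.latest_sequence() == sequence


def run_harness(n_ops, seed=0):
    rng = random.Random(seed)
    leader, follower = Node(), Node()
    for i in range(n_ops):
        key = f"k{rng.randrange(10)}"
        op = ("put", key, i) if rng.random() < 0.7 else ("delete", key, None)
        leader.apply_batch([op])
        if i % 7 == 0:
            replicate(leader, follower)  # catch up mid-stream as well
    replicate(leader, follower)
    return leader.data == follower.data
```

Catching up at irregular points, not only at the end, exercises the "resume from the follower's last applied sequence" path, which is where off-by-one errors in a WAL iterator tend to show up.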