SQLRite is a from-the-ground-up reimagining of SQLite — a single-file engine with a real B-tree, write-ahead log, transactions, JOINs, aggregates, vector search, full-text search, and bindings for six languages. Built to teach what databases actually do.
No mocks, no shortcuts. SQLRite implements the parts of SQLite that matter — a paged file format, a B-tree, a WAL, locks, JOINs, aggregates — and extends them with the parts AI workloads need: vector search, full-text search, hybrid retrieval, and an MCP adapter.
One .sqlrite file per database. 4 KiB pages. Magic header, format version, schema-root pointer. Currently file format v5.
Real cell-based pages with a slot directory. Interior + leaf nodes. Overflow chains for oversized rows.
32-byte header, 4112-byte frames, rolling-sum checksums. Torn-write recovery and atomic commits.
BEGIN / COMMIT / ROLLBACK with snapshot isolation. Auto-rollback if the commit's disk write fails.
Four JOIN flavors with explicit ON. GROUP BY + COUNT/SUM/AVG/MIN/MAX, DISTINCT, LIKE, IN, IS NULL.
? placeholders bind anywhere a literal is allowed — including vector arguments to k-NN. Per-connection LRU plan cache.
VECTOR(N) column type with cosine / dot / L2 distance. HNSW index per metric for sub-linear k-NN.
FTS5-style inverted index with BM25 scoring. fts_match() / bm25_score() functions, hybrid retrieval ready.
DROP TABLE / DROP INDEX / DROP COLUMN release pages onto a free-list. Auto-VACUUM compacts past 25%, tunable via PRAGMA.
Rust crate, Python (PyO3), Node.js (napi-rs), Go (database/sql), C FFI, and WASM for the browser.
Cross-platform Svelte 5 + Tauri 2.0 client. sqlrite-mcp exposes the database as an MCP stdio server.
Every phase is shippable on its own and documented. The codebase is the textbook.
SQLRite mirrors SQLite’s classic two-half split: a front end that turns SQL into a plan, and a back end that turns that plan into bytes.
Every phase is independently usable and merges to main before the next starts. Ten phases shipped through v0.9.1; the remaining list at the bottom is small, well-scoped follow-ups.
Rust 2024 edition, resolver 3, every dependency on current majors.
CREATE / INSERT / SELECT / UPDATE / DELETE with WHERE, ORDER BY, LIMIT.
Single-file .sqlrite format with a 4 KiB page layout and crash-safe header writes.
Cell-based pages, interior + leaf nodes, overflow chains, secondary indexes.
Cross-platform GUI in Svelte 5; engine extracted into a reusable library.
WAL, checkpointer, exclusive/shared locks, and real BEGIN/COMMIT/ROLLBACK.
Public Connection / Statement / Rows API plus Rust, Python, Node, Go, C FFI, and WASM bindings.
Lockstep versioning across eleven manifests; OIDC trusted publishing across PyPI, npm, crates.io.
Vector / embedding column type, HNSW ANN index, JSON column type, ask() (natural-language → SQL), MCP server.
FTS5-style inverted index, BM25 scoring, hybrid (lexical + semantic) retrieval. Bumps file format v4 → v5.
After v0.2.0 closed the file-format bump, the next nine sub-phases shipped the SQL surface that had been parked under "possible extras" — JOINs, aggregates, prepared statements, PRAGMA, and the storage hygiene around them.
Twelve-workload bench harness (SQLR-4 / SQLR-16) with a pluggable Driver trait. Pinned-host runs published.
Smaller, well-scoped follow-ups that slot in where they make sense — see the canonical roadmap doc for the full list.
The same Rust core — wrapped, never reimplemented. SDKs ship as prebuilt binaries so there’s no toolchain to install just to use the database.
The supported SQL is real — every feature lands with type checks, UNIQUE enforcement, and a clean error path instead of a panic. JOINs, aggregates, and prepared statements all came in the v0.2.0 → v0.9.1 wave.
| Statement | Features |
|---|---|
| CREATE TABLE | PRIMARY KEYUNIQUENOT NULLDEFAULT <literal>INTEGERTEXTREALBOOLEANVECTOR(N)JSONauto-index |
| ALTER TABLE | RENAME TORENAME COLUMNADD COLUMN [+ DEFAULT backfill]DROP COLUMNIF EXISTS |
| DROP TABLE / DROP INDEX | IF EXISTSauto-indexes refusedfreelist reclaimauto-VACUUM |
| CREATE [UNIQUE] INDEX | IF NOT EXISTSB-tree (default)USING HNSW WITH (metric = …)USING FTSINTEGER + TEXT + VECTOR |
| INSERT INTO | explicit column listauto-ROWIDmulti-row VALUESUNIQUE enforcementJSON validationvector dim checkDEFAULT padding |
| SELECT | projectionWHEREORDER BY ASC|DESCLIMIT nDISTINCTGROUP BY= literal → index probevec_distance_* / k-NN (HNSW)fts_match / bm25_score |
| JOINs | INNER JOINLEFT OUTERRIGHT OUTERFULL OUTERON <expr>aliasesself-joinsmulti-join chains |
| Predicates | = <> < <= > >=IS NULL / IS NOT NULLLIKE / NOT LIKE / ILIKEIN (literal-list)AND / OR / NOTarithmetic + || |
| Aggregates | COUNT(*)COUNT(DISTINCT col)SUMAVGMINMAXGROUP BY <col> |
| UPDATE | multi-column SETWHEREarithmetic in SETUNIQUE + type checksFTS / HNSW index maintenance |
| DELETE | WHEREfull-tablefreelist reclaim |
| BEGIN / COMMIT / ROLLBACK | snapshot transactionsWAL-backed commitauto-rollback on disk error |
| VACUUM | manual compactionauto-VACUUM (25% freelist)tunable thresholdSQL via PRAGMA |
| PRAGMA | auto_vacuum (read/write)extensible dispatchertyped errors on bad values |
| Prepared statements | ? placeholdersexecute_with_paramsquery_with_paramsper-conn LRU plan cacheValue::Vector binding |
| Functions | json_extract / json_typejson_array_length / json_object_keysvec_distance_l2 / cosine / dotfts_matchbm25_score |
Twelve workloads against SQLite (WAL+NORMAL) and DuckDB on a pinned-host run. The point isn’t to win — SQLite has 25 years of optimization behind it — it’s to baseline future engine work, prove the differentiator workloads deliver, and ground the roadmap with evidence.
Cosine distance, 10k × 384-dim corpus. Same engine, same data — only the index probe changes. Lower is better.
After SQLR-23 (prepared statements + ? bindings) closed the parser tax, SQLRite tracks SQLite within ~2× on hot read paths. Lower is better.
1k-doc tech-blurb corpus. SQLite FTS5 is the lexical comparator; the hybrid query (BM25 + cosine) is SQLRite-only. Lower is better.
Columnar engines win on big aggregations. We publish the gap honestly — SQLRite isn't competing on this axis, but the suite proves the differentiator workloads still deliver. Lower is better.
The suite ships with the gap measured + the workaround documented + the task linked. Each is “investigation, not a release gate.”
Cross-platform, dark by default, written in Svelte 5 + Tauri 2.0. The header has New… / Open… / Save As… buttons; the editor has live line numbers, ⌘/ comment toggle, and selection-aware Run.
| name | age |
|---|---|
| alice | 31 |
| cara | 29 |
| dan | 27 |
SQLRite is a learning project as much as a database. Each phase is paired with a long-form post on the design choices behind it.
What this whole series is about, and why rebuilding SQLite is the right way to learn it.
read on medium →PART 1From cargo new to a SQLite-style prompt with history and bracket matching.
read on medium →PART 2Hand-rolling the parser surface and the typed error path that replaces every panic.
read on medium →PART 3Why every embedded database leans on this one data structure.
read on medium →SQLRite is open source under MIT. Pull requests, issues, and database trivia all welcome.