v0.9.1 · MIT licensed · open source

An embedded SQL database, built from scratch in Rust.

SQLRite is a from-the-ground-up reimagining of SQLite — a single-file engine with a real B-tree, write-ahead log, transactions, JOINs, aggregates, vector search, full-text search, and bindings for six languages. Built to teach what databases actually do.

Phases 0–10 shipped· v0.9.1 on crates.io · PyPI · npm· Rust 2024 edition
sqlrite — repl — in-memory
$cargo install sqlrite-engine
02 · features

An honest database engine, all the way down.

No mocks, no shortcuts. SQLRite implements the parts of SQLite that matter — a paged file format, a B-tree, a WAL, locks, JOINs, aggregates — and extends them with the parts AI workloads need: vector search, full-text search, hybrid retrieval, and an MCP adapter.

01 / 12

Single-file format

One .sqlrite file per database. 4 KiB pages. Magic header, format version, schema-root pointer. Currently file format v5.

pagerv5
02 / 12

On-disk B-Tree

Real cell-based pages with a slot directory. Interior + leaf nodes. Overflow chains for oversized rows.

btreeindexes
03 / 12

Write-Ahead Log

32-byte header, 4112-byte frames, rolling-sum checksums. Torn-write recovery and atomic commits.

walfsync
04 / 12

Real transactions

BEGIN / COMMIT / ROLLBACK with snapshot isolation. Auto-rollback if the commit's disk write fails.

acid
05 / 12

JOINs + aggregates

Four JOIN flavors with explicit ON. GROUP BY + COUNT/SUM/AVG/MIN/MAX, DISTINCT, LIKE, IN, IS NULL.

joinsgroup by
06 / 12

Prepared statements

? placeholders bind anywhere a literal is allowed — including vector arguments to k-NN. Per-connection LRU plan cache.

paramsplan cache
07 / 12

Vector search · HNSW

VECTOR(N) column type with cosine / dot / L2 distance. HNSW index per metric for sub-linear k-NN.

annrag
08 / 12

Full-text search · BM25

FTS5-style inverted index with BM25 scoring. fts_match() / bm25_score() functions, hybrid retrieval ready.

ftshybrid
09 / 12

Free-list + auto-VACUUM

DROP TABLE / DROP INDEX / DROP COLUMN release pages onto a free-list. Auto-VACUUM compacts past 25%, tunable via PRAGMA.

storagepragma
10 / 12

Six language SDKs

Rust crate, Python (PyO3), Node.js (napi-rs), Go (database/sql), C FFI, and WASM for the browser.

bindings
11 / 12

Tauri desktop GUI · MCP server

Cross-platform Svelte 5 + Tauri 2.0 client. sqlrite-mcp exposes the database as an MCP stdio server.

guiagents
12 / 12

Built to be read

Every phase is shippable on its own and documented. The codebase is the textbook.

pedagogy
03 · architecture

From SQL string to disk page in seven layers.

SQLRite mirrors SQLite’s classic two-half split: a front end that turns SQL into a plan, and a back end that turns that plan into bytes.

Input
REPL · SDK · FFI · WASM · MCP
SQL text + bindings
Front end
Tokenizer / Parser
sqlparser · SQLite dialect
Planner / Optimizer
index probes · HNSW · FTS
Executor
SQLRite VM — row iteration · expressions · UNIQUE + type checks · vector / FTS scoring
core
Back end
B-Tree · HNSW · FTS posting
interior + leaf, overflow chains
Pager
snapshot + staging diff
Durability
WAL · <db>.sqlrite-wal
framed, checksummed, recoverable
OS file lock
flock SH/EX
Storage
Single .sqlrite file
4 KiB pages · header · cells · slot dir
04 · roadmap

Phased, shippable, public.

Every phase is independently usable and merges to main before the next starts. Ten phases shipped through v0.9.1; the remaining list at the bottom is small, well-scoped follow-ups.

Phase 0Modernizationshipped

Rust 2024 edition, resolver 3, every dependency on current majors.

  • rustyline 18 · clap 4 · sqlparser 0.61
  • thiserror 2 · env_logger 0.11
Phase 1SQL execution surfaceshipped

CREATE / INSERT / SELECT / UPDATE / DELETE with WHERE, ORDER BY, LIMIT.

  • Expressions: comparisons, AND/OR/NOT, arithmetic, ||
  • Auto-ROWID, UNIQUE enforcement, type checks
  • REPL with history, syntax highlighting, bracket matching
Phase 2On-disk persistenceshipped

Single-file .sqlrite format with a 4 KiB page layout and crash-safe header writes.

  • Typed payload pages chained via next-pointers
  • .open / .save / .tables meta commands
Phase 3On-disk B-Tree + auto-save pagershipped

Cell-based pages, interior + leaf nodes, overflow chains, secondary indexes.

  • Auto-save on every committing statement
  • sqlrite_master is itself a real cell-based table
  • Auto-indexes for PK + UNIQUE; CREATE [UNIQUE] INDEX
Phase 2.5Tauri 2.0 desktop appshipped

Cross-platform GUI in Svelte 5; engine extracted into a reusable library.

  • Engine became Send + Sync (Arc<Mutex<_>>)
  • Workspace: root + desktop/src-tauri
  • Three-pane dark UI with sticky-header result grid
Phase 4Durability and concurrencyshipped

WAL, checkpointer, exclusive/shared locks, and real BEGIN/COMMIT/ROLLBACK.

  • 4a–4e: file lock, WAL codec, WAL-aware pager, checkpointer, multi-reader/single-writer
  • 4f: BEGIN / COMMIT / ROLLBACK with snapshot isolation
  • Torn-write recovery via rolling-sum checksum frames
  • Auto-checkpoint past 100 frames, idempotent
Phase 5Embedding surface — six SDKsshipped

Public Connection / Statement / Rows API plus Rust, Python, Node, Go, C FFI, and WASM bindings.

  • 5a: Rust public API (param binding shipped in 9g)
  • 5b–5e: C FFI · Python (PyO3) · Node (napi-rs) · Go (database/sql)
  • 5g: WASM — ~1.8 MB / ~500 KB gzipped, browser-native
Phase 6Release engineering + CI/CDshipped

Lockstep versioning across eleven manifests; OIDC trusted publishing across PyPI, npm, crates.io.

  • 6a: bump-version.sh — one dispatch, eleven manifests
  • 6b: parallel CI across three OSes
  • 6c–6i: trusted publishers + desktop installers for 7 platform/format combos
Phase 7AI-era extensionsshipped

Vector / embedding column type, HNSW ANN index, JSON column type, ask() (natural-language → SQL), MCP server.

  • 7a–7d: VECTOR(N) + cosine/dot/L2 + HNSW + persistence
  • 7e: JSON column type + json_extract / json_type / json_array_length
  • 7g: ask() across REPL, desktop, Python, Node, Go, WASM, MCP
  • 7h: sqlrite-mcp — JSON-RPC over stdio, eight tools
Phase 8Full-text search + hybrid retrievalshipped

FTS5-style inverted index, BM25 scoring, hybrid (lexical + semantic) retrieval. Bumps file format v4 → v5.

  • 8a: tokenizer + BM25 + posting-list (pure algorithms)
  • 8b: fts_match() / bm25_score() + try_fts_probe optimizer hook
  • 8c: cell-encoded posting persistence + on-demand format bump
  • 8d–8e: hybrid retrieval worked example + bm25_search MCP tool
Phase 9SQL surface + DX follow-ups (v0.2.0 → v0.9.1)shipped

After v0.2.0 closed the file-format bump, the next nine sub-phases shipped the SQL surface that had been parked under "possible extras" — JOINs, aggregates, prepared statements, PRAGMA, and the storage hygiene around them.

  • 9a (v0.3.0): DEFAULT clause + DROP TABLE/INDEX + ALTER TABLE
  • 9b–9c (v0.4.0–v0.5.0): free-list, manual VACUUM, auto-VACUUM
  • 9d (v0.5.1): IS NULL / IS NOT NULL + Option<Value> INSERT pipeline
  • 9e (v0.6.0): GROUP BY + COUNT/SUM/AVG/MIN/MAX + DISTINCT + LIKE + IN
  • 9f (v0.7.0): JOINs — INNER, LEFT, RIGHT, FULL OUTER
  • 9g (v0.9.0): prepared statements + ? param binding + plan cache
  • 9h (v0.9.0): HNSW probe widened to cosine + dot via WITH (metric = …)
  • 9i (v0.9.1): PRAGMA dispatcher + auto_vacuum knob
Phase 10Benchmarks vs SQLite + DuckDBshipped

Twelve-workload bench harness (SQLR-4 / SQLR-16) with a pluggable Driver trait. Pinned-host runs published.

  • Read-by-PK · transactional CRUD · analytical slices · vector + FTS retrieval
  • Bundled SQLite + DuckDB drivers; criterion-based
  • Excluded from CI — `make bench` runs locally
05 · embedding

One engine. Six languages.

The same Rust core — wrapped, never reimplemented. SDKs ship as prebuilt binaries so there’s no toolchain to install just to use the database.

Rust

Native — no FFI hop. Imported as `use sqlrite::…`.

$cargo add sqlrite-engine
version0.9.1
registrycrates.io
licenseMIT
example.rs· copy-pasteable
use sqlrite::Connection; fn main() -> sqlrite::Result<()> { let mut conn = Connection::open("app.sqlrite")?; conn.execute( "CREATE TABLE IF NOT EXISTS users \ (id INTEGER PRIMARY KEY, name TEXT NOT NULL)", )?; conn.execute("INSERT INTO users (name) VALUES ('alice')")?; let rows = conn.query("SELECT id, name FROM users")?; for row in rows { let id: i64 = row.get(0)?; let name: String = row.get(1)?; println!("{id}: {name}"); } Ok(()) }
06 · sql surface

What SQLRite speaks today.

The supported SQL is real — every feature lands with type checks, UNIQUE enforcement, and a clean error path instead of a panic. JOINs, aggregates, and prepared statements all came in the v0.2.0 → v0.9.1 wave.

StatementFeatures
CREATE TABLEPRIMARY KEYUNIQUENOT NULLDEFAULT <literal>INTEGERTEXTREALBOOLEANVECTOR(N)JSONauto-index
ALTER TABLERENAME TORENAME COLUMNADD COLUMN [+ DEFAULT backfill]DROP COLUMNIF EXISTS
DROP TABLE / DROP INDEXIF EXISTSauto-indexes refusedfreelist reclaimauto-VACUUM
CREATE [UNIQUE] INDEXIF NOT EXISTSB-tree (default)USING HNSW WITH (metric = …)USING FTSINTEGER + TEXT + VECTOR
INSERT INTOexplicit column listauto-ROWIDmulti-row VALUESUNIQUE enforcementJSON validationvector dim checkDEFAULT padding
SELECTprojectionWHEREORDER BY ASC|DESCLIMIT nDISTINCTGROUP BY= literal → index probevec_distance_* / k-NN (HNSW)fts_match / bm25_score
JOINsINNER JOINLEFT OUTERRIGHT OUTERFULL OUTERON <expr>aliasesself-joinsmulti-join chains
Predicates= <> < <= > >=IS NULL / IS NOT NULLLIKE / NOT LIKE / ILIKEIN (literal-list)AND / OR / NOTarithmetic + ||
AggregatesCOUNT(*)COUNT(DISTINCT col)SUMAVGMINMAXGROUP BY <col>
UPDATEmulti-column SETWHEREarithmetic in SETUNIQUE + type checksFTS / HNSW index maintenance
DELETEWHEREfull-tablefreelist reclaim
BEGIN / COMMIT / ROLLBACKsnapshot transactionsWAL-backed commitauto-rollback on disk error
VACUUMmanual compactionauto-VACUUM (25% freelist)tunable thresholdSQL via PRAGMA
PRAGMAauto_vacuum (read/write)extensible dispatchertyped errors on bad values
Prepared statements? placeholdersexecute_with_paramsquery_with_paramsper-conn LRU plan cacheValue::Vector binding
Functionsjson_extract / json_typejson_array_length / json_object_keysvec_distance_l2 / cosine / dotfts_matchbm25_score
not yet supported
subqueriesCTEs (WITH)HAVINGCASE WHENBETWEENGLOB / REGEXPOFFSETmulti-column ORDER BYUNION / INTERSECT / EXCEPTINSERT … SELECTGROUP BY / DISTINCT over JOINsCREATE VIEW / TRIGGERFOREIGN KEY / CHECKsavepointsnamed placeholders (:foo, $1)
07 · benchmarks

Honest numbers, published in public.

Twelve workloads against SQLite (WAL+NORMAL) and DuckDB on a pinned-host run. The point isn’t to win — SQLite has 25 years of optimization behind it — it’s to baseline future engine work, prove the differentiator workloads deliver, and ground the roadmap with evidence.

~50×
HNSW vs brute-force k-NN
10k × 384-dim vectors, cosine top-10 — 120.88 ms → 2.40 ms.
1.6–1.9×
Read-path gap vs SQLite
Within ~2× of SQLite (WAL+NORMAL) on W1 read-by-PK and W6 index lookup after SQLR-23.
608 µs
Hybrid retrieval at 1k docs
0.5 × bm25_score + 0.5 × (1 − vec_distance_cosine).

Vector top-10 · the HNSW win

Cosine distance, 10k × 384-dim corpus. Same engine, same data — only the index probe changes. Lower is better.

W10 · cosine top-10 · 10k × 384-dim~50× faster — HNSW probe vs brute-force scan
Brute-force scan
120.88 ms
HNSW (M=16, ef_search=50)
2.40 ms

Read paths · OLTP baseline

After SQLR-23 (prepared statements + ? bindings) closed the parser tax, SQLRite tracks SQLite within ~2× on hot read paths. Lower is better.

W1 · read-by-PK · 10k probes1.9× — was 4.8× pre-SQLR-23
SQLite (WAL+NORMAL)
2.09 µs
SQLRite
3.92 µs
W6 · secondary index · 10k probes1.6× — was 4.2× pre-SQLR-23
SQLite (WAL+NORMAL)
2.56 µs
SQLRite
4.04 µs

Full-text · BM25 + hybrid retrieval

1k-doc tech-blurb corpus. SQLite FTS5 is the lexical comparator; the hybrid query (BM25 + cosine) is SQLRite-only. Lower is better.

W11 · BM25 top-1021× behind SQLite FTS5 — was 43× pre-SQLR-23
SQLite FTS5
23.65 µs
SQLRite
501.63 µs
W12 · Hybrid (BM25 + cosine fusion)RAG-shaped query, no SQL-engine comparator
SQLRite
607.90 µs

Analytical aggregates · DuckDB's home turf

Columnar engines win on big aggregations. We publish the gap honestly — SQLRite isn't competing on this axis, but the suite proves the differentiator workloads still deliver. Lower is better.

W7 · SUM(v) · 1M rows
DuckDB
478.78 µs
SQLite (WAL+NORMAL)
31.57 ms
SQLRite
103.62 ms
W8 · GROUP BY · cardinality 10
DuckDB
949.75 µs
SQLite (WAL+NORMAL)
366.52 ms
SQLRite
197.32 ms
engineering debts the bench surfaced

The suite ships with the gap measured + the workaround documented + the task linked. Each is “investigation, not a release gate.”

  • SQLR-18W4 single-row INSERT~579× vs SQLiteBottom-up B-tree rebuild on every COMMIT.
  • SQLR-19W8 GROUP BY · card-100kSkipped by default — ~245 s/iterVec-backed group store; should be HashMap.
  • SQLR-20W9 INNER JOIN · 10k×10k~14M× vs SQLiteNested-loop driver doesn't push ON predicate to the inner-side index.
  • SQLR-21W11 / W12 corpus capFTS doc-lengths sidecar capped at ~1,360 docsPhase 8.1 — overflow chaining for posting + sidecar cells.
Run: 2026-05-08 · Apple M1 Pro · macOS · aarch64 · commit ac84d560
full headline table →raw JSON envelope →
08 · desktop

A native client for browsing your data.

Cross-platform, dark by default, written in Svelte 5 + Tauri 2.0. The header has New… / Open… / Save As… buttons; the editor has live line numbers, ⌘/ comment toggle, and selection-aware Run.

SQLRite — app.sqlrite
1-- top earners
2SELECT name, age FROM users
3WHERE age > 25
4ORDER BY age DESC LIMIT 5;
✓ 3 rows · 1.2ms
nameage
alice31
cara29
dan27
macOS · .dmgWindows · .msiLinux · .AppImage / .deb / .rpm
09 · written in public

Read the series.

SQLRite is a learning project as much as a database. Each phase is paired with a long-form post on the design choices behind it.

join in

What I cannot create, I do not understand.

SQLRite is open source under MIT. Pull requests, issues, and database trivia all welcome.