v0.14.0 · MIT licensed · open source

SQLRite — an embedded SQL + vector database in Rust.

SQLRite is a from-scratch SQLite alternative — a single-file embedded database in Rust with a real B-tree, write-ahead log, transactions, JOINs, aggregates, HNSW vector search, BM25 full-text search, and bindings for six languages. Built to teach what databases actually do.

Phases 0–11 shipped — concurrent writes live· v0.14.0 on crates.io · PyPI · npm· Rust 2024 edition
sqlrite — repl — in-memory
$cargo install sqlrite-engine
try it · zero install

Run SQLRite in your browser.

The full engine compiled to WebAssembly — SQL editor, sample datasets, and HNSW vector search, all running in your tab. No server, no signup, nothing to install.

Open the playground
02 · features

An honest database engine, all the way down.

No mocks, no shortcuts. SQLRite implements the parts of SQLite that matter — a paged file format, a B-tree, a WAL, locks, JOINs, aggregates — and extends them with the parts AI workloads need: vector search, full-text search, hybrid retrieval, and an MCP adapter.

01 / 13

Single-file format

One .sqlrite file per database. 4 KiB pages. Magic header, format version, schema-root pointer. Currently file format v5.

pagerv5
02 / 13

On-disk B-Tree

Real cell-based pages with a slot directory. Interior + leaf nodes. Overflow chains for oversized rows.

btreeindexes
03 / 13

Write-Ahead Log

32-byte header, 4112-byte frames, rolling-sum checksums. Torn-write recovery and atomic commits.

walfsync
04 / 13

Real transactions

BEGIN / COMMIT / ROLLBACK with snapshot isolation. Auto-rollback if the commit's disk write fails.

acid
05 / 13

Concurrent writes · MVCC

BEGIN CONCURRENT for multi-writer apps. Snapshot-isolated reads, row-level conflict detection at commit, retryable Busy/BusySnapshot errors across every SDK.

mvccbegin concurrent
06 / 13

JOINs + aggregates

Four JOIN flavors with explicit ON. GROUP BY + COUNT/SUM/AVG/MIN/MAX, DISTINCT, LIKE, IN, IS NULL.

joinsgroup by
07 / 13

Prepared statements

? placeholders bind anywhere a literal is allowed — including vector arguments to k-NN. Per-connection LRU plan cache.

paramsplan cache
08 / 13

Vector search · HNSW

VECTOR(N) column type with cosine / dot / L2 distance. HNSW index per metric for sub-linear k-NN.

annrag
09 / 13

Full-text search · BM25

FTS5-style inverted index with BM25 scoring. fts_match() / bm25_score() functions, hybrid retrieval ready.

ftshybrid
10 / 13

Free-list + auto-VACUUM

DROP TABLE / DROP INDEX / DROP COLUMN release pages onto a free-list. Auto-VACUUM compacts past 25%, tunable via PRAGMA.

storagepragma
11 / 13

Six language SDKs

Rust crate, Python (PyO3), Node.js (napi-rs), Go (database/sql), C FFI, and WASM for the browser. All five non-Rust SDKs mint sibling connection handles for multi-writer workloads.

bindings
12 / 13

Tauri desktop GUI · MCP server

Cross-platform Svelte 5 + Tauri 2.0 client. sqlrite-mcp exposes the database as an MCP stdio server.

guiagents
13 / 13

Built to be read

Every phase is shippable on its own and documented. The codebase is the textbook.

pedagogy
03 · architecture

From SQL string to disk page in seven layers.

SQLRite mirrors SQLite’s classic two-half split: a front end that turns SQL into a plan, and a back end that turns that plan into bytes.

Input
REPL · SDK · FFI · WASM · MCP
SQL text + bindings
Front end
Tokenizer / Parser
sqlparser · SQLite dialect
Planner / Optimizer
index probes · HNSW · FTS
Executor
SQLRite VM — row iteration · expressions · UNIQUE + type checks · vector / FTS scoring
core
Back end
B-Tree · HNSW · FTS posting
interior + leaf, overflow chains
Pager
snapshot + staging diff
Durability
WAL · <db>.sqlrite-wal
framed, checksummed, recoverable
OS file lock
flock SH/EX
Storage
Single .sqlrite file
4 KiB pages · header · cells · slot dir
04 · roadmap

Phased, shippable, public.

Every phase is independently usable and merges to main before the next starts. Eleven phases shipped through v0.10.0; the remaining list at the bottom is small, well-scoped follow-ups.

Phase 0Modernizationshipped

Rust 2024 edition, resolver 3, every dependency on current majors.

  • rustyline 18 · clap 4 · sqlparser 0.61
  • thiserror 2 · env_logger 0.11
Phase 1SQL execution surfaceshipped

CREATE / INSERT / SELECT / UPDATE / DELETE with WHERE, ORDER BY, LIMIT.

  • Expressions: comparisons, AND/OR/NOT, arithmetic, ||
  • Auto-ROWID, UNIQUE enforcement, type checks
  • REPL with history, syntax highlighting, bracket matching
Phase 2On-disk persistenceshipped

Single-file .sqlrite format with a 4 KiB page layout and crash-safe header writes.

  • Typed payload pages chained via next-pointers
  • .open / .save / .tables meta commands
Phase 3On-disk B-Tree + auto-save pagershipped

Cell-based pages, interior + leaf nodes, overflow chains, secondary indexes.

  • Auto-save on every committing statement
  • sqlrite_master is itself a real cell-based table
  • Auto-indexes for PK + UNIQUE; CREATE [UNIQUE] INDEX
Phase 2.5Tauri 2.0 desktop appshipped

Cross-platform GUI in Svelte 5; engine extracted into a reusable library.

  • Engine became Send + Sync (Arc<Mutex<_>>)
  • Workspace: root + desktop/src-tauri
  • Three-pane dark UI with sticky-header result grid
Phase 4Durability and concurrencyshipped

WAL, checkpointer, exclusive/shared locks, and real BEGIN/COMMIT/ROLLBACK.

  • 4a–4e: file lock, WAL codec, WAL-aware pager, checkpointer, multi-reader/single-writer
  • 4f: BEGIN / COMMIT / ROLLBACK with snapshot isolation
  • Torn-write recovery via rolling-sum checksum frames
  • Auto-checkpoint past 100 frames, idempotent
Phase 5Embedding surface — six SDKsshipped

Public Connection / Statement / Rows API plus Rust, Python, Node, Go, C FFI, and WASM bindings.

  • 5a: Rust public API (param binding shipped in 9g)
  • 5b–5e: C FFI · Python (PyO3) · Node (napi-rs) · Go (database/sql)
  • 5g: WASM — ~1.8 MB / ~500 KB gzipped, browser-native
Phase 6Release engineering + CI/CDshipped

Lockstep versioning across eleven manifests; OIDC trusted publishing across PyPI, npm, crates.io.

  • 6a: bump-version.sh — one dispatch, eleven manifests
  • 6b: parallel CI across three OSes
  • 6c–6i: trusted publishers + desktop installers for 7 platform/format combos
Phase 7AI-era extensionsshipped

Vector / embedding column type, HNSW ANN index, JSON column type, ask() (natural-language → SQL), MCP server.

  • 7a–7d: VECTOR(N) + cosine/dot/L2 + HNSW + persistence
  • 7e: JSON column type + json_extract / json_type / json_array_length
  • 7g: ask() across REPL, desktop, Python, Node, Go, WASM, MCP
  • 7h: sqlrite-mcp — JSON-RPC over stdio, eight tools
Phase 8Full-text search + hybrid retrievalshipped

FTS5-style inverted index, BM25 scoring, hybrid (lexical + semantic) retrieval. Bumps file format v4 → v5.

  • 8a: tokenizer + BM25 + posting-list (pure algorithms)
  • 8b: fts_match() / bm25_score() + try_fts_probe optimizer hook
  • 8c: cell-encoded posting persistence + on-demand format bump
  • 8d–8e: hybrid retrieval worked example + bm25_search MCP tool
Phase 9SQL surface + DX follow-ups (v0.2.0 → v0.9.1)shipped

After v0.2.0 closed the file-format bump, the next nine sub-phases shipped the SQL surface that had been parked under "possible extras" — JOINs, aggregates, prepared statements, PRAGMA, and the storage hygiene around them.

  • 9a (v0.3.0): DEFAULT clause + DROP TABLE/INDEX + ALTER TABLE
  • 9b–9c (v0.4.0–v0.5.0): free-list, manual VACUUM, auto-VACUUM
  • 9d (v0.5.1): IS NULL / IS NOT NULL + Option<Value> INSERT pipeline
  • 9e (v0.6.0): GROUP BY + COUNT/SUM/AVG/MIN/MAX + DISTINCT + LIKE + IN
  • 9f (v0.7.0): JOINs — INNER, LEFT, RIGHT, FULL OUTER
  • 9g (v0.9.0): prepared statements + ? param binding + plan cache
  • 9h (v0.9.0): HNSW probe widened to cosine + dot via WITH (metric = …)
  • 9i (v0.9.1): PRAGMA dispatcher + auto_vacuum knob
Phase 10Benchmarks vs SQLite + DuckDBshipped

Twelve-workload bench harness (SQLR-4 / SQLR-16) with a pluggable Driver trait. Pinned-host runs published.

  • Read-by-PK · transactional CRUD · analytical slices · vector + FTS retrieval
  • Bundled SQLite + DuckDB drivers; criterion-based
  • Excluded from CI — `make bench` runs locally
Phase 11Concurrent writes via MVCC (v0.10.0)shipped

BEGIN CONCURRENT for multi-writer apps. Hekaton-style optimistic MVCC, row-level conflict detection at commit, snapshot-isolated reads, and a typed retry surface across every SDK.

  • 11.1–11.3: multi-connection foundation · logical clock · MvStore + PRAGMA journal_mode
  • 11.4–11.6: BEGIN CONCURRENT writes + commit-time validation · snapshot reads · per-commit GC
  • 11.7–11.8: SDK propagation of Busy/BusySnapshot · sibling Connection handles across C / Python / Node
  • 11.9: WAL log-record durability + crash recovery (format v3) — MVCC frames share the legacy save's fsync
  • 11.11a: REPL .spawn / .use / .conns for interactive multi-handle demos
  • 11.11b: new W13 bench workload pitting SQLRite-MVCC against SQLite
  • 11.11c: Go SDK cross-pool sibling shape via a process-level path registry
  • 11.12: canonical docs/concurrent-writes.md + worked example
05 · embedding

One engine. Six languages.

The same Rust core — wrapped, never reimplemented. SDKs ship as prebuilt binaries so there’s no toolchain to install just to use the database.

Rust

Native — no FFI hop. Imported as `use sqlrite::…`.

$cargo add sqlrite-engine
version0.14.0
registrycrates.io
licenseMIT
example.rs· copy-pasteable
use sqlrite::Connection;

fn main() -> sqlrite::Result<()> {
    let mut conn = Connection::open("app.sqlrite")?;
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users \
         (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    )?;
    conn.execute("INSERT INTO users (name) VALUES ('alice')")?;

    let rows = conn.query("SELECT id, name FROM users")?;
    for row in rows {
        let id: i64 = row.get(0)?;
        let name: String = row.get(1)?;
        println!("{id}: {name}");
    }
    Ok(())
}
06 · sql surface

What SQLRite speaks today.

The supported SQL is real — every feature lands with type checks, UNIQUE enforcement, and a clean error path instead of a panic. JOINs, aggregates, and prepared statements all came in the v0.2.0 → v0.9.1 wave.

StatementFeatures
CREATE TABLEPRIMARY KEYUNIQUENOT NULLDEFAULT <literal>INTEGERTEXTREALBOOLEANVECTOR(N)JSONauto-index
ALTER TABLERENAME TORENAME COLUMNADD COLUMN [+ DEFAULT backfill]DROP COLUMNIF EXISTS
DROP TABLE / DROP INDEXIF EXISTSauto-indexes refusedfreelist reclaimauto-VACUUM
CREATE [UNIQUE] INDEXIF NOT EXISTSB-tree (default)USING HNSW WITH (metric = …)USING FTSINTEGER + TEXT + VECTOR
INSERT INTOexplicit column listauto-ROWIDmulti-row VALUESUNIQUE enforcementJSON validationvector dim checkDEFAULT padding
SELECTprojectionWHEREORDER BY ASC|DESCLIMIT nDISTINCTGROUP BY= literal → index probevec_distance_* / k-NN (HNSW)fts_match / bm25_score
JOINsINNER JOINLEFT OUTERRIGHT OUTERFULL OUTERON <expr>aliasesself-joinsmulti-join chains
Predicates= <> < <= > >=IS NULL / IS NOT NULLLIKE / NOT LIKE / ILIKEIN (literal-list)AND / OR / NOTarithmetic + ||
AggregatesCOUNT(*)COUNT(DISTINCT col)SUMAVGMINMAXGROUP BY <col>HAVING
UPDATEmulti-column SETWHEREarithmetic in SETUNIQUE + type checksFTS / HNSW index maintenance
DELETEWHEREfull-tablefreelist reclaim
BEGIN / COMMIT / ROLLBACKsnapshot transactionsWAL-backed commitauto-rollback on disk error
VACUUMmanual compactionauto-VACUUM (25% freelist)tunable thresholdSQL via PRAGMA
PRAGMAauto_vacuum (read/write)extensible dispatchertyped errors on bad values
Prepared statements? placeholdersexecute_with_paramsquery_with_paramsper-conn LRU plan cacheValue::Vector binding
Functionsjson_extract / json_typejson_array_length / json_object_keysvec_distance_l2 / cosine / dotfts_matchbm25_score
not yet supported
subqueriesCTEs (WITH)HAVING without GROUP BYCASE WHENBETWEENGLOB / REGEXPOFFSETmulti-column ORDER BYUNION / INTERSECT / EXCEPTINSERT … SELECTCREATE VIEW / TRIGGERFOREIGN KEY / CHECKsavepointsnamed placeholders (:foo, $1)
07 · benchmarks

SQLRite vs SQLite benchmarks — honest numbers, published in public.

Twelve workloads pitch SQLRite, the embedded Rust database, against SQLite (WAL+NORMAL) and DuckDB on a pinned-host run. The point isn’t to win — SQLite has 25 years of optimization behind it — it’s to baseline future engine work, prove the differentiator workloads (HNSW vector search, BM25 full-text) deliver, and ground the roadmap with evidence.

~50×
HNSW vs brute-force k-NN
10k × 384-dim vectors, cosine top-10 — 120.88 ms → 2.40 ms.
1.6–1.9×
Read-path gap vs SQLite
Within ~2× of SQLite (WAL+NORMAL) on W1 read-by-PK and W6 index lookup after SQLR-23.
608 µs
Hybrid retrieval at 1k docs
0.5 × bm25_score + 0.5 × (1 − vec_distance_cosine).

Vector top-10 · the HNSW win

Cosine distance, 10k × 384-dim corpus. Same engine, same data — only the index probe changes. Lower is better.

W10 · cosine top-10 · 10k × 384-dim~50× faster — HNSW probe vs brute-force scan
Brute-force scan
120.88 ms
HNSW (M=16, ef_search=50)
2.40 ms

Read paths · OLTP baseline

After SQLR-23 (prepared statements + ? bindings) closed the parser tax, SQLRite tracks SQLite within ~2× on hot read paths. Lower is better.

W1 · read-by-PK · 10k probes1.9× — was 4.8× pre-SQLR-23
SQLite (WAL+NORMAL)
2.09 µs
SQLRite
3.92 µs
W6 · secondary index · 10k probes1.6× — was 4.2× pre-SQLR-23
SQLite (WAL+NORMAL)
2.56 µs
SQLRite
4.04 µs

Full-text · BM25 + hybrid retrieval

1k-doc tech-blurb corpus. SQLite FTS5 is the lexical comparator; the hybrid query (BM25 + cosine) is SQLRite-only. Lower is better.

W11 · BM25 top-1021× behind SQLite FTS5 — was 43× pre-SQLR-23
SQLite FTS5
23.65 µs
SQLRite
501.63 µs
W12 · Hybrid (BM25 + cosine fusion)RAG-shaped query, no SQL-engine comparator
SQLRite
607.90 µs

Analytical aggregates · DuckDB's home turf

Columnar engines win on big aggregations. We publish the gap honestly — SQLRite isn't competing on this axis, but the suite proves the differentiator workloads still deliver. Lower is better.

W7 · SUM(v) · 1M rows
DuckDB
478.78 µs
SQLite (WAL+NORMAL)
31.57 ms
SQLRite
103.62 ms
W8 · GROUP BY · cardinality 10
DuckDB
949.75 µs
SQLite (WAL+NORMAL)
366.52 ms
SQLRite
197.32 ms
engineering debts the bench surfaced

The suite ships with the gap measured + the workaround documented + the task linked. Each is “investigation, not a release gate.”

  • SQLR-18W4 single-row INSERT~579× vs SQLiteBottom-up B-tree rebuild on every COMMIT.
  • SQLR-19W8 GROUP BY · card-100kSkipped by default — ~245 s/iterVec-backed group store; should be HashMap.
  • SQLR-20W9 INNER JOIN · 10k×10k~14M× vs SQLiteNested-loop driver doesn't push ON predicate to the inner-side index.
  • SQLR-21W11 / W12 corpus capFTS doc-lengths sidecar capped at ~1,360 docsPhase 8.1 — overflow chaining for posting + sidecar cells.
Run: 2026-05-08 · Apple M1 Pro · macOS · aarch64 · commit ac84d560
full headline table →raw JSON envelope →
08 · desktop

A native client for browsing your data.

Cross-platform, dark by default, written in Svelte 5 + Tauri 2.0. The header has New… / Open… / Save As… buttons; the editor has live line numbers, ⌘/ comment toggle, and selection-aware Run.

SQLRite — app.sqlrite
1-- top earners
2SELECT name, age FROM users
3WHERE age > 25
4ORDER BY age DESC LIMIT 5;
✓ 3 rows · 1.2ms
nameage
alice31
cara29
dan27
macOS · .dmgWindows · .msiLinux · .AppImage / .deb / .rpm
09 · written in public

Read the blog.

SQLRite is a learning project as much as a database. Each phase is paired with a long-form post on the design choices behind it.

10 · faq

Frequently asked questions.

The short answers. The long ones live in the docs and the repository.

What is SQLRite?

SQLRite is an embedded SQL + vector database written from scratch in Rust. Like SQLite, it stores a whole database in a single file and runs inside your process — no server. On top of the classic engine (B-tree storage, WAL, transactions, JOINs, aggregates) it adds the parts AI workloads need: HNSW vector search, BM25 full-text search, JSON columns, and a built-in MCP server.

How is SQLRite different from SQLite?

SQLite is a battle-tested C library; SQLRite is an independent Rust engine with a SQLite-style design, built to be read and extended. SQLRite ships vector search (VECTOR columns + HNSW indexes), BM25 full-text search, MVCC concurrent writes via BEGIN CONCURRENT, natural-language-to-SQL, and an MCP server as first-class features instead of extensions. SQLite remains the right default for maximum maturity; SQLRite targets embedded AI / retrieval workloads and learning how databases work.

What languages can I use SQLRite from?

Six surfaces wrap the same engine: a native Rust crate (sqlrite-engine), Python (PyO3), Node.js (napi-rs), Go (database/sql driver), a C FFI with a generated sqlrite.h header, and a WebAssembly build for the browser. There's also a CLI REPL and a Tauri desktop GUI.

Can SQLRite run in the browser?

Yes — the engine compiles to a ~500 KB-gzipped WebAssembly module with web, bundler, and nodejs targets. The playground on this site runs the full engine in your tab, including HNSW vector search, with OPFS persistence.

What is the SQLRite MCP server?

sqlrite-mcp exposes any .sqlrite database to AI agents over the Model Context Protocol (stdio). It ships eight tools — list_tables, describe_table, query, execute, schema_dump, vector_search, bm25_search, and ask — and a --read-only mode that opens the file under a shared lock and hides execute.

Is SQLRite production-ready?

SQLRite is a young, well-tested engine built in public — honest about what it is. It has WAL crash safety, typed errors, snapshot-isolated transactions, and a benchmark suite against SQLite and DuckDB, but it hasn't had SQLite's decades of hardening. Use it for embedded AI prototypes, tools, and learning; reach for SQLite when you need maximum maturity.

join in

What I cannot create, I do not understand.

SQLRite is open source under MIT. Pull requests, issues, and database trivia all welcome.