CuttleDB
v0.7.0 · released 2026-05-28

An embedded realtime database with vector search, WAL durability, and event streaming.

One self-contained binary. Five-mode retrieval. ACID transactions. Real-time push. Zero external runtime dependencies.

Apache-2.0 · Python + JavaScript SDKs · Linux · macOS · Windows · sigstore-signed

01 Install

Three paths, depending on how you want to consume it.

Python: pip install cuttledb
JavaScript (ESM): npm install cuttledb
Server binary: github.com/.../releases/latest

Pre-built binaries for Linux x64, macOS arm64, and Windows x64 are attached to every release. All are sigstore-signed via the cosign keyless flow; the verification recipe is in SECURITY.md.

02 What it is

A line-based wire protocol over TCP or WebSocket. One dispatcher. Five retrieval verbs in addition to the usual SELECT / INSERT / UPDATE / DELETE.

KNN
k-nearest-neighbor vector search. Brute-force AVX2+FMA cosine on small tables, HNSW above ~2K rows. At 100K rows × 128 dimensions, HNSW reaches recall@10 = 1.0 and runs 12.7× faster than the brute-force baseline.
LSEARCH
BM25 lexical scoring over a STRING column with an inverted index.
SEARCH
Reciprocal Rank Fusion of KNN and LSEARCH. One server-side call returns the fused top-k; no application-side rank-merging code.
BSEARCH
Boolean DSL that composes predicates, KNN scoring atoms, BM25 scoring atoms, and AND/OR/parens. One roundtrip, one ranked answer.
KNN ... WHERE
Predicate-filtered KNN: filter first, vector-search second, all server-side.
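The verbs above run inside the server. To make the KNN ... WHERE semantics concrete (apply the predicate first, then rank the survivors by cosine similarity), here is a minimal brute-force sketch in plain Python. It assumes rows are dicts and the predicate is a Python callable; the names filtered_knn and cosine are hypothetical. It mirrors the described behavior only, not CuttleDB's engine, which uses AVX2+FMA kernels and switches to HNSW above ~2K rows.

```python
import heapq
import math
from typing import Any, Callable, Dict, List, Sequence

Row = Dict[str, Any]  # hypothetical row shape, e.g. {"id": 1, "vec": [...]}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filtered_knn(
    rows: List[Row],
    vec_col: str,
    query: Sequence[float],
    k: int,
    where: Callable[[Row], bool] = lambda row: True,
) -> List[Row]:
    """Apply the predicate first, then keep the k most similar survivors."""
    survivors = (row for row in rows if where(row))
    # nlargest keeps at most k rows in its heap, so memory stays O(k).
    return heapq.nlargest(k, survivors, key=lambda row: cosine(row[vec_col], query))


rows = [
    {"id": 1, "lang": "en", "vec": [1.0, 0.0]},
    {"id": 2, "lang": "fr", "vec": [0.9, 0.1]},
    {"id": 3, "lang": "en", "vec": [0.0, 1.0]},
    {"id": 4, "lang": "en", "vec": [0.7, 0.7]},
]
top = filtered_knn(rows, "vec", [1.0, 0.0], k=2, where=lambda r: r["lang"] == "en")
print([r["id"] for r in top])  # [1, 4]
```

Filtering first matters: ranking first and filtering second with k=2 would keep rows 1 and 2, then drop row 2 for failing the predicate, returning only one result instead of two.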

And in the same binary:

03 Quickstart

Start the server, then talk to it from Python:

$ cuttledb --port 7878
import cuttledb

db = cuttledb.connect(port=7878)
db.create_table("notes", "id INT, text STRING, vec VEC(128)")
db.insert("notes", [(1, "hello world", [0.1] * 128)])

rows = db.knn("notes", "vec", query=[0.1] * 128, k=5)
print(rows)

Same thing in JavaScript:

import { connect } from "cuttledb";

const db = await connect({ port: 7878 });
await db.createTable("notes", "id INT, text STRING, vec VEC(128)");
await db.insert("notes", [[1, "hello world", new Array(128).fill(0.1)]]);

const rows = await db.knn("notes", "vec", { query: new Array(128).fill(0.1), k: 5 });
console.log(rows);

04 Showcases

Three things you would otherwise compose from several systems.

01

Agent memory in one binary

A long-running agent needs to remember past observations, recall the semantically nearest ones, and be notified when new ones arrive. One table and three verbs.

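# embed_terse and embed_q are 384-dim vectors from your embedding model;
# update_local_cache is your own handler.
db.create_table("memory", "id INT, text STRING, embed VEC(384), ts INT")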

# Store something the agent learned.
db.insert("memory", [(42, "user prefers terse answers", embed_terse, 1716831234)])

# Recall the 5 nearest memories to a new query.
recent = db.knn("memory", "embed", query=embed_q, k=5)

# Subscribe to new memories from other workers.
for evt in db.subscribe("memory"):
    update_local_cache(evt)
02

Real-time UI without a separate pub/sub

A dashboard wants to render new rows as they land — without polling and without Redis sitting next to your database. SUB over WebSocket delivers row-level events from the same store you queried.

import { connect } from "cuttledb";

const db = await connect({ url: "ws://localhost:7878" });
const stream = await db.subscribe("orders");
for await (const evt of stream) {
    // evt = { table: "orders", op: "INSERT", row: { id, item, qty } }
    appendRow(evt.row);
}
03

Hybrid retrieval in one roundtrip

Vector recall alone misses exact-term matches. BM25 alone misses paraphrases. RRF fuses both, server-side, in one call.

rows = db.search(
    table="docs",
    vec_col="embed", vec_query=embed_q,
    text_col="body", text_query="reciprocal rank fusion",
    k=10,
)
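db.search performs the fusion server-side. For intuition, here is a minimal sketch of Reciprocal Rank Fusion over two ranked ID lists: each document scores the sum of 1 / (k + rank) over every list it appears in, with 1-based ranks. The constant k = 60 is the value commonly used in the RRF literature and is an assumption here; this page does not say which constant CuttleDB uses. The document IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf(ranked_lists: Sequence[Sequence[str]], k: int = 60, top_n: int = 10) -> List[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]


vector_hits = ["d3", "d1", "d7"]  # hypothetical KNN order
bm25_hits = ["d1", "d9", "d3"]    # hypothetical BM25 order
fused = rrf([vector_hits, bm25_hits], k=60, top_n=3)
print(fused)  # ['d1', 'd3', 'd9']
```

d1 sits in the top two of both lists, so it outranks d3, which leads only the vector list. Because RRF uses ranks rather than raw scores, cosine similarities and BM25 scores never need to be put on a common scale.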

The more expressive case — Boolean over filters and scoring atoms:

rows = db.bsearch(
    "docs",
    "(category = 'paper' AND year >= 2020) "
    "AND (KNN(embed, $1, 50) OR BM25(body, $2, 50))",
    bindings=[embed_q, "rank fusion"],
    k=10,
)
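LSEARCH and the BM25(body, $2, 50) atom above both rank text with BM25 over an inverted index. Below is a minimal sketch of Okapi BM25 on a tiny in-memory inverted index. The whitespace tokenizer, the Lucene-style IDF, and the parameters k1 = 1.2 and b = 0.75 are common defaults assumed for illustration; this page does not document CuttleDB's analyzer or parameters, and BM25Index is a hypothetical name.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase + whitespace split, no stemming or stop words.
    return text.lower().split()


class BM25Index:
    """Tiny inverted index scored with Okapi BM25 (k1 and b are assumed defaults)."""

    def __init__(self, docs: Dict[int, str], k1: float = 1.2, b: float = 0.75):
        self.k1, self.b = k1, b
        self.doc_len: Dict[int, int] = {}
        # Inverted index: term -> {doc_id: term frequency in that doc}.
        self.postings: Dict[str, Dict[int, int]] = defaultdict(dict)
        for doc_id, text in docs.items():
            tokens = tokenize(text)
            self.doc_len[doc_id] = len(tokens)
            for term, tf in Counter(tokens).items():
                self.postings[term][doc_id] = tf
        self.n_docs = len(docs)
        self.avgdl = sum(self.doc_len.values()) / max(self.n_docs, 1)

    def idf(self, term: str) -> float:
        n_t = len(self.postings.get(term, {}))
        # Lucene-style IDF; the +1 inside the log keeps it non-negative.
        return math.log((self.n_docs - n_t + 0.5) / (n_t + 0.5) + 1.0)

    def search(self, query: str, k: int = 10) -> List[Tuple[int, float]]:
        scores: Dict[int, float] = defaultdict(float)
        for term in set(tokenize(query)):
            idf = self.idf(term)
            # Only documents on this term's posting list are visited.
            for doc_id, tf in self.postings.get(term, {}).items():
                length_norm = 1 - self.b + self.b * self.doc_len[doc_id] / self.avgdl
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + self.k1 * length_norm)
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


index = BM25Index({
    1: "reciprocal rank fusion merges ranked lists",
    2: "vector search finds paraphrases",
    3: "fusion cuisine",
})
hits = index.search("rank fusion", k=10)
print([doc_id for doc_id, _ in hits])  # [1, 3]
```

Document 1 matches both query terms and beats document 3, which matches only "fusion"; document 2 shares no terms and is never scored. "ranked" does not match "rank" here because the sketch does no stemming, which is the kind of miss that vector recall covers in hybrid search.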

05 Benchmarks

From bench/RESULTS.md. CuttleDB is a network server; SQLite is in-process.

Honest caveat: we pay for TCP; SQLite doesn't. SQLite wins bulk INSERT by 8.4×. CuttleDB wins on read-path aggregates and on vector primitives that SQLite needs extensions for.

1K-row aggregates · CuttleDB over TCP vs SQLite :memory:

operation       winner     ratio
SUM             CuttleDB   1.8×
COUNT           CuttleDB   1.6×
MIN             CuttleDB   1.5×
SELECT WHERE    CuttleDB   1.4×
bulk INSERT     SQLite     8.4× (TCP overhead)

Vector KNN · 100K rows × 128 dim

method                  throughput        recall@10
AVX2+FMA brute force    baseline          1.0
HNSW index              12.7× baseline    1.0

Top-10 brute force over 10K vectors: 2 ms. See bench/HNSW_BENCH.md for the full HNSW sweep.
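recall@10 compares, per query, the approximate top-10 (HNSW) with the exact top-10 (brute force) and averages the overlap across queries; 1.0 means the index missed no true neighbor. A minimal sketch of that metric under its usual definition follows. bench/HNSW_BENCH.md remains the authority on how CuttleDB's runs were measured, and the IDs below are made up.

```python
from typing import Hashable, Sequence


def recall_at_k(approx: Sequence[Hashable], exact: Sequence[Hashable], k: int = 10) -> float:
    """Fraction of the exact top-k neighbors that also appear in the approximate top-k."""
    truth = set(exact[:k])
    if not truth:
        raise ValueError("exact result list is empty")
    return len(truth.intersection(approx[:k])) / len(truth)


def mean_recall_at_k(approx_runs, exact_runs, k: int = 10) -> float:
    """Average recall@k over a batch of queries (one result list per query)."""
    if len(approx_runs) != len(exact_runs) or not exact_runs:
        raise ValueError("need one pair of result lists per query")
    total = sum(recall_at_k(a, e, k) for a, e in zip(approx_runs, exact_runs))
    return total / len(exact_runs)


exact_top10 = list(range(10))      # brute-force ground truth for one query
perfect = list(range(10))          # approximate result identical to the truth
one_miss = list(range(9)) + [42]   # approximate result that misses neighbor 9
print(recall_at_k(perfect, exact_top10))   # 1.0
print(recall_at_k(one_miss, exact_top10))  # 0.9
print(mean_recall_at_k([perfect, one_miss], [exact_top10, exact_top10]))
```

Order inside the top-k does not affect this metric; only membership does.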

06 Verify a release

Sigstore: releases are signed with cosign's keyless flow. The .cosign.bundle file carries the signature, the signing certificate, and the Rekor transparency-log inclusion proof. Because signing is keyless, there is no long-lived key for an attacker to steal.

Verify with the cosign CLI:

cosign verify-blob \
  --bundle cuttledb-linux-x64.cosign.bundle \
  --certificate-identity-regexp '.*' \
  --certificate-oidc-issuer-regexp '.*' \
  cuttledb-linux-x64

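Expected output: Verified OK. The '.*' patterns accept any signer identity and OIDC issuer, so this check proves only that the signature is valid and recorded in Rekor, not who produced it. For production use, pin both as described in SECURITY.md, which also has the full recipe.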

07 FAQ

Is CuttleDB free and open source?

Free, yes; open source in part. The SDKs, docs, examples, benchmarks, and wire-protocol specification are Apache-2.0 and live in the public repository. The server binary is distributed for free use, including in production and commercial settings, but the engine source is not published in the public repository.

How does it compare to SQLite?

SQLite is in-process and uses SQL. CuttleDB is a small network server using a Redis-style line protocol. CuttleDB has first-class vector search (HNSW), BM25, RRF hybrid retrieval, a Boolean DSL, and real-time SUB/UNSUB built in; SQLite needs extensions for these. CuttleDB pays for TCP round-trips that SQLite avoids: SQLite wins bulk INSERT by 8.4× for that reason, while CuttleDB wins the read-path benchmarks (aggregates and SELECT WHERE) by 1.4–1.8×.

How does it compare to Redis?

Redis is in-memory, key-value, and famous for pub/sub. CuttleDB is durable by default (WAL), table-shaped (columns, types, predicates), and adds vector search and hybrid retrieval in the same binary. SUB/UNSUB gives you the change-stream pattern without a second service.

How does it compare to Pinecone or Qdrant?

For self-hosted vector workloads up to ~10M vectors per node, CuttleDB is sufficient: HNSW with a 12.7× speedup over brute force at 100K×128, AVX2+FMA cosine, and SUB/UNSUB push, all in a single binary under 1 MB. Dedicated vector services remain more specialized for multi-region distributed indexes and exotic tuning. CuttleDB is Apache-2.0; no vendor lock-in.

How does it compare to Elasticsearch?

Elasticsearch is a JVM service with rich lexical search and a heavy operational footprint. CuttleDB ships BM25 + RRF + vector + Boolean DSL in a sub-megabyte native binary with no JVM and no cluster. Use Elasticsearch when you need its mature ecosystem (logging pipelines, Kibana, etc.); use CuttleDB when you want hybrid retrieval inside your own app or agent.

What languages can connect to it?

Python (pip install cuttledb) and JavaScript/TypeScript (npm install cuttledb, ESM only) ship as official adapters. The line-based wire protocol is small enough that any language can implement a client in a few hundred lines. See PROTOCOL.md.
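To give a feel for what a minimal client involves, here is a sketch of the transport layer in Python. The framing is an assumption made for illustration (UTF-8 text, one newline-terminated command per request, one newline-terminated reply line), as are the PING command and the LineClient name; the real command syntax and reply format are defined in PROTOCOL.md. The usage part swaps the server for a socketpair so the sketch runs without CuttleDB installed.

```python
import socket
import threading


class LineClient:
    """Hypothetical client: one UTF-8 request line out, one reply line back.

    The framing is an illustrative assumption; PROTOCOL.md defines the real one.
    """

    def __init__(self, sock: socket.socket):
        self.sock = sock
        # Buffered text reader so readline() can split replies on "\n".
        self.reader = sock.makefile("r", encoding="utf-8", newline="\n")

    @classmethod
    def connect(cls, host: str = "127.0.0.1", port: int = 7878) -> "LineClient":
        return cls(socket.create_connection((host, port)))

    def send(self, command: str) -> str:
        if "\n" in command:
            raise ValueError("a command must fit on a single line")
        self.sock.sendall((command + "\n").encode("utf-8"))
        reply = self.reader.readline()
        if not reply:
            raise ConnectionError("server closed the connection")
        return reply.rstrip("\n")

    def close(self) -> None:
        self.reader.close()
        self.sock.close()


def fake_server(conn: socket.socket) -> None:
    # Stand-in for a real server: answers one request line with "OK <request>".
    with conn, conn.makefile("r", encoding="utf-8", newline="\n") as rfile:
        request = rfile.readline()
        conn.sendall(("OK " + request).encode("utf-8"))


server_end, client_end = socket.socketpair()
worker = threading.Thread(target=fake_server, args=(server_end,))
worker.start()
client = LineClient(client_end)
reply = client.send("PING")  # hypothetical command
client.close()
worker.join()
print(reply)  # OK PING
```

Against a running server, LineClient.connect(port=7878) would replace the socketpair, with the command and reply handling adjusted to PROTOCOL.md.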

How do I use it for AI agent memory?

Create a table with a STRING text column and a VEC embedding column. INSERT memories with their embeddings. Use KNN for semantic recall and SUB/UNSUB to be notified when other workers add memories. BSEARCH composes Boolean filters with scoring atoms when you need to constrain (e.g., "only memories from this session AND nearest to this query").

08 Links