Under the Hood

Covalence turns text into vectors, stores them in SQLite, and combines vector similarity with keyword search to surface the memory your AI needs. Every step runs on your Mac. This page explains what’s in the pipeline and why we picked the numbers we did.

Covalence embeds every memory with nomic-embed-text-v1.5, an open-weights text encoder from Nomic. The model ships as fp16 safetensors, bundled inside the app — roughly 261 MB on disk, no download step, no account.

It runs through jkrukowski/swift-embeddings on MLTensor, Apple’s unified compute surface. Requires macOS 15 (Sequoia) or later. Scheduling across ANE, GPU, and CPU is handled by the OS — the code does not pin to the Neural Engine.

The raw model output is a 768-dimension vector. We don’t keep all of it. (Section 3 explains why.)

Memories shorter than 512 tokens embed as a single chunk. Longer memories split into overlapping chunks with 256 tokens of overlap; search returns the parent note, ranked by its best-matching chunk.
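The chunking rule above can be sketched in a few lines. This is an illustration in Python, not Covalence's own Swift code: the function names are hypothetical, and the 512-token window for long memories is an assumption (the text gives only the 512-token threshold and the 256-token overlap).

```python
def split_into_chunks(tokens, chunk_size=512, overlap=256):
    """Split a token list into overlapping windows.

    chunk_size=512 is an assumption; only the threshold and the
    256-token overlap are stated in the text.
    """
    if len(tokens) < chunk_size:
        return [tokens]  # short memory: a single chunk
    stride = chunk_size - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window reached the end of the memory
        start += stride
    return chunks


def rank_parents(chunk_hits):
    """Rank parent notes by their best-matching chunk.

    chunk_hits is an iterable of (parent_id, similarity) pairs.
    """
    best = {}
    for parent_id, score in chunk_hits:
        if score > best.get(parent_id, float("-inf")):
            best[parent_id] = score
    return sorted(best, key=best.get, reverse=True)
```

With these defaults, consecutive chunks share their last and first 256 tokens, and a note with one strong chunk outranks a note with several mediocre ones.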

Nomic trained this model to encode queries and documents asymmetrically. A query embedding and a document embedding of the same text are not the same vector — which is the point. Cosine similarity then scores the retrieval relationship, not surface resemblance.

The mechanism is a pair of task-specific text prefixes, injected before encoding:

  • search_document: — prepended when a memory is stored.
  • search_query: — prepended when a search runs.

Same model, different prompt. Queries and documents land in different regions of vector space, and similarity scores reflect “does this document answer this question” rather than “do these two strings look alike”.
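As a minimal sketch of the prefixing step (Python for brevity; the helper names are hypothetical and the encoder call itself is left out):

```python
DOCUMENT_PREFIX = "search_document: "
QUERY_PREFIX = "search_query: "


def prepare_document(text: str) -> str:
    """Text handed to the encoder when a memory is stored."""
    return DOCUMENT_PREFIX + text


def prepare_query(text: str) -> str:
    """Text handed to the encoder when a search runs."""
    return QUERY_PREFIX + text
```

The same string produces two different encoder inputs, which is what puts queries and documents in different regions of the vector space.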

We could have trained a bi-encoder ourselves. Nomic already shipped one.

The model was trained with Matryoshka Representation Learning (2022): the first N dimensions of its output are themselves a valid embedding, just at lower fidelity. That means the 768-dim vector can be truncated without retraining and still retrieve usefully.

Covalence keeps the first 256 floats, then re-normalises with vDSP (vDSP_svesq + vDSP_vsdiv). One truncation step. Not a schedule.

The trade-off, concretely: about 2% quality loss against the full 768-dim vector, three times smaller on disk — 1 KB per memory instead of 3. For a memory store that grows with every conversation, the factor of three matters more than the two percent.
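The truncate-then-normalise step and the size arithmetic look roughly like this. It is a plain-Python sketch rather than the vDSP calls the app uses; the function names are hypothetical, and 4 bytes per float (float32 storage) is an assumption consistent with the 1 KB figure.

```python
import math


def truncate_and_normalise(vector, dims=256):
    """Keep the first `dims` values and rescale them to unit length."""
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))  # sum of squares, then sqrt
    if norm == 0.0:
        return list(head)  # avoid dividing by zero on an all-zero vector
    return [x / norm for x in head]


def storage_bytes(dims, bytes_per_float=4):
    """Raw vector size on disk, assuming float32 storage."""
    return dims * bytes_per_float
```

Under that assumption, `storage_bytes(256)` is 1024 bytes and `storage_bytes(768)` is 3072 bytes, which is the factor of three.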

No schedule. No rebuild step. Just the first 256 floats and a normalisation.

[Figure: Covalence retrieval pipeline. A query flows through the embedding encoder and Matryoshka truncation (768 → 256 dimensions), then fans out to vector search (vec0 KNN) and FTS5 keyword search (BM25) in parallel; their ranked lists merge via reciprocal rank fusion (k = 60), recency weighting (10%) is applied, and the final ranked results are returned.]

Two independent queries run on every search, and their ranked lists are merged.

  • Vector search is a vec0 K-nearest-neighbours query over the memories_vec virtual table from sqlite-vec. It returns the closest 256-dim vectors by cosine similarity.
  • Keyword search is an FTS5 BM25 query over memories_fts. Same database, different index.

Each branch fetches max(limit × 4, 20) candidates. When any filter (tags, source, core) is active, the multiplier rises from 4 to 6, because filters run after the fetch in Swift, not in SQL, and the branch has to over-fetch to leave enough survivors. Vector candidates also pass a pre-merge cosine threshold (default 0.3, configurable) so noise never reaches the merge step.
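The over-fetch rule and the threshold can be sketched as follows. The function names are hypothetical. Two details are assumptions the text does not settle: that the floor of 20 still applies when a filter is active, and that a similarity exactly at the threshold is kept.

```python
def candidate_count(limit, has_filter=False):
    """How many candidates each branch fetches before filtering."""
    multiplier = 6 if has_filter else 4
    # Assumption: the floor of 20 applies with or without filters.
    return max(limit * multiplier, 20)


def apply_cosine_threshold(hits, threshold=0.3):
    """Drop vector hits below the threshold.

    hits is a list of (doc_id, cosine_similarity) pairs. Keeping hits
    exactly at the threshold (>=) is an assumption.
    """
    return [(doc_id, sim) for doc_id, sim in hits if sim >= threshold]
```

For a request with limit 10, each branch fetches 40 candidates, or 60 when a filter is active; a limit of 3 still fetches 20.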

The merge itself is Reciprocal Rank Fusion — Cormack, Clarke, and Büttcher, 2009. Each document’s fused score is a simple sum across branches:

rrfScore(d) = Σ over branches 1 / (k + rank(d, branch))

Covalence uses k = 60, the widely cited default from the original paper. The same constant ships in Microsoft Azure AI Search and Elastic’s Reciprocal Rank Fusion retriever. It is not our choice; it is the standard.
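A minimal sketch of the fusion formula, assuming 1-based ranks as in the original paper. The function name and the input shape (one list of document ids per branch, best first) are illustrative, not Covalence's API.

```python
def rrf_merge(ranked_lists, k=60):
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion.

    A document missing from a branch contributes nothing for that branch.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vector_hits = ["a", "b", "c"]
keyword_hits = ["b", "d"]
merged = rrf_merge([vector_hits, keyword_hits])
```

Here "b" comes out on top because it appears in both lists (1/62 + 1/61), ahead of "a", which ranks first in only one (1/61).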

After RRF, each candidate earns a recency adjustment based on the age of the memory.

The decay function is hyperbolic, not exponential:

decayFactor = 1.0 / (1.0 + ageHours / 8760.0)

8760 is hours per year (24 × 365). A memory exactly one year old gets a decay factor of 0.5, half weight in the recency component. Two years old: 1/3. A day old: about 0.997. The curve falls fastest in the first year, flattens after that, and never quite hits zero.
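The decay formula, written out so the quoted values can be checked (a sketch; the names are illustrative):

```python
HOURS_PER_YEAR = 8760.0  # 24 × 365


def decay_factor(age_hours):
    """Hyperbolic recency decay: 1.0 for a new memory, 0.5 at one year."""
    return 1.0 / (1.0 + age_hours / HOURS_PER_YEAR)


one_day = decay_factor(24)        # about 0.997
one_year = decay_factor(8760)     # 0.5
two_years = decay_factor(17520)   # 1/3
```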

Composition into the final ranking score:

finalScore = 0.9 × rrfScore + 0.1 × decayFactor × 0.033

The × 0.033 scales the decay factor into the same numeric range as the best possible RRF score: a memory ranked first in both branches scores 2/61 ≈ 0.033 when k = 60. The 0.9 / 0.1 split gives recency 10% of the weight in the final score.

Recency is weighted at 10% so that a very recent memory ranks above an equally relevant old one, without letting recency override genuine relevance. A note written last week should win the tie against a note from two years ago. It should not beat a note that actually answers the question.
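Plugging numbers into the composition shows both behaviours. This is a sketch with hypothetical names; the example scores assume k = 60 and two branches.

```python
def final_score(rrf_score, age_hours):
    """Combine the fused relevance score with the scaled recency term."""
    decay = 1.0 / (1.0 + age_hours / 8760.0)
    return 0.9 * rrf_score + 0.1 * decay * 0.033


top_of_both = 2 / 61   # ranked first in both branches
top_of_one = 1 / 61    # ranked first in only one branch

# Equal relevance: last week's note beats the two-year-old one.
recent_tie = final_score(top_of_both, 7 * 24)
old_tie = final_score(top_of_both, 2 * 8760)

# Clear relevance gap: the old but better-matching note still wins.
old_strong = final_score(top_of_both, 2 * 8760)
fresh_weak = final_score(top_of_one, 1)
```

The recency term tops out at 0.0033, so it settles ties and near-ties but cannot close a gap as large as the one between `top_of_both` and `top_of_one`.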

When your AI calls memory_search("project context"), Covalence prepends the search_query: prefix, embeds the result, truncates to 256 floats, runs vec0 KNN and FTS5 BM25 in parallel, merges them with RRF at k = 60, and applies the hyperbolic recency boost. The top results come back, usually in well under a second, all on-device, with no network round-trips.

The full tool surface lives in MCP Tools. If you want to start from zero, the install path is Getting Started.