SynaFS — Compare

How SynaFS compares to the alternatives

SynaFS is one point in a crowded design space — lexical search, external dense-RAG indexers, code-intelligence formats, and semantic filesystems. This page positions it two ways: a capability matrix against those approaches, and a measured, apples-to-apples retrieval comparison on the same corpus, queries, and scoring. See the Agents page for end-to-end token / tool-call results and Related research for sources.

Capability matrix

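Where each approach keeps its index, and what it can answer. The dividing line is freshness. grep is trivially fresh because it keeps no index at all. Among the approaches that offer dense or hybrid retrieval, SynaFS is the only one that updates its index inside the write itself, so the index is never stale and no separate pipeline is needed.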

Approach	Always-fresh (index-at-write)	Zero-integration (read()/FUSE)	Hybrid signals (vector + BM25)	Symbol graph	Version / as-of	Agent API (MCP)
grep / ripgrep (lexical tool-loop)	✓	✓	✗	✗	✗	✗
External dense-RAG indexer (Cursor / Continue / Cody-style)	◑	✗	✓	◑	✗	◑
SCIP / LSIF (code-intelligence format)	◑	✗	✗	✓	◑	✗
LSFS (2024), LLM semantic filesystem	◑	◑	◑	✗	✗	◑
Semantic File System (1991), Gifford et al.	✓	✓	✗	✗	✗	✗
SynaFS (this project)	✓	✓	✓	✓	✓	✓

✓ = native · ◑ = partial / bolt-on · ✗ = no. External RAG indexers can do hybrid retrieval, but their index lives outside the filesystem and re-syncs after the edit; SCIP is incremental yet still a separate build step; LSFS adds LLM file ops above the FS but re-embeds out of band.
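The freshness column is easiest to see in a toy model. The sketch below is an illustration only, not SynaFS's implementation: the class names are hypothetical, and a bag-of-words set stands in for embeddings. The out-of-band indexer answers from a stale index until `sync()` runs, while the index-at-write variant replaces the file's index entry before `write()` returns.

```python
class ExternalIndexer:
    """Toy out-of-band indexer: files change first, the index catches up on sync()."""

    def __init__(self):
        self.files = {}  # path -> current text
        self.index = {}  # path -> set of words, as of the last sync

    def write(self, path, text):
        self.files[path] = text  # the index is now stale for this path

    def sync(self):
        # Separate pipeline step: rebuild the index from the current files.
        self.index = {p: set(t.split()) for p, t in self.files.items()}

    def search(self, term):
        return sorted(p for p, words in self.index.items() if term in words)


class IndexAtWrite(ExternalIndexer):
    """Toy index-at-write: the index entry is replaced inside write() itself."""

    def write(self, path, text):
        self.files[path] = text
        self.index[path] = set(text.split())


ext = ExternalIndexer()
ext.write("a.py", "def parse_config")
print(ext.search("parse_config"))    # empty until sync() runs

fresh = IndexAtWrite()
fresh.write("a.py", "def parse_config")
print(fresh.search("parse_config"))  # visible immediately
```

A real index-at-write design also has to bound the latency the indexing adds to each write; the toy ignores that cost entirely.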

Figure: Capability coverage, showing how many of the six capabilities each approach covers, split into native (✓) and partial (◑).
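The coverage counts follow directly from the capability matrix. Here is a minimal sketch that tallies them, with the rows transcribed by hand from the table; the dictionary and function names are illustrative and not part of SynaFS.

```python
# Rows transcribed from the capability matrix, one symbol per column in this order:
# always-fresh, zero-integration, hybrid signals, symbol graph, version/as-of, agent API.
MATRIX = {
    "grep / ripgrep": "✓✓✗✗✗✗",
    "External dense-RAG indexer": "◑✗✓◑✗◑",
    "SCIP / LSIF": "◑✗✗✓◑✗",
    "LSFS (2024)": "◑◑◑✗✗◑",
    "Semantic File System (1991)": "✓✓✗✗✗✗",
    "SynaFS": "✓✓✓✓✓✓",
}


def coverage(cells):
    """Return (native, partial) counts for one row of the matrix."""
    return cells.count("✓"), cells.count("◑")


for approach, cells in MATRIX.items():
    native, partial = coverage(cells)
    print(f"{approach:30s} native={native} partial={partial}")
```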

Measured

Retrieval quality — same corpus, same queries, only the retriever changes

We pulled the 1,603 indexed chunks straight from SynaFS's index and ranked them for the 23 gold queries with three pure dense retrievers, scoring exactly as in the main benchmark. This isolates the one variable that matters here: the embedding model. Two jumps stand out. Any dense model leaps over keyword search at rank-1 (0.04 → ~0.39), because grep rarely puts the exact file first on paraphrased queries; a code-specialised embedder then beats the generic ones on every metric. SynaFS's hybrid fusion adds more on top, and it reaches the answer while reading the least code.

Figures (legend: Lexical · Generic dense · Code-specialised · SynaFS hybrid):
Rank-1 accuracy (recall@1): share of the 23 queries with the gold file ranked first; higher is better.
Top-5 accuracy (recall@5): gold file within the top 5 results; higher is better.
Mean reciprocal rank (MRR): 1/rank of the gold file, averaged over queries; higher is better.
Context read to reach the answer: median KB of whole files ingested up to and including the gold file; lower is better.
Files opened to reach the answer: median distinct files opened up to and including the gold file; lower is better.
Rank-1 by query difficulty: recall@1 split by easy / medium / hard (paraphrased) queries, for grep vs code-dense vs SynaFS.

Retriever	recall@1	recall@5	MRR	files→gold (median)
grep (lexical tool-loop)	0.04	0.52	0.28	5
syna-lex (BM25 / hash index)	0.04	0.17	0.08	5
Dense · MiniLM (all-MiniLM-L6-v2, 384d, generic)	0.39	0.61	0.48	3
Dense · BGE-base (bge-base-en-v1.5, 768d, generic)	0.39	0.57	0.49	2
Dense · CodeRankEmbed (768d, code-specialised)	0.43	0.70	0.56	1.5
SynaFS-sem (CodeRankEmbed + BM25 + trigram, RRF)	0.57	0.83	0.67	1

File-level relevance, 23 NL→code gold queries, 259-file corpus. Dense rows are pure cosine over identical chunks (MiniLM 384d, BGE-base 768d, CodeRankEmbed 768d). grep / syna-lex / SynaFS-sem are from the Performance benchmark; SynaFS-sem = CodeRankEmbed + BM25 + trigram fused with RRF. This is an embedder-isolation study, not a reproduction of any product's full pipeline.
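Each step's gain can be read straight off the measured table by subtracting adjacent rows. This is a small sketch with the values transcribed from the table. It uses grep as the lexical baseline and the better of the two generic models at each metric; both of these choices are ours, made for illustration.

```python
# Values transcribed from the measured table (already rounded to two decimals).
TABLE = {
    "grep": {"recall@1": 0.04, "recall@5": 0.52, "mrr": 0.28},
    "MiniLM": {"recall@1": 0.39, "recall@5": 0.61, "mrr": 0.48},
    "BGE-base": {"recall@1": 0.39, "recall@5": 0.57, "mrr": 0.49},
    "CodeRankEmbed": {"recall@1": 0.43, "recall@5": 0.70, "mrr": 0.56},
    "SynaFS-sem": {"recall@1": 0.57, "recall@5": 0.83, "mrr": 0.67},
}
GENERIC = ("MiniLM", "BGE-base")


def jumps(metric):
    """Gain at each step: lexical -> generic dense -> code-specialised -> hybrid fusion."""
    lexical = TABLE["grep"][metric]
    generic = max(TABLE[m][metric] for m in GENERIC)  # best generic model
    code = TABLE["CodeRankEmbed"][metric]
    fused = TABLE["SynaFS-sem"][metric]
    return {
        "lexical->dense": round(generic - lexical, 2),
        "generic->code": round(code - generic, 2),
        "code->fused": round(fused - code, 2),
    }


for metric in ("recall@1", "recall@5", "mrr"):
    print(metric, jumps(metric))
```

Because the inputs are already rounded, treat each difference as accurate only to about ±0.01.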

There are two jumps, not one. Lexical → dense buys rank-1 accuracy (0.04 → ~0.39). Generic → code-specialised lifts every metric: CodeRankEmbed beats the best generic embedder on R@5 (0.70 vs 0.61) and MRR (0.56 vs 0.49). SynaFS then fuses BM25 and trigram signals onto the code embedder with RRF, reaching R@1 0.57 and R@5 0.83. Because its ranking is tighter, it reaches the gold file after reading roughly a third as much code as pure dense retrieval (28 KB vs ~90 KB). The code-specialised model supplies the base ranking; SynaFS's job is to fuse lexical signals onto it and keep the index fresh at the write boundary.
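Reciprocal rank fusion combines ranked lists without calibrating their raw scores. In the textbook form, each file scores the sum, over rankers, of 1 / (k + rank). The sketch below uses the common default k = 60 from the original RRF paper. That constant is an assumption: this page does not state SynaFS's k or whether it weights the three signals differently.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank), rank from 1.

    k=60 is the conventional default, assumed here rather than taken from SynaFS.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by name so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense = ["b.py", "a.py", "c.py"]  # e.g. a CodeRankEmbed cosine ranking
bm25 = ["a.py", "c.py", "d.py"]   # e.g. a BM25 ranking
trigram = ["a.py", "b.py"]        # e.g. a trigram-match ranking
print(rrf_fuse([dense, bm25, trigram]))
```

Here a.py wins because it places near the top of all three lists, even though the dense ranker alone put b.py first. That behaviour is how fusion can tighten a ranking.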

How we ran it — and how to reproduce

The measured table is an embedder-isolation study: every retriever sees the exact same chunks, queries, and scoring, so the only thing that varies is the embedding model. Here is the procedure end to end, and the commands to run it yourself.

Same units. The 1,603 chunks are pulled straight from SynaFS's CodeRankEmbed index (.syna/index.coderank.json) — each carries its source text and file path — so no retriever gets a different chunking.
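In practice, the "same units" guarantee means every retriever reads one shared chunk list. Below is a minimal loader sketch, assuming a hypothetical JSON layout of {"chunks": [{"path": ..., "text": ...}]}. This page does not document the real schema of .syna/index.coderank.json, so check exp7_compare.py for the actual field names.

```python
import json


def load_chunks(raw_json):
    """Return (paths, texts) as parallel lists, one entry per chunk.

    HYPOTHETICAL schema: {"chunks": [{"path": str, "text": str, ...}, ...]}.
    """
    data = json.loads(raw_json)
    paths = [chunk["path"] for chunk in data["chunks"]]
    texts = [chunk["text"] for chunk in data["chunks"]]
    return paths, texts


doc = '{"chunks": [{"path": "a.py", "text": "def f(): pass"}, {"path": "b.py", "text": "class C: pass"}]}'
paths, texts = load_chunks(doc)
print(len(paths), "chunks across", len(set(paths)), "files")
```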

Embed. Every chunk and all 23 natural-language queries are encoded with each model and L2-normalised. Generic models get their recommended query instruction (e.g. BGE's “Represent this sentence…”), CodeRankEmbed its code-search prompt; sequence length is capped at 512 tokens.
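The embed step prefixes each query with its model's instruction, encodes it, and L2-normalises the result so that a plain dot product later equals cosine similarity. This stdlib-only sketch passes the encoder in as a function. The instruction strings are assumptions copied from the public BGE and CodeRankEmbed model cards, MiniLM is given none, and the 512-token cap is left to the encoder.

```python
import math

# ASSUMED query instructions (from public model cards); the harness may differ.
QUERY_PREFIX = {
    "all-MiniLM-L6-v2": "",
    "bge-base-en-v1.5": "Represent this sentence for searching relevant passages: ",
    "CodeRankEmbed": "Represent this query for searching relevant code: ",
}


def l2_normalise(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def embed_queries(queries, model_name, encode_fn):
    """Prefix queries with the model's instruction, encode, then normalise."""
    prefix = QUERY_PREFIX[model_name]
    return [l2_normalise(v) for v in encode_fn([prefix + q for q in queries])]


def embed_chunks(texts, encode_fn):
    """Chunks are encoded as-is (no instruction), then normalised."""
    return [l2_normalise(v) for v in encode_fn(texts)]
```

With sentence-transformers, `encode_fn` could be the `encode` method of a loaded `SentenceTransformer` (CodeRankEmbed also needs `trust_remote_code=True`), with the model's `max_seq_length` set to 512. `encode` can normalise by itself via `normalize_embeddings=True`.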

Rank. Files are ordered by cosine similarity (query · chunk); chunks are deduplicated into a distinct-file list, capped at 25 candidates.
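Ranking by chunk, then deduplicating by path, amounts to ranking each file by its best-matching chunk. This is a pure-Python sketch that assumes the vectors were already L2-normalised (so the dot product is the cosine). The function name and the `max_files` parameter are illustrative.

```python
def rank_files(query_vec, chunk_vecs, chunk_paths, max_files=25):
    """Order chunks by cosine similarity, then collapse to a distinct-file list."""
    # Dot product equals cosine similarity because all vectors are unit length.
    sims = [sum(q * c for q, c in zip(query_vec, vec)) for vec in chunk_vecs]
    order = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
    files, seen = [], set()
    for i in order:
        path = chunk_paths[i]
        if path not in seen:  # keep only each file's highest-scoring chunk
            seen.add(path)
            files.append(path)
            if len(files) == max_files:
                break
    return files


vecs = [[0.6, 0.8], [1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
paths = ["a.py", "b.py", "a.py", "c.py"]
print(rank_files([1.0, 0.0], vecs, paths))
```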

Score. Identical to the main retrieval benchmark — recall@1/5/10, MRR, files-to-gold, bytes-to-gold — against the same gold set, with the same allowance for a legitimately ambiguous second implementation.
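The scoring described above can be sketched as follows, under two assumptions this page does not state. First, the "ambiguous second implementation" allowance is modelled by letting each query accept a set of gold paths. Second, queries whose gold file never appears count as misses for recall and MRR but are left out of the files- and bytes-to-gold medians. `file_sizes` is a hypothetical mapping from each path to its whole-file size in bytes.

```python
from statistics import median


def first_gold_rank(files, gold):
    """1-based rank of the first file that is in the accepted gold set, else None."""
    for rank, path in enumerate(files, start=1):
        if path in gold:
            return rank
    return None


def score(results, golds, file_sizes, ks=(1, 5, 10)):
    """results[i]: ranked distinct files for query i; golds[i]: set of accepted gold paths."""
    ranks = [first_gold_rank(files, gold) for files, gold in zip(results, golds)]
    n = len(ranks)
    metrics = {}
    for k in ks:
        metrics[f"recall@{k}"] = sum(1 for r in ranks if r is not None and r <= k) / n
    # A miss contributes 0 to the mean reciprocal rank.
    metrics["mrr"] = sum(1.0 / r for r in ranks if r is not None) / n
    hits = [(files, r) for files, r in zip(results, ranks) if r is not None]
    # ASSUMPTION: misses are excluded from the two cost-to-answer medians.
    files_to_gold = [r for _, r in hits]
    bytes_to_gold = [sum(file_sizes[p] for p in files[:r]) for files, r in hits]
    metrics["files_to_gold"] = median(files_to_gold) if hits else None
    metrics["bytes_to_gold"] = median(bytes_to_gold) if hits else None
    return metrics
```

Files- and bytes-to-gold here count the gold file itself. Under that convention, a retriever that ranks the gold file first on most queries has a median of 1 file.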

Anchors. The grep, syna-lex and SynaFS-sem rows are read from the Exp 1 benchmark, so the dense baselines sit in the same frame.

Run it yourself

# 1 · isolated Python env — no system pip/venv required
curl -LsSf https://astral.sh/uv/install.sh | sh
# if `uv` is not found afterwards, open a new shell so the installer's PATH change takes effect
cd experiments
uv venv .venv
uv pip install --python .venv torch --index-url https://download.pytorch.org/whl/cpu
uv pip install --python .venv numpy sentence-transformers einops

# 2 · embedder-isolation benchmark — same chunks, queries, scoring
.venv/bin/python harness/exp7_compare.py

# 3 · results: dense baselines vs grep / syna-lex / SynaFS-sem
cat results/exp7_compare.json

The run is CPU-only: CodeRankEmbed embeds the ~1,600 chunks in roughly ten minutes without a GPU. The harness reads the corpus path from a constant at the top of exp7_compare.py, so you can point it at any repo you have indexed with SynaFS (scoring a different repo also needs gold queries written for that repo). Full code: experiments/harness/exp7_compare.py.

Honesty notes

The capability matrix is a good-faith summary of each approach's typical shape, not a feature audit of any specific product; configurations vary.

The measured table isolates the embedder (pure cosine on identical chunks). It is not a reproduction of Cursor / Cody / SCIP pipelines, which add their own chunking, reranking, and caching.

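Single corpus, 23 queries, file-level single-answer gold. With 23 queries, each query moves recall by about 0.04; for example, the rank-1 gap between the generic models (0.39) and CodeRankEmbed (0.43) is a single query. Treat the results as directional. They are fully reproducible from experiments/ (harness/exp7_compare.py).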
