
Landscape

How SynaFS compares to the alternatives

SynaFS is one point in a crowded design space — lexical search, external dense-RAG indexers, code-intelligence formats, and semantic filesystems. This page positions it two ways: a capability matrix against those approaches, and a measured, apples-to-apples retrieval comparison on the same corpus, queries, and scoring. See the Agents page for end-to-end token / tool-call results and Related research for sources.

Positioning

Capability matrix

Where each approach keeps its index, and what it can answer. The dividing line is freshness: SynaFS is the only approach here that updates its index inside the write itself, so the index never lags behind the files and no separate indexing pipeline is needed.

Capabilities compared: Always-fresh (index-at-write) · Zero-integration (read()/FUSE) · Hybrid signals (vector + BM25) · Symbol graph · Version / as-of · Agent API (MCP)

Approaches compared:
grep / ripgrep · lexical tool-loop
External dense-RAG indexer · Cursor / Continue / Cody-style
SCIP / LSIF · code-intelligence format
LSFS (2024) · LLM semantic filesystem
Semantic File System (1991) · Gifford et al.
SynaFS · this project

✓ = native · ◑ = partial / bolt-on · ✗ = no. External RAG indexers can do hybrid retrieval, but their index lives outside the filesystem and re-syncs after the edit; SCIP is incremental yet still a separate build step; LSFS adds LLM file ops above the FS but re-embeds out of band.
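
To make the freshness distinction concrete, here is a toy Python sketch, not SynaFS code. IndexedStore updates a plain token index inside write(), while LazyIndexedStore only indexes when a separate sync() runs, the way an external indexer re-syncs after an edit. Both class names and the token-set index are hypothetical stand-ins for SynaFS's real vector, BM25 and trigram signals.

```python
# Toy contrast between index-at-write and out-of-band indexing.
# Hypothetical classes, not SynaFS APIs; the token-set index stands in for
# SynaFS's real vector / BM25 / trigram signals.
import re
from collections import defaultdict


def tokens(text: str) -> set:
    """Lower-cased identifier-like tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


class IndexedStore:
    """Updates the index inside write(), so search() always matches the files."""

    def __init__(self) -> None:
        self.files: dict = {}
        self.index = defaultdict(set)  # token -> set of paths

    def write(self, path: str, text: str) -> None:
        for tok in tokens(self.files.get(path, "")):  # drop the old version's postings
            self.index[tok].discard(path)
        self.files[path] = text
        for tok in tokens(text):  # index the new version before returning
            self.index[tok].add(path)

    def search(self, term: str) -> set:
        return set(self.index.get(term.lower(), set()))


class LazyIndexedStore(IndexedStore):
    """Writes only touch the files; the index catches up when sync() runs."""

    def write(self, path: str, text: str) -> None:
        self.files[path] = text  # index is stale from here until sync()

    def sync(self) -> None:
        self.index = defaultdict(set)
        for path, text in self.files.items():
            for tok in tokens(text):
                self.index[tok].add(path)


if __name__ == "__main__":
    fresh, lazy = IndexedStore(), LazyIndexedStore()
    for store in (fresh, lazy):
        store.write("auth.py", "def login(user): ...")
        store.write("auth.py", "def authenticate(user): ...")
    print(fresh.search("authenticate"))  # {'auth.py'}
    print(lazy.search("authenticate"))   # set() until lazy.sync() runs
```

The lazy store answers from whatever was last synced, so between a write and the next sync() its results can be wrong; the eager store pays the indexing cost inside every write instead.
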

Capability coverage

how many of the six capabilities each approach covers · native vs partial

Approach | Native | Partial
grep | 2 | 0
Ext. RAG | 1 | 3
SCIP/LSIF | 1 | 2
LSFS | 0 | 4
SFS ’91 | 2 | 0
SynaFS | 6 | 0

Measured

Retrieval quality — same corpus, same queries, only the retriever changes

We pulled the 1,603 indexed chunks straight from SynaFS's index and re-ranked the 23 gold queries with three pure dense retrievers, scored exactly as in the main retrieval benchmark. That isolates a single variable: the embedding model. Two jumps stand out. Every dense model tested leaps over keyword search at rank 1 (0.04 → ~0.39), because grep rarely puts the exact file first; and a code-specialised embedder then beats the generic ones on recall@1, recall@5, MRR and files-to-gold (though not on bytes read, where BGE needs 82 KB to CodeRankEmbed's 91 KB). SynaFS's hybrid fusion improves on all of these and reaches the answer after reading the least code.

Retriever families: lexical (grep, syna-lex) · generic dense (MiniLM, BGE) · code-specialised (CodeRank) · SynaFS hybrid

Rank-1 accuracy (recall@1)

share of 23 queries with the gold file ranked first · higher is better

grep 0.04 · syna-lex 0.04 · MiniLM 0.39 · BGE 0.39 · CodeRank 0.43 · SynaFS 0.57

Top-5 accuracy (recall@5)

gold file within the top 5 results · higher is better

grep 0.52 · syna-lex 0.17 · MiniLM 0.61 · BGE 0.57 · CodeRank 0.70 · SynaFS 0.83

Mean reciprocal rank (MRR)

1/rank of the gold file, averaged · higher is better

grep 0.28 · syna-lex 0.08 · MiniLM 0.48 · BGE 0.49 · CodeRank 0.56 · SynaFS 0.67

Context read to reach the answer

median KB of whole files ingested up to and including the gold file · lower is better

grep 187 KB · syna-lex 82 KB · MiniLM 93 KB · BGE 82 KB · CodeRank 91 KB · SynaFS 28 KB

Files opened to reach the answer

median distinct files opened up to and including the gold file · lower is better

grep 5 · syna-lex 5 · MiniLM 3 · BGE 2 · CodeRank 1.5 · SynaFS 1
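
The metrics charted above can all be computed from each query's ranked file list. This is a minimal sketch with hypothetical helper names, not the benchmark harness, under three stated assumptions: ranks are 1-based; files- and bytes-to-gold count every file up to and including the first gold hit; and a query whose gold file is never retrieved scores zero for recall and reciprocal rank and is left out of the two medians. A gold set of acceptable paths stands in for the benchmark's allowance for a second legitimate implementation.

```python
# Per-query file-level scoring sketch (hypothetical helpers, not the harness).
# Assumptions: 1-based ranks; files/bytes-to-gold include the gold file itself;
# a miss scores 0 for recall and RR and is excluded from the two medians.
from statistics import median
from typing import Optional


def first_gold_rank(ranked: list, gold: set) -> Optional[int]:
    """1-based rank of the first acceptable gold file, or None if not retrieved."""
    for rank, path in enumerate(ranked, start=1):
        if path in gold:  # a gold set allows a second legitimate implementation
            return rank
    return None


def score_queries(runs, sizes, ks=(1, 5, 10)):
    """runs: list of (ranked_files, gold_set); sizes: path -> file size in bytes."""
    ranks = [first_gold_rank(ranked, gold) for ranked, gold in runs]
    n = len(runs)
    out = {f"recall@{k}": sum(r is not None and r <= k for r in ranks) / n for k in ks}
    out["MRR"] = sum(1.0 / r for r in ranks if r is not None) / n
    hits = [(ranked, r) for (ranked, _), r in zip(runs, ranks) if r is not None]
    out["files_to_gold"] = median(r for _, r in hits) if hits else None
    out["bytes_to_gold"] = (
        median(sum(sizes[p] for p in ranked[:r]) for ranked, r in hits) if hits else None
    )
    return out


if __name__ == "__main__":
    runs = [
        (["a.py", "b.py", "c.py"], {"a.py"}),          # gold at rank 1
        (["b.py", "c.py", "a.py"], {"c.py"}),          # gold at rank 2
        (["b.py", "c.py"], {"d.py"}),                  # miss
        (["c.py", "a.py", "b.py"], {"a.py", "b.py"}),  # two acceptable files: rank 2
    ]
    sizes = {"a.py": 1000, "b.py": 3000, "c.py": 2000}
    print(score_queries(runs, sizes))
    # recall@1 0.25 · recall@5 0.75 · MRR 0.5 · files_to_gold 2 · bytes_to_gold 3000
```
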

Rank-1 by query difficulty

recall@1 split by easy / medium / hard (paraphrased) queries · grep vs code-dense vs SynaFS

easy: grep 0.00 · CodeRank 0.60 · SynaFS 0.60
medium: grep 0.00 · CodeRank 0.30 · SynaFS 0.50
hard: grep 0.12 · CodeRank 0.50 · SynaFS 0.62

Retriever | Model / method | recall@1 | recall@5 | MRR | files→gold
grep | lexical tool-loop | 0.04 | 0.52 | 0.28 | 5
syna-lex | BM25 / hash index | 0.04 | 0.17 | 0.08 | 5
Dense · MiniLM | all-MiniLM-L6-v2 · 384d · generic | 0.39 | 0.61 | 0.48 | 3
Dense · BGE-base | bge-base-en-v1.5 · 768d · generic | 0.39 | 0.57 | 0.49 | 2
Dense · CodeRankEmbed | 768d · code-specialised | 0.43 | 0.70 | 0.56 | 1.5
SynaFS-sem | CodeRankEmbed + BM25 + trigram · RRF | 0.57 | 0.83 | 0.67 | 1

File-level relevance, 23 NL→code gold queries, 259-file corpus. Dense rows are pure cosine over identical chunks (MiniLM 384d, BGE-base 768d, CodeRankEmbed 768d). grep / syna-lex / SynaFS-sem are from the Performance benchmark; SynaFS-sem = CodeRankEmbed + BM25 + trigram fused with RRF. This is an embedder-isolation study, not a reproduction of any product's full pipeline.

There are two jumps, not one. Lexical → dense buys rank-1 accuracy (0.04 → ~0.39); generic → code-specialised then tightens the ranking further, with CodeRankEmbed ahead of both generic embedders on recall@1 (0.43), recall@5 (0.70) and MRR (0.56). SynaFS fuses BM25 and trigram signals onto the code embedder with RRF for R@1 0.57 / R@5 0.83 and, because the ranking is tighter, reaches the gold file after roughly 3× less code than pure dense (a median of 28 KB vs 82–93 KB). Most of the retrieval quality comes from dense, code-specialised embedding; SynaFS's job is to fuse it with lexical signals and keep it always-fresh at the write boundary.
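
Reciprocal Rank Fusion combines ranked lists without calibrating their raw scores: each file earns 1/(k + rank) from every list it appears in, and the sums decide the fused order. Below is a minimal Python sketch, assuming one ranked file list per signal (dense, BM25, trigram) and the conventional k = 60 from the original RRF paper; the k and any per-signal weighting SynaFS actually uses are not stated on this page, and the file names are made up.

```python
# Reciprocal Rank Fusion sketch. k=60 is the conventional default from the
# original RRF paper, assumed here; SynaFS's own k/weights are not stated.
from collections import defaultdict


def rrf(rankings: list, k: int = 60) -> list:
    """Fuse ranked lists with score(f) = sum over lists of 1 / (k + rank_f)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, path in enumerate(ranking, start=1):  # 1-based ranks
            scores[path] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))


if __name__ == "__main__":
    dense = ["auth.py", "session.py", "db.py"]
    bm25 = ["session.py", "auth.py", "utils.py"]
    trigram = ["auth.py", "utils.py", "session.py"]
    print(rrf([dense, bm25, trigram]))  # ['auth.py', 'session.py', 'utils.py', 'db.py']
```

A file that sits near the top of several lists outranks one that appears in only a single list, which is one way fusion can lift rank-1 accuracy above any individual signal.
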
Reproduce

How we ran it — and how to reproduce

The measured table is an embedder-isolation study: every dense retriever sees exactly the same chunks, queries, and scoring, so the only thing that varies is the embedding model. Here is the procedure end to end, and the commands to run it yourself.

  1. Same units. The 1,603 chunks are pulled straight from SynaFS's CodeRankEmbed index (.syna/index.coderank.json) — each carries its source text and file path — so no retriever gets a different chunking.
  2. Embed. Every chunk and all 23 natural-language queries are encoded with each model and L2-normalised. Generic models get their recommended query instruction (e.g. BGE's “Represent this sentence…”), CodeRankEmbed its code-search prompt; sequence length is capped at 512 tokens.
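  3. Rank. Chunks are ordered by cosine similarity to the query (a plain dot product, since both sides are L2-normalised), then collapsed into a distinct-file list in rank order, so each file takes the position of its best chunk; the list is capped at 25 candidates.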
  4. Score. Identical to the main retrieval benchmark — recall@1/5/10, MRR, files-to-gold, bytes-to-gold — against the same gold set, with the same allowance for a legitimately ambiguous second implementation.
  5. Anchors. The grep, syna-lex and SynaFS-sem rows are read from the Exp 1 benchmark, so the dense baselines sit in the same frame.
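
Steps 2 and 3 reduce to a few lines of NumPy. This is a sketch under assumptions, not the exp7_compare.py code: embeddings are taken as already computed (with sentence-transformers, model.encode(texts, normalize_embeddings=True) returns L2-normalised rows directly), and rank_files is a hypothetical helper that keeps each file at the position of its best-scoring chunk.

```python
# Sketch of steps 2-3 (not the exp7_compare.py code): cosine ranking over
# chunks, collapsed to a distinct-file list. rank_files is a hypothetical name.
import numpy as np


def l2_normalise(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)  # guard against zero vectors


def rank_files(query_vec, chunk_vecs, chunk_paths, max_files=25):
    """Return distinct file paths, each placed at the rank of its best chunk."""
    q = l2_normalise(np.asarray(query_vec, dtype=np.float32))
    c = l2_normalise(np.asarray(chunk_vecs, dtype=np.float32))
    sims = c @ q                              # cosine(query, chunk) per chunk
    order = np.argsort(-sims, kind="stable")  # best chunk first
    files, seen = [], set()
    for i in order:
        path = chunk_paths[i]
        if path not in seen:                  # later chunks of a file are skipped
            seen.add(path)
            files.append(path)
            if len(files) == max_files:       # candidate cap (25 in the study)
                break
    return files


if __name__ == "__main__":
    chunks = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
    paths = ["auth.py", "auth.py", "db.py", "session.py"]
    print(rank_files([1.0, 0.0], chunks, paths))  # ['auth.py', 'session.py', 'db.py']
```

Ranking chunks first and deduplicating afterwards means a long file with one highly relevant chunk is not penalised for its other, unrelated chunks.
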

Run it yourself

# 1 · isolated Python env — no system pip/venv required
curl -LsSf https://astral.sh/uv/install.sh | sh
cd experiments
uv venv .venv
uv pip install --python .venv torch --index-url https://download.pytorch.org/whl/cpu
uv pip install --python .venv numpy sentence-transformers einops

# 2 · embedder-isolation benchmark — same chunks, queries, scoring
.venv/bin/python harness/exp7_compare.py

# 3 · results: dense baselines vs grep / syna-lex / SynaFS-sem
cat results/exp7_compare.json

Pure-CPU run: with no GPU, CodeRankEmbed embeds the ~1,600 chunks in roughly ten minutes. The harness reads the corpus path from a constant at the top of exp7_compare.py; point it at any repo you've indexed with SynaFS. Full code: experiments/harness/exp7_compare.py.

Does it change agent behaviour?

Better retrieval only matters if an agent converts it into fewer tokens and tool-calls. It does — for the right model and task shape (and it can backfire for others). The full live A/B across Haiku, Sonnet, Opus 4.8, and Codex is on the Agents page.

Honesty notes