SynaFS — Why a filesystem

The stale-index treadmill

Today a coding agent re-runs the same loop on every turn: ls → grep → read → embed → rerank. The expensive half of that — embedding and reranking — produces an index that lives outside the filesystem, in a separate vector store the agent maintains itself.

Because that index is external, it is always one edit behind. The instant a file is written, every chunk, embedding, and symbol derived from it is stale, and re-syncing is always a bolt-on: a watcher, a cron job, a "reindex" button. The agent either pays to re-embed on every query or quietly reasons over yesterday's code.

External indexers — Cursor, Continue, Cody, Aider — all share this shape. They keep a separate copy of the repo's meaning that lags the working tree, and they each ship their own bespoke sync. The freshness problem isn't a bug in any one of them; it's structural, and it follows from putting the index beside the files instead of under the writes.

The thesis: push search down to the write boundary

SynaFS moves search and understanding down to the filesystem's write boundary. Writing a file is the transaction that updates its semantic chunks, embeddings, AST symbols, and lexical index — atomically, in the same step that persists the bytes. There is no separate "indexing" phase to fall behind.
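As a rough illustration of "the write is the transaction", the sketch below computes every derived view first and only then publishes bytes and index together in one step. The class, the two toy indexers, and the in-memory state are all assumptions made for illustration; the source does not describe how SynaFS implements atomicity.

```python
# Minimal sketch: a write either publishes bytes AND all derived data, or nothing.
# AtomicStore, lexical_terms and symbol_names are hypothetical names.

def lexical_terms(text: str) -> list[str]:
    """Toy lexical index: the sorted set of whitespace-separated tokens."""
    return sorted(set(text.split()))

def symbol_names(text: str) -> list[str]:
    """Toy symbol index: names that follow a leading 'def '."""
    names = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("def "):
            names.append(stripped[4:].split("(")[0])
    return names

class AtomicStore:
    def __init__(self, indexers):
        self.indexers = indexers   # name -> function(text) -> derived data
        self.state = {}            # path -> {"bytes": str, "derived": dict}

    def write(self, path: str, text: str) -> None:
        # Run every indexer before touching visible state; any failure aborts the write.
        derived = {name: fn(text) for name, fn in self.indexers.items()}
        new_state = dict(self.state)
        new_state[path] = {"bytes": text, "derived": derived}
        # One reference swap: readers see the old state or the new one, never a mix.
        self.state = new_state

    def read(self, path: str) -> str:
        return self.state[path]["bytes"]

    def derived(self, path: str) -> dict:
        return self.state[path]["derived"]
```

If an indexer raises, the swap never happens, so there is no window in which the bytes are new and the index is old.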

The results are then exposed two ways at once: through the ordinary read() path (magic directories, symlinks, xattrs that any Unix tool can touch) and through a first-class agent API (MCP). Three consequences follow directly from putting the index under the writes:

Zero-integration

Anything that can read a file (ripgrep, vim, git, gcc, and every LLM agent alike) gains semantic search for free. There is no SDK to adopt and no embedding pipeline to stand up; the capability arrives through the file API the tool already speaks.

Always-fresh

Because the index is the write path, there is no external pipeline that can lag. A query reflects the bytes you just wrote, with read-your-writes consistency via a per-write token — never a snapshot of the repo as it looked before your last edit.
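One way a per-write token could work is sketched below: each write returns a monotonically increasing version number, and a query may pass that number as the oldest index state it will accept. The class name, the `after` parameter, and the substring "search" are assumptions for illustration only; the source does not specify the token format.

```python
# Minimal sketch of read-your-writes via a per-write token.
# TokenedIndex and its method signatures are hypothetical.

class TokenedIndex:
    def __init__(self):
        self.version = 0   # token of the most recent write applied to the index
        self.files = {}    # path -> text (stands in for bytes plus index)

    def write(self, path: str, text: str) -> int:
        """Persist and index in one step, then hand back the write's token."""
        self.files[path] = text
        self.version += 1
        return self.version

    def search(self, needle: str, after: int = 0):
        """Return (matching paths, index version), refusing to answer from
        a state older than the caller's token."""
        if after > self.version:
            raise LookupError(
                f"index is at version {self.version}, caller requires {after}"
            )
        hits = sorted(p for p, text in self.files.items() if needle in text)
        return hits, self.version
```

A caller that threads the token from its last write into its next query can verify that the answer reflects that write, rather than trusting that it does.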

Unix-composable

Meaning is surfaced as paths, symlinks, xattrs, and a device event stream, so it pipes, greps, and scripts like any other file. SynaFS joins the existing ecosystem instead of asking you to replace it with a proprietary store.

Design principles

Six commitments shape every decision in SynaFS. They are deliberately conservative about POSIX and aggressive about what gets layered on top of it.

POSIX is the floor

Existing tools must keep working unchanged. Every semantic feature is strictly additive — a magic path or xattr you can ignore — never a change to read/write semantics.

Index-on-write & incremental

A write re-embeds only the chunks that actually changed, keyed by a content-addressed chunk hash. Editing one function out of five touches one function's vectors, not the whole file.
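A minimal sketch of that idea, assuming SHA-256 as the chunk hash, a toy blank-line chunker in place of real function-level chunking, and a placeholder `fake_embed` in place of a real model. None of these names come from SynaFS.

```python
import hashlib

def chunk_id(text: str) -> str:
    """Content-addressed ID: identical chunk text always yields the same ID."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def split_chunks(source: str) -> list[str]:
    """Toy chunker: blank-line-separated blocks stand in for functions."""
    return [block for block in source.split("\n\n") if block.strip()]

def fake_embed(text: str) -> list[float]:
    """Placeholder for a real (expensive) embedding model call."""
    return [float(len(text))]

class ChunkIndex:
    def __init__(self):
        self.vectors = {}      # chunk ID -> embedding
        self.files = {}        # path -> ordered list of chunk IDs
        self.embed_calls = 0   # counts how often the expensive step ran

    def on_write(self, path: str, source: str) -> list[str]:
        """Index a write; return the IDs of chunks that had to be embedded."""
        ids, fresh = [], []
        for chunk in split_chunks(source):
            cid = chunk_id(chunk)
            ids.append(cid)
            if cid not in self.vectors:   # unchanged chunks are skipped
                self.vectors[cid] = fake_embed(chunk)
                self.embed_calls += 1
                fresh.append(cid)
        self.files[path] = ids
        return fresh
```

Writing a file with five functions embeds five chunks; rewriting it with one function changed embeds exactly one more. The sketch never evicts vectors for chunks that no longer appear in any file, which a real store would need to handle.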

Multimodal index

Four signals over one shared chunk ID: vector (semantic), symbol graph (AST defs/refs/callers), lexical (BM25 + trigram), and temporal (version DAG). A query is a hybrid join across all four.
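The source does not say how the four signals are combined. One common choice for merging ranked lists that share an ID space is reciprocal rank fusion (RRF), sketched here as an assumption rather than as SynaFS's actual scoring. The signal names mirror the list above; the chunk IDs and the constant `k = 60` are illustrative.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: dict[str, list[str]], k: int = 60):
    """Fuse per-signal rankings of chunk IDs into one ranked list.

    rankings maps a signal name to chunk IDs ordered best-first.
    Each appearance contributes 1 / (k + rank), with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, cid in enumerate(ranked_ids, start=1):
            scores[cid] += 1.0 / (k + rank)
    # Break score ties by chunk ID so the output order is reproducible.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

rankings = {
    "vector":   ["c1", "c2", "c3"],
    "symbol":   ["c2", "c1"],
    "lexical":  ["c2", "c3"],
    "temporal": ["c3", "c2"],
}
fused = reciprocal_rank_fusion(rankings)
```

Because every signal ranks the same chunk IDs, the join is a dictionary merge; a chunk that scores well on several signals rises above one that tops a single list.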

Symbol-level addressing

Results point at functions, classes, and spans — not whole files. The unit of retrieval matches the unit an agent actually edits and reasons about.
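To make "spans, not files" concrete, the sketch below uses Python's standard `ast` module to list every function and class in a source string with its line range. It covers Python sources only and is an illustration of the addressing unit, not a description of the parser SynaFS uses.

```python
import ast

def symbol_spans(source: str) -> list[tuple[str, str, int, int]]:
    """Return (kind, name, first_line, last_line) for each function and class."""
    tree = ast.parse(source)
    spans = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            # lineno / end_lineno are 1-based and inclusive (Python 3.8+).
            spans.append((kind, node.name, node.lineno, node.end_lineno))
    # Sort by position so nested symbols follow the symbol that contains them.
    return sorted(spans, key=lambda span: (span[2], span[1]))

SOURCE = (
    "class Parser:\n"
    "    def parse(self):\n"
    "        return 1\n"
    "\n"
    "def main():\n"
    "    return Parser().parse()\n"
)
spans = symbol_spans(SOURCE)
```

A search hit can then name `Parser.parse` at lines 2 to 3 instead of handing back the whole file.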

Agent-native API (MCP)

The MCP server is a first-class peer to POSIX, not an afterthought. Agents connect directly — search, read_span, symbol_lookup, apply_edit, subscribe — with no external embedding pipeline in between.
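MCP tool invocations travel as JSON-RPC 2.0 requests with the method `tools/call`. The sketch below builds such a request for the `search` tool named above. The argument names (`query`, `limit`) are hypothetical, since the source lists the tools but not their parameters.

```python
import json
from itertools import count

_request_ids = count(1)   # JSON-RPC requests carry a unique ID per call

def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP 'tools/call' request for the given tool."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request, sort_keys=True)

# Hypothetical arguments: the real schema would come from the server's tool listing.
message = tool_call("search", {"query": "where are retries configured", "limit": 5})
```

An agent would send this over whichever transport the server exposes and read the tool result from the matching response ID.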

Provenance & determinism

Every result carries its why — score, source span, and version — so agents can cache and reason over retrieval, and the same query over the same tree returns the same answer.
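A sketch of what such a result might look like, assuming a flat record per hit and a cache key derived from the query plus a tree version identifier. The field names, the tie-breaking rule, and the key format are illustrative assumptions, not the SynaFS wire format.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    path: str
    start_line: int
    end_line: int
    score: float
    version: str   # identifier of the tree version the hit was computed against

def order_hits(hits: list[Hit]) -> list[Hit]:
    """Rank best-first; ties fall back to path and position, so the order
    never depends on the order in which hits were produced."""
    return sorted(hits, key=lambda h: (-h.score, h.path, h.start_line))

def cache_key(query: str, tree_version: str) -> str:
    """Same query over the same tree maps to the same key."""
    return hashlib.sha256(f"{tree_version}\0{query}".encode("utf-8")).hexdigest()

a = Hit("src/retry.py", 10, 24, 0.82, "v7")
b = Hit("src/client.py", 3, 9, 0.82, "v7")
c = Hit("src/util.py", 1, 5, 0.40, "v7")
```

Because ordering is total and the key includes the tree version, an agent can cache a result set and know exactly when it stops being valid: when the version changes.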

Lineage

None of this is new in spirit. The direct ancestor is Gifford et al.'s 1991 Semantic File System, which built the index at write time and exposed it through ordinary read() on virtual directories. SynaFS carries that idea into the LLM era: the write-time index is now embeddings, a symbol graph, and a version DAG instead of attribute transducers — but the conviction that meaning belongs in the filesystem, reachable through the read path, is theirs. We are standing on a thirty-year-old shoulder, not inventing a new one.
