
Technical Specification

Technical specification — surfaces, data model, vault layout, indexes, security, sizing.

Status: Draft v0.1 · 2026-05-03

Companion docs: PRD.md, CLI.md, API.md, MCP.md, AI.md


indx is one process running inside one Docker container. The process is a Next.js 16 (App Router) server that serves four things off the same port:

| Path | Surface |
| --- | --- |
| / | Web UI (React Server Components) |
| /v1/* | HTTP API (REST + SSE) |
| /mcp | MCP HTTP/SSE transport endpoint |
| /openapi.json | OpenAPI 3.1 spec |

A separate Node binary, indx, is shipped in the image and on npm. It is the CLI; it can either talk to a running server (--remote, default if INDX_URL is set) or operate directly against a vault directory (--local). It also embeds an stdio MCP server (indx mcp serve --stdio) for use with Claude Code, Cursor, etc.

┌────────────────────────────────────────────────────────────┐
│ Container │
│ │
│ apps/web (Next.js 16) │
│ ├─ Web UI (React Server Components) │
│ ├─ /v1/* REST API (route handlers) │
│ ├─ /events SSE stream (route handler, dynamic) │
│ └─ /mcp HTTP transport (route handler) │
│ │
│ ↑ all routes go through @indx/core │
│ │
│ @indx/core │
│ ├─ vault/ FS abstraction (chokidar watcher) │
│ ├─ md/ remark/rehype pipeline + AST editor │
│ ├─ canvas/ JSON Canvas 1.0 reader/writer │
│ ├─ base/ .base YAML reader + query engine │
│ ├─ index/ SQLite FTS5 + graph + (optional) vss │
│ └─ events/ in-process pub/sub │
│ │
│ SQLite db at /vault/.indx/index.db │
│ Vault at /vault (mounted volume, user-supplied) │
└────────────────────────────────────────────────────────────┘
| Layer | Choice | Why |
| --- | --- | --- |
| Runtime | Node.js 24 LTS | Default Vercel runtime; widest ecosystem; node:fs covers vault ops |
| Framework | Next.js 16 (App Router) | One server for UI + API + MCP; first-class SSE; mature |
| Language | TypeScript (strict) | Shared types end-to-end |
| Markdown | unified / remark / rehype + custom plugins | AST-preserving edits |
| Storage (index) | better-sqlite3 + FTS5 | Sync, embedded, ~1ms ops, no daemon |
| Storage (vector, optional) | sqlite-vss extension OR remote provider | Single-container constraint; remote optional |
| Embeddings (optional) | AI SDK v6 via Vercel AI Gateway, or OpenAI-compatible | Pluggable; off by default |
| File watcher | chokidar | Reliable cross-platform |
| Validation | Zod | Single source of truth → TS types + JSON Schema → OpenAPI |
| MCP | @modelcontextprotocol/sdk | Reference implementation |
| UI | React 19 + Tailwind + shadcn/ui | Standard Vercel-flavored stack; fast |
| Editor | CodeMirror 6 | Lightweight markdown editor with plugin hooks |
| Build | pnpm workspaces + tsup (libs) + Next build (app) | No Turborepo dep; small footprint |
| Container | Multi-stage Dockerfile, distroless final | Image <100 MB |

/vault/                          # mounted volume (user-owned)
├── notes/                       # arbitrary user folders
│   ├── Home.md
│   └── Projects/...
├── .obsidian/                   # preserved, read-only by indx
│   ├── app.json
│   ├── workspace.json
│   ├── community-plugins.json
│   └── ...
└── .indx/                       # owned by indx, recreatable
    ├── index.db                 # SQLite (FTS5 + graph + meta)
    ├── vss.db                   # optional vector store
    ├── config.json              # token list, scopes, settings
    └── events.log               # rolling NDJSON event log

Invariant: deleting .indx/ and restarting the container must produce a fully functional system after a single reindex pass. The vault directory is the source of truth.

.obsidian/ is read by indx (to honor user preferences like attachment folder, tag delimiters) but never written. This keeps Obsidian and indx coexistence safe.

indx implements the Obsidian-flavored markdown specification. Concretely:

  • YAML between --- fences at the very start of a file.
  • Properties become structured fields; type is inferred (string, number, bool, list, date, datetime).
  • tags:, aliases:, cssclasses: recognized as Obsidian conventions.
  • Round-trip preservation: unknown keys are kept verbatim.
  • [[Note]]
  • [[Note|Display]]
  • [[Note#Heading]]
  • [[Note#^block-id]]
  • [[#Heading in same note]]
  • Resolution: filename match (case-sensitive on Linux container, but indx normalizes; see §6.4), no extension required, falling back to alias if defined in target frontmatter.
  • ![[Note]] for note embed
  • ![[Note#Heading]] / ![[Note#^id]] for partial embed
  • ![[image.png]] / ![[file.pdf]] for attachments
  • Blockquote with > [!type] first line.
  • Recognized types: note, tip, warning, info, example, quote, bug, danger, success, failure, question, abstract, todo. Unknown types are passed through.
  • #tag, #nested/tag. Allowed chars: letters, digits (not first), _, -, /.
  • Frontmatter tags: list also indexed as tags (without #).
  • ^block-id at end of a block. Linkable as [[Note#^block-id]].
  • LaTeX: $inline$ and $$display$$.
  • Mermaid: ```mermaid fenced blocks.
  • Both pass through unmodified — rendering is the UI’s job.
  • .canvas files conform to JSON Canvas 1.0 spec.
  • Nodes: text, file, link, group. Edges connect fromNode → toNode.
  • Unknown fields are preserved on round-trip (the spec is intentionally extensible).
  • .base files are YAML with the documented Bases syntax (filters, formulas, properties, views).
  • indx parses the YAML, builds a query plan against the vault’s frontmatter index, and returns rows.
  • Write support in v1 is limited to creating/updating the YAML file; row data lives in the linked notes’ frontmatter.
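
The wikilink and embed forms listed above can be illustrated with a small parser. This is a minimal sketch, not the remark plugin indx actually uses: the `WikiLink` shape, the `parseWikiLinks` name, and the regular expression are all hypothetical, and the sketch does not skip links that appear inside code fences.

```typescript
// Hypothetical shape for a parsed wikilink; field names are illustrative only.
interface WikiLink {
  embed: boolean;    // true for ![[...]]
  target: string;    // note or file name; "" for same-note links like [[#Heading]]
  heading?: string;  // text after "#"
  blockId?: string;  // text after "#^"
  display?: string;  // text after "|"
}

// Matches [[...]] and ![[...]]; the inner text may not contain brackets or newlines.
const WIKILINK_RE = /(!?)\[\[([^\[\]\n]+)\]\]/g;

function parseWikiLinks(markdown: string): WikiLink[] {
  const links: WikiLink[] = [];
  // Fresh RegExp per call so lastIndex state never leaks between calls.
  const re = new RegExp(WIKILINK_RE.source, "g");
  let m: RegExpExecArray | null;
  while ((m = re.exec(markdown)) !== null) {
    const inner = m[2] ?? "";
    const pipe = inner.indexOf("|");
    const ref = pipe === -1 ? inner : inner.slice(0, pipe);
    const hash = ref.indexOf("#");
    const link: WikiLink = {
      embed: m[1] === "!",
      target: (hash === -1 ? ref : ref.slice(0, hash)).trim(),
    };
    if (hash !== -1) {
      const anchor = ref.slice(hash + 1);
      // "#^id" addresses a block, "#Heading" addresses a heading.
      if (anchor.startsWith("^")) link.blockId = anchor.slice(1);
      else link.heading = anchor;
    }
    if (pipe !== -1) link.display = inner.slice(pipe + 1);
    links.push(link);
  }
  return links;
}
```

Resolution of `target` to a file (filename match, then alias fallback) would be a separate step against the index and is not shown here.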
-- One row per file in the vault
CREATE TABLE notes (
  path        TEXT PRIMARY KEY,   -- vault-relative, posix-style
  kind        TEXT NOT NULL,      -- 'md' | 'canvas' | 'base' | 'attachment'
  size        INTEGER NOT NULL,
  mtime_ms    INTEGER NOT NULL,
  hash        TEXT NOT NULL,      -- xxhash64 of bytes
  frontmatter JSON,               -- parsed YAML or null
  outline     JSON,               -- [{level, text, line}, ...]
  body_text   TEXT                -- plain-text projection for FTS
);

CREATE VIRTUAL TABLE notes_fts USING fts5(
  path UNINDEXED, title, body, tags,
  content='notes_view', tokenize='unicode61'
);

CREATE TABLE links (
  src_path TEXT NOT NULL,
  dst_path TEXT,                  -- null if unresolved
  dst_raw  TEXT NOT NULL,         -- original [[target]] string
  anchor   TEXT,                  -- heading or block id
  kind     TEXT NOT NULL,         -- 'wiki' | 'embed' | 'md'
  line     INTEGER NOT NULL,
  PRIMARY KEY (src_path, line, dst_raw)
);
CREATE INDEX links_dst ON links(dst_path);

CREATE TABLE tags (
  path TEXT NOT NULL,
  tag  TEXT NOT NULL,
  PRIMARY KEY (path, tag)
);
CREATE INDEX tags_tag ON tags(tag);

CREATE TABLE blocks (
  path     TEXT NOT NULL,
  block_id TEXT NOT NULL,
  line     INTEGER NOT NULL,
  PRIMARY KEY (path, block_id)
);

-- Optional, only if vss enabled
CREATE VIRTUAL TABLE notes_vss USING vss0(
  embedding(1536)
);

  • lexical — FTS5 BM25 over title/body/tags (default).
  • semantic — KNN over notes_vss (only if embeddings configured).
  • hybrid — reciprocal rank fusion over lexical + semantic.
  • structural filters — frontmatter equality/range/contains, tag:foo, path:Projects/**, linksTo:[[X]], linkedFrom:[[Y]]. Composable with all modes.
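
To make the hybrid mode concrete, here is a minimal sketch of reciprocal rank fusion over two ranked lists of note paths. The function name and the constant `k = 60` are assumptions (60 is a commonly used default for RRF, not a value taken from this spec); ties are broken alphabetically so the output is deterministic.

```typescript
// Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank(d)),
// where rank starts at 1. Documents missing from a ranking contribute nothing.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((path, index) => {
      scores.set(path, (scores.get(path) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.entries())
    // Highest fused score first; alphabetical order breaks ties.
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map((entry) => entry[0]);
}

// Usage: fuse a lexical (BM25) ranking with a semantic (KNN) ranking.
const lexical = ["a.md", "b.md", "c.md"];
const semantic = ["b.md", "d.md", "a.md"];
const fused = reciprocalRankFusion([lexical, semantic]);
```

Because RRF only looks at ranks, it needs no score normalization between BM25 and vector distances, which is one reason it is a common choice for this kind of fusion.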
  1. Walk vault on startup; ignore .git, .indx, .obsidian (the latter is read separately for config only).
  2. Parse each file into AST; extract frontmatter, outline, links, tags, blocks, body text projection.
  3. Diff against notes table by hash; upsert changes only.
  4. Watch with chokidar; debounce 200 ms; same diff/upsert pipeline per change.
  5. Embed (optional, async) any changed body texts; write to notes_vss.

A full reindex of a 10 k-note vault should complete in under 30 s on a laptop; an incremental update for common edit patterns should complete in under 5 s.
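
The hash-based diff in step 3 can be sketched as a pure function. This is an illustration under assumptions: `ScannedFile`, `IndexPlan`, and `planIndexUpdate` are hypothetical names, and removing rows for files that no longer exist is assumed behavior that the steps above do not spell out.

```typescript
interface ScannedFile {
  path: string; // vault-relative, posix-style
  hash: string; // content hash of the file as found on disk
}

interface IndexPlan {
  upsert: string[]; // new or changed files to re-parse and write
  remove: string[]; // indexed paths that no longer exist on disk (assumed behavior)
}

// `indexed` maps path -> hash as currently stored in the notes table.
function planIndexUpdate(indexed: Map<string, string>, scanned: ScannedFile[]): IndexPlan {
  const upsert: string[] = [];
  const seen = new Set<string>();
  for (const file of scanned) {
    seen.add(file.path);
    // Unknown path or different hash: the file needs (re)indexing.
    if (indexed.get(file.path) !== file.hash) upsert.push(file.path);
  }
  const remove: string[] = [];
  indexed.forEach((_hash, path) => {
    if (!seen.has(path)) remove.push(path);
  });
  return { upsert, remove };
}
```

Unchanged files never reach the parser in this scheme, so the cost of a restart scales with the number of files that changed rather than with vault size.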

All writes go through core.vault.writeNote(path, content, { ifMatch?: etag }). The function:

  1. Validates the new content (parse must succeed; warnings allowed, errors rejected).
  2. If ifMatch is provided, compares against the current hash and rejects with 409 Conflict on mismatch.
  3. Writes to a temp file and atomically rename()s into place.
  4. Updates the index synchronously before returning.
  5. Emits a note.updated event.
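
Steps 2 and 3 above (ETag check, then temp file plus atomic rename) could look roughly like the following. This is a sketch, not the real `core.vault.writeNote`: the names `writeNoteSketch`, `etagOf`, and `ConflictError` are hypothetical, SHA-256 stands in for the xxhash64 used by the index so the example needs no dependencies, and validation, index update, and event emission are left out.

```typescript
import { createHash, randomBytes } from "node:crypto";
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical error type carrying the HTTP status the API layer would return.
class ConflictError extends Error {
  readonly status = 409;
}

// Assumption: SHA-256 replaces xxhash64 here to avoid a third-party dependency.
function etagOf(bytes: Buffer | string): string {
  return createHash("sha256").update(bytes).digest("hex");
}

function writeNoteSketch(absPath: string, content: string, opts: { ifMatch?: string } = {}): string {
  if (opts.ifMatch !== undefined) {
    // Compare the caller's ETag with the hash of what is on disk right now.
    const current = fs.existsSync(absPath) ? etagOf(fs.readFileSync(absPath)) : null;
    if (current !== opts.ifMatch) throw new ConflictError("etag mismatch for " + absPath);
  }
  // The temp file lives next to the target so rename() stays on one filesystem.
  const tmpName = "." + path.basename(absPath) + "." + randomBytes(6).toString("hex") + ".tmp";
  const tmpPath = path.join(path.dirname(absPath), tmpName);
  fs.writeFileSync(tmpPath, content, "utf8");
  fs.renameSync(tmpPath, absPath); // readers see the old file or the new one, never a partial write
  return etagOf(content);
}
```

A caller would pass the returned ETag as `ifMatch` on its next write; a concurrent edit in between changes the on-disk hash and surfaces as a 409 instead of a silent overwrite.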

The editor exposes a small, AST-aware patch language so agents don’t ship whole files for tiny edits:

type Patch =
  | { op: "set_frontmatter"; key: string; value: unknown }
  | { op: "delete_frontmatter"; key: string }
  | { op: "append_body"; markdown: string }
  | { op: "prepend_body"; markdown: string }
  | { op: "insert_before_heading"; heading: string; markdown: string }
  | { op: "insert_after_heading"; heading: string; markdown: string }
  | { op: "replace_section"; heading: string; markdown: string }
  | { op: "replace_block"; block_id: string; markdown: string }
  | { op: "rename_heading"; from: string; to: string };

Patches are applied against the AST, not against text positions, so they’re robust to formatting changes. The result is re-serialized through the same remark printer that produced the file (configurable via .indx/config.json to honor a vault’s quote/list-marker conventions).
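
As a small illustration of how patch ops apply to structured data instead of text offsets, the sketch below handles only the two frontmatter ops against an already-parsed frontmatter object. `FrontmatterPatch` and `applyFrontmatterPatches` are hypothetical names; the body ops would need the remark AST and are not shown.

```typescript
// Subset of the Patch union above, restricted to the frontmatter ops.
type FrontmatterPatch =
  | { op: "set_frontmatter"; key: string; value: unknown }
  | { op: "delete_frontmatter"; key: string };

function applyFrontmatterPatches(
  frontmatter: Readonly<Record<string, unknown>>,
  patches: FrontmatterPatch[],
): Record<string, unknown> {
  // Copy first: the input is left untouched, and existing keys keep their order.
  const next: Record<string, unknown> = { ...frontmatter };
  for (const patch of patches) {
    switch (patch.op) {
      case "set_frontmatter":
        next[patch.key] = patch.value;
        break;
      case "delete_frontmatter":
        delete next[patch.key];
        break;
    }
  }
  return next;
}

// Usage: an agent marks a note active and drops its draft flag.
const before = { title: "Home", tags: ["a"], draft: true };
const after = applyFrontmatterPatches(before, [
  { op: "set_frontmatter", key: "status", value: "active" },
  { op: "delete_frontmatter", key: "draft" },
]);
```

Keys that no patch touches pass through unchanged, which matches the round-trip preservation rule for unknown frontmatter keys.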

note.move(from, to) renames the file and rewrites incoming wikilinks to keep the graph intact. The rewrite is controlled by update_links (default true); an agent can set it to false to skip the rewrite in advanced workflows.
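
The link rewrite performed on a move can be sketched on a single note's text. This is a simplified, text-level illustration under assumptions: `rewriteWikiLinks` is a hypothetical name, links are matched on the bare note name only (no folder prefixes, aliases, or case folding), and a real implementation would presumably work from the `links` table and the AST.

```typescript
// Rewrites [[from...]] and ![[from...]] to point at `toName`,
// keeping any "#heading", "#^block" and "|display" suffix intact.
function rewriteWikiLinks(markdown: string, fromName: string, toName: string): string {
  // Group 1: opening "[[" or "![["; group 2: target; group 3: rest up to and including "]]".
  const re = /(!?\[\[)([^\[\]|#\n]*)([^\[\]\n]*\]\])/g;
  return markdown.replace(re, (whole: string, open: string, target: string, rest: string) =>
    target.trim() === fromName ? open + toName + rest : whole,
  );
}

// Usage: the note "Old" was moved to "New".
const movedText = rewriteWikiLinks(
  "See [[Old]], [[Old#Intro|the intro]], ![[Old#^b1]] and [[Older]].",
  "Old",
  "New",
);
```

Only exact target matches are rewritten, so a link to `Older` is left alone even though it starts with the old name.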

  • All API/CLI/MCP paths are vault-relative, posix-style (Projects/Foo.md).
  • indx normalizes user-supplied paths: strip leading /, collapse .., reject paths escaping the vault root.
  • Matching is case-sensitive on Linux. The optional case_insensitive_match setting will fall back to a case-folded lookup when an exact match fails.
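
A minimal sketch of the normalization rules above, using Node's posix path helpers. `normalizeVaultPath` is a hypothetical name, and treating an empty path as an error is an assumption; the optional case-folded fallback is not shown.

```typescript
import * as path from "node:path";

// Returns a vault-relative, posix-style path or throws if the input
// would resolve outside the vault root.
function normalizeVaultPath(input: string): string {
  // Strip leading slashes so "/Projects/Foo.md" is treated as vault-relative.
  const stripped = input.replace(/^\/+/, "");
  // Collapses "a/../b" and duplicate slashes; leading ".." segments survive.
  const normalized = path.posix.normalize(stripped);
  if (normalized === ".") {
    throw new Error("empty path"); // assumption: the vault root itself is not a valid note path
  }
  if (normalized === ".." || normalized.startsWith("../")) {
    throw new Error("path escapes vault root: " + input);
  }
  return normalized;
}
```

Any `..` segment that remains at the front after normalization can only point above the root, so checking the normalized prefix is enough to catch traversal attempts such as `a/../../etc/passwd`.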

A single in-process pub/sub emits VaultEvent objects:

type VaultEvent =
  | { type: "note.created"; path: string; etag: string; at: string; actor: Actor }
  | { type: "note.updated"; path: string; etag: string; at: string; actor: Actor }
  | { type: "note.deleted"; path: string; at: string; actor: Actor }
  | { type: "note.moved"; from: string; to: string; at: string; actor: Actor }
  | { type: "index.reindexed"; reason: string; at: string };

type Actor = { kind: "ui" | "api" | "cli" | "mcp" | "fs"; id?: string };
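
An in-process pub/sub of this kind, plus the serialization an SSE route handler would need, can be sketched in a few lines. `EventBus` and `toSseFrame` are hypothetical names, and only two of the event variants are repeated here to keep the block self-contained.

```typescript
type SketchActor = { kind: "ui" | "api" | "cli" | "mcp" | "fs"; id?: string };

// Subset of VaultEvent, repeated so the sketch stands alone.
type SketchEvent =
  | { type: "note.updated"; path: string; etag: string; at: string; actor: SketchActor }
  | { type: "note.deleted"; path: string; at: string; actor: SketchActor };

type Listener = (event: SketchEvent) => void;

class EventBus {
  private listeners = new Set<Listener>();

  // Returns an unsubscribe function, e.g. to call when an SSE client disconnects.
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  publish(event: SketchEvent): void {
    this.listeners.forEach((listener) => listener(event));
  }
}

// One Server-Sent Events frame: an "event:" line, a "data:" line, then a blank line.
function toSseFrame(event: SketchEvent): string {
  return "event: " + event.type + "\ndata: " + JSON.stringify(event) + "\n\n";
}
```

Each consumer (SSE stream, activity panel, event log writer) would be one subscriber on the same bus, so they all observe the same ordered sequence of events.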

The same stream feeds:

  • GET /v1/events (SSE)
  • The web UI activity panel
  • The rolling .indx/events.log (for audit; rotated at 10 MB)
  • One or more bearer tokens are configured via INDX_TOKEN (single) or INDX_TOKENS_FILE (JSON array with optional scopes).
  • Format: indx_<32 hex chars>.
  • Scopes (optional): vault:read, vault:write, vault:admin, path:<glob>.
  • A request without a valid token gets 401; insufficient scope gets 403.
  • Web UI uses session cookies tied to the bearer token; SameSite=Strict.
  • Cross-origin API requests require Authorization: Bearer ….
  • v1.1+: OIDC adapter with per-user vaults; not in v1 scope.
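
The 401/403 split described above can be sketched as a single decision function. Everything here is illustrative: `authorize` and `TokenRecord` are hypothetical names, lower-case hex is assumed for the token format, scopes are compared literally (no scope implies another), and `path:<glob>` scopes are left out.

```typescript
// Assumption: the 32 hex characters are lower-case.
const TOKEN_RE = /^indx_[0-9a-f]{32}$/;

type Scope = "vault:read" | "vault:write" | "vault:admin";

interface TokenRecord {
  token: string;
  scopes: Scope[];
}

// Returns the HTTP status the request should receive.
function authorize(
  authorizationHeader: string | undefined,
  required: Scope,
  tokens: TokenRecord[],
): 200 | 401 | 403 {
  const match = /^Bearer (.+)$/.exec(authorizationHeader ?? "");
  const presented = match ? match[1] : undefined;
  // Missing, malformed, or unknown token: the caller is not authenticated.
  if (!presented || !TOKEN_RE.test(presented)) return 401;
  const record = tokens.find((t) => t.token === presented);
  if (!record) return 401;
  // Known token without the required scope: authenticated but not allowed.
  return record.scopes.indexOf(required) !== -1 ? 200 : 403;
}
```

A production check would likely compare tokens in constant time (for example with `crypto.timingSafeEqual`) instead of `===`; the sketch skips that for brevity.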
| Var | Default | Notes |
| --- | --- | --- |
| INDX_VAULT | /vault | Path to vault directory |
| INDX_PORT | 3000 | HTTP port |
| INDX_TOKEN | (required) | Single static bearer token |
| INDX_TOKENS_FILE | unset | Path to JSON [{token, scopes}] |
| INDX_LOG_LEVEL | info | error \| warn \| info \| debug |
| INDX_EMBEDDINGS_PROVIDER | unset | vercel-ai-gateway \| openai \| ollama |
| INDX_EMBEDDINGS_MODEL | unset | Provider-specific model id |
| INDX_AI_GATEWAY_KEY | unset | Required if provider = vercel-ai-gateway |
| INDX_OPENAI_BASE_URL | unset | For OpenAI-compatible endpoints |
| INDX_OPENAI_API_KEY | unset | |
| INDX_VECTOR_STORE | none | none \| sqlite-vss |
| INDX_AI_PROVIDER | inherits INDX_EMBEDDINGS_PROVIDER | Chat-model provider for the AI runtime |
| INDX_AI_MODEL | unset | Provider-specific chat model id |
| INDX_AI_MAX_INPUT_TOKENS | 64000 | Hard cap on prompt context |
| INDX_AI_MAX_OUTPUT_TOKENS | 2048 | Hard cap on generated output |
| INDX_AI_TEMPERATURE | 0 | Deterministic by default |
| INDX_AI_DAILY_COST_USD | unset | Optional per-day spend ceiling |
| INDX_AI_ALLOW_GLOBS | unset | Comma-list of globs eligible for AI ops |
| INDX_AI_DENY_GLOBS | unset | Comma-list of globs excluded from AI ops |
| INDX_AI_CACHE | on | on \| off |
| INDX_AI_TIMEOUT_MS | 30000 | Per-call upstream timeout |
| INDX_READONLY | false | Hard-block writes |
| INDX_TRUST_PROXY | false | Honor X-Forwarded-* |

{
  "version": 1,
  "tokens": [{ "id": "default", "scopes": ["vault:read", "vault:write"] }],
  "markdown": { "list_marker": "-", "thematic_break": "---" },
  "embeddings": { "enabled": false },
  "ai": {
    "enabled": true,
    "provider": null,
    "model": null,
    "temperature": 0,
    "max_input_tokens": 64000,
    "max_output_tokens": 2048,
    "allow_globs": ["**/*.md"],
    "deny_globs": ["Private/**"],
    "cache": { "enabled": true, "ttl_hours": 24 },
    "daily_cost_usd": null
  },
  "ignore": [".git/**", ".obsidian/plugins/**"]
}

ai.* fields are described in AI.md §3.3. Vault config wins for tuning knobs; env wins for safety knobs (provider credentials, enabled).

Multi-stage build:

  1. node:24-alpine builder: installs deps, builds Next.js standalone, builds CLI.
  2. gcr.io/distroless/nodejs24-debian12 runtime: copies apps/web/.next/standalone, node_modules, the indx CLI, and a tiny init script.

Final image target: <100 MB compressed. Healthcheck: GET /v1/health.

  • /vault — required, the user’s vault.
  • /data — optional, only if user wants .indx/ outside the vault (INDX_INDEX_DIR=/data).
services:
  indx:
    image: ghcr.io/indx/indx:latest
    ports: ["3000:3000"]
    volumes: ["./vault:/vault"]
    environment:
      - INDX_TOKEN=${INDX_TOKEN}
    restart: unless-stopped

| Vault size | Index size | RAM idle | RAM peak (reindex) |
| --- | --- | --- | --- |
| 1 k notes | ~5 MB | 200 MB | 350 MB |
| 10 k notes | ~50 MB | 240 MB | 500 MB |
| 100 k notes | ~500 MB | 400 MB | 1.5 GB |

  • Structured JSON logs on stdout (pino).
  • /v1/health returns { status, vault, indexed_notes, last_reindex_at, uptime_s }.
  • Optional OpenTelemetry exporter via OTEL_EXPORTER_OTLP_ENDPOINT env (off by default).
  • Trust boundary: the host filesystem. Anyone with disk access trumps indx auth.
  • What indx defends against: unauthenticated network callers, scope-escalation by tokens, path-traversal via API/CLI inputs.
  • What indx does not defend against: a malicious agent with a valid vault:write token (use scoped tokens), the host root account, or attacks on the embeddings provider.
  • Path safety: all user-supplied paths are normalized and resolved under INDX_VAULT; .. and absolute paths are rejected.
  • Rate limiting: v1 ships a per-token sliding-window limiter (default 100 rps, 1000 burst). Configurable.
  • Bot defense: if exposed publicly, recommend Vercel BotID / a reverse proxy with WAF — out of scope for the container.
  • Unit: AST patch ops, link resolution, frontmatter round-trip.
  • Property: “open → write-back unchanged → byte-identical” against a corpus of 1 k+ public Obsidian sample vaults.
  • Integration: Docker image boot + /v1/health + sample CRUD via API/CLI/MCP.
  • Agent eval: 50-task benchmark (CRUD, search, link-graph, canvas) run via Claude/GPT/local model; pass rate gates release.
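
The per-token sliding-window limiter mentioned in the security notes above could work along these lines. This is a sketch of the general sliding-window-log technique, not the shipped limiter: `SlidingWindowLimiter` is a hypothetical name, the clock is passed in to keep the example deterministic, and the separate burst allowance is not modeled.

```typescript
class SlidingWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  // Per-token timestamps of requests accepted within the current window.
  private readonly hits = new Map<string, number[]>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is admitted, false if it should be rejected.
  allow(token: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    // Drop timestamps that have slid out of the window.
    const recent = (this.hits.get(token) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(nowMs);
    this.hits.set(token, recent);
    return allowed;
  }
}

// Usage: at most 2 requests per token in any 1000 ms window.
const limiter = new SlidingWindowLimiter(2, 1000);
```

Keeping one timestamp per accepted request costs memory proportional to the limit per token; a counter-based approximation would trade some accuracy for constant space.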

All public surfaces are generated from a single Zod tree in packages/shared:

Zod schemas
├─→ TypeScript types (used by core, web, cli)
├─→ JSON Schema (used by MCP tool definitions)
└─→ OpenAPI 3.1 (served at /openapi.json, used by CLI for codegen)

Adding a new capability is one PR that:

  1. Adds the Zod schema in @indx/shared.
  2. Implements the operation in @indx/core.
  3. Wires it into all three surfaces (API route, CLI verb, MCP tool) via thin adapters.

Backwards-compatible additions never bump /v1. Breaking changes wait for /v2.

The AI runtime adds four built-in operations on top of the existing surfaces — summarize, ask, toc, relate — accessible identically from the HTTP API (/v1/ai/*), the CLI (indx ai *), and MCP (ai_* tools). The full specification is in AI.md. Key invariants relevant to this document:

  • Provider plumbing reuses §9.1. A new INDX_AI_PROVIDER / INDX_AI_MODEL pair carries the chat model role; embeddings env vars are unchanged. If INDX_AI_PROVIDER is unset it falls back to INDX_EMBEDDINGS_PROVIDER. With neither configured the AI runtime is off, AI tools are not advertised, and /v1/ai/* returns 503 ai_unavailable.
  • No new outbound surface. Per SPEC §12, the only outbound calls indx ever makes are to a configured AI provider. AI features tighten this stance: no provider configured ⇒ no outbound call possible.
  • Same write pipeline. AI ops that materialize content (ai_toc --moc --write …) go through core.vault.writeNote (§6.1) — atomic, ETagged, idempotent, audited. There is no “AI-only” write path.
  • Schema-first. AI inputs and outputs are Zod schemas in @indx/shared exposed via /openapi.json and the MCP tool catalog exactly like every other op (§14).
  • Audit. Every AI invocation emits an ai.invocation event into the pub/sub bus (§7) and the rolling event log; payload is metadata only (paths, counts, costs, durations) — no prompts or outputs are persisted.
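
The provider fallback in the first bullet above reduces to a small resolution function. The sketch assumes environment variables arrive as a plain string map; `resolveAiRuntime` and `aiRouteStatus` are hypothetical names, and the success status of 200 is only a placeholder for "the request proceeds".

```typescript
type Env = Record<string, string | undefined>;

interface AiRuntime {
  enabled: boolean;
  provider: string | null;
}

// INDX_AI_PROVIDER wins; otherwise fall back to INDX_EMBEDDINGS_PROVIDER.
// Empty strings are treated as unset.
function resolveAiRuntime(env: Env): AiRuntime {
  const provider = env["INDX_AI_PROVIDER"] || env["INDX_EMBEDDINGS_PROVIDER"] || null;
  return { enabled: provider !== null, provider };
}

// What a /v1/ai/* handler would answer before doing any work.
function aiRouteStatus(env: Env): { status: number; error?: string } {
  return resolveAiRuntime(env).enabled
    ? { status: 200 }
    : { status: 503, error: "ai_unavailable" };
}
```

The same `enabled` flag would also decide whether the `ai_*` MCP tools are advertised, so all three surfaces agree on whether the AI runtime is available.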