Technical Specification
Technical specification — surfaces, data model, vault layout, indexes, security, sizing.
Status: Draft v0.1 · 2026-05-03
Companion docs: PRD.md, CLI.md, API.md, MCP.md, AI.md
1. System overview
indx is one process running inside one Docker container. The process is a Next.js 16 (App Router) server that serves four things off the same port:
| Path | Surface |
|---|---|
| `/` | Web UI (React Server Components) |
| `/v1/*` | HTTP API (REST + SSE) |
| `/mcp` | MCP HTTP/SSE transport endpoint |
| `/openapi.json` | OpenAPI 3.1 spec |
A separate Node binary, indx, is shipped in the image and on npm. It is the CLI; it can either talk to a running server (--remote, default if INDX_URL is set) or operate directly against a vault directory (--local). It also embeds an stdio MCP server (indx mcp serve --stdio) for use with Claude Code, Cursor, etc.

    ┌────────────────────────────────────────────────────────────┐
    │ Container                                                  │
    │                                                            │
    │  apps/web (Next.js 16)                                     │
    │  ├─ Web UI  (React Server Components)                      │
    │  ├─ /v1/*   REST API (route handlers)                      │
    │  ├─ /events SSE stream (route handler, dynamic)            │
    │  └─ /mcp    HTTP transport (route handler)                 │
    │                                                            │
    │        ↑ all routes go through @indx/core                  │
    │                                                            │
    │  @indx/core                                                │
    │  ├─ vault/   FS abstraction (chokidar watcher)             │
    │  ├─ md/      remark/rehype pipeline + AST editor           │
    │  ├─ canvas/  JSON Canvas 1.0 reader/writer                 │
    │  ├─ base/    .base YAML reader + query engine              │
    │  ├─ index/   SQLite FTS5 + graph + (optional) vss          │
    │  └─ events/  in-process pub/sub                            │
    │                                                            │
    │  SQLite db at /vault/.indx/index.db                        │
    │  Vault at /vault (mounted volume, user-supplied)           │
    └────────────────────────────────────────────────────────────┘

2. Tech stack

| Layer | Choice | Why |
|---|---|---|
| Runtime | Node.js 24 LTS | Default Vercel runtime; widest ecosystem; node:fs covers vault ops |
| Framework | Next.js 16 (App Router) | One server for UI + API + MCP; first-class SSE; mature |
| Language | TypeScript (strict) | Shared types end-to-end |
| Markdown | unified / remark / rehype + custom plugins | AST-preserving edits |
| Storage (index) | better-sqlite3 + FTS5 | Sync, embedded, ~1ms ops, no daemon |
| Storage (vector, optional) | sqlite-vss extension OR remote provider | Single-container constraint; remote optional |
| Embeddings (optional) | AI SDK v6 via Vercel AI Gateway, or OpenAI-compatible | Pluggable; off by default |
| File watcher | chokidar | Reliable cross-platform |
| Validation | Zod | Single source of truth → TS types + JSON Schema → OpenAPI |
| MCP | @modelcontextprotocol/sdk | Reference implementation |
| UI | React 19 + Tailwind + shadcn/ui | Standard Vercel-flavored stack; fast |
| Editor | CodeMirror 6 | Lightweight markdown editor with plugin hooks |
| Build | pnpm workspaces + tsup (libs) + Next build (app) | No Turborepo dep; small footprint |
| Container | Multi-stage Dockerfile, distroless final | Image <100 MB |
3. On-disk layout

    /vault/                          # mounted volume (user-owned)
    ├── notes/                       # arbitrary user folders
    │   ├── Home.md
    │   └── Projects/...
    ├── .obsidian/                   # preserved, read-only by indx
    │   ├── app.json
    │   ├── workspace.json
    │   ├── community-plugins.json
    │   └── ...
    └── .indx/                       # owned by indx, recreatable
        ├── index.db                 # SQLite (FTS5 + graph + meta)
        ├── vss.db                   # optional vector store
        ├── config.json              # token list, scopes, settings
        └── events.log               # rolling NDJSON event log

Invariant: deleting .indx/ and restarting the container must produce a fully functional system after a single reindex pass. The vault directory is the source of truth.
.obsidian/ is read by indx (to honor user preferences like attachment folder, tag delimiters) but never written. This keeps Obsidian and indx coexistence safe.
4. Vault format
indx implements the Obsidian-flavored markdown specification. Concretely:
4.1 Frontmatter
- YAML between `---` fences at the very start of a file.
- Properties become structured fields; type is inferred (string, number, bool, list, date, datetime).
- `tags:`, `aliases:`, `cssclasses:` recognized as Obsidian conventions.
- Round-trip preservation: unknown keys are kept verbatim.
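
The type-inference rule above could look roughly like the following sketch, which classifies a value that a YAML parser has already produced. The `PropertyKind` name, the two regular expressions, and the choice to treat anything unrecognized as a string are assumptions made for illustration; the spec does not define them.

```typescript
// Hypothetical names; the kinds mirror the list in §4.1.
type PropertyKind = "string" | "number" | "bool" | "list" | "date" | "datetime";

const DATE_RE = /^\d{4}-\d{2}-\d{2}$/;
const DATETIME_RE = /^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(:\d{2})?/;

// Infers a kind from an already-parsed YAML value.
// Assumption: the parser leaves timestamps as strings rather than Date objects.
function inferPropertyKind(value: unknown): PropertyKind {
  if (typeof value === "boolean") return "bool";
  if (typeof value === "number") return "number";
  if (Array.isArray(value)) return "list";
  if (typeof value === "string") {
    if (DATE_RE.test(value)) return "date";
    if (DATETIME_RE.test(value)) return "datetime";
  }
  // Assumption: null, nested maps, and plain text all fall back to "string".
  return "string";
}
```

Because unknown keys are kept verbatim, a classifier like this would only drive indexing and filtering; it would not rewrite the stored YAML.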
4.2 Wikilinks
- `[[Note]]`
- `[[Note|Display]]`
- `[[Note#Heading]]`
- `[[Note#^block-id]]`
- `[[#Heading in same note]]`
- Resolution: filename match (case-sensitive on Linux container, but indx normalizes; see §6.4), no extension required, falling back to alias if defined in target frontmatter.
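
To make the link forms above concrete, here is a minimal regex-based sketch that splits a wikilink into target, anchor, and display text. It is not the remark plugin the spec refers to; the `WikiLink` shape and the `parseWikiLinks` name are hypothetical, and the sketch does not skip links inside code spans.

```typescript
// Hypothetical shape for one parsed link.
interface WikiLink {
  embed: boolean;    // true for ![[...]]
  target: string;    // note name; "" means "same note"
  heading?: string;  // from #Heading
  blockId?: string;  // from #^block-id
  display?: string;  // from |Display
}

function parseWikiLinks(markdown: string): WikiLink[] {
  // Groups: 1 = "!" for embeds, 2 = target, 3 = anchor after "#", 4 = display after "|".
  const re = /(!?)\[\[([^\[\]|#]*)(?:#([^\[\]|]*))?(?:\|([^\[\]]*))?\]\]/g;
  const links: WikiLink[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(markdown)) !== null) {
    const [, bang, target, anchor, display] = m;
    const link: WikiLink = { embed: bang === "!", target: (target ?? "").trim() };
    if (anchor !== undefined) {
      if (anchor.startsWith("^")) link.blockId = anchor.slice(1);
      else link.heading = anchor;
    }
    if (display !== undefined) link.display = display;
    links.push(link);
  }
  return links;
}
```

Resolution of `target` to a vault path (filename match, then alias fallback) would happen in a separate step against the index.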
4.3 Embeds
- `![[Note]]` for note embed
- `![[Note#Heading]]` / `![[Note#^id]]` for partial embed
- `![[image.png]]` / `![[file.pdf]]` for attachments
4.4 Callouts
- Blockquote with `> [!type]` first line.
- Recognized types: `note`, `tip`, `warning`, `info`, `example`, `quote`, `bug`, `danger`, `success`, `failure`, `question`, `abstract`, `todo`. Unknown types are passed through.
4.5 Tags
- `#tag`, `#nested/tag`. Allowed chars: letters, digits (not first), `_`, `-`, `/`.
- Frontmatter `tags:` list also indexed as tags (without `#`).
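
As an illustration of the character rules above, the sketch below collects inline tags and merges them with frontmatter tags. The `extractTags` name is hypothetical, and the pattern is a simplification: it requires whitespace or start-of-text before `#` and does not exclude tags inside code blocks.

```typescript
// First character: a letter, "_", "/" or "-" (never a digit); later characters may be digits.
const TAG_SOURCE = "(^|\\s)#([\\p{L}_/-][\\p{L}\\p{N}_/-]*)";

// Returns the sorted, de-duplicated tag set for one note, without the leading "#".
function extractTags(body: string, frontmatterTags: string[] = []): string[] {
  const found = new Set<string>();
  for (const t of frontmatterTags) found.add(t.replace(/^#/, ""));
  const re = new RegExp(TAG_SOURCE, "gu");
  let m: RegExpExecArray | null;
  while ((m = re.exec(body)) !== null) {
    const tag = m[2];
    if (tag) found.add(tag);
  }
  return Array.from(found).sort();
}
```

Markdown headings such as `# Heading` are not picked up, because the character after `#` is a space.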
4.6 Block IDs
- `^block-id` at end of a block. Linkable as `[[Note#^block-id]]`.
4.7 Math & diagrams
- LaTeX: `$inline$` and `$$display$$`.
- Mermaid: fenced blocks with the `mermaid` info string.
- Both pass through unmodified; rendering is the UI’s job.
4.8 Canvas
- `.canvas` files conform to the JSON Canvas 1.0 spec.
- Nodes: `text`, `file`, `link`, `group`. Edges connect `fromNode` → `toNode`.
- Unknown fields are preserved on round-trip (the spec is intentionally extensible).
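
One way to get the round-trip guarantee above is to type only the fields an operation needs and carry everything else through untouched. The sketch below assumes that approach; the interface names and the `moveNode` helper are hypothetical, while the field names (`id`, `type`, `x`, `y`, `width`, `height`, `fromNode`, `toNode`) come from JSON Canvas 1.0.

```typescript
// Index signatures keep fields this code does not know about.
interface CanvasNode {
  id: string;
  type: string; // "text" | "file" | "link" | "group", or a future type
  x: number;
  y: number;
  width: number;
  height: number;
  [extra: string]: unknown;
}

interface CanvasEdge {
  id: string;
  fromNode: string;
  toNode: string;
  [extra: string]: unknown;
}

interface Canvas {
  nodes?: CanvasNode[];
  edges?: CanvasEdge[];
  [extra: string]: unknown;
}

// Moves one node; every other field, known or unknown, is copied as-is.
function moveNode(canvas: Canvas, id: string, x: number, y: number): Canvas {
  if (!canvas.nodes) return canvas;
  return {
    ...canvas,
    nodes: canvas.nodes.map((n) => (n.id === id ? { ...n, x, y } : n)),
  };
}
```

Reading with `JSON.parse`, editing through helpers like this, and writing with `JSON.stringify` keeps plugin-specific or future fields intact.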
4.9 Bases
- `.base` files are YAML with the documented Bases syntax (`filters`, `formulas`, `properties`, `views`).
- indx parses the YAML, builds a query plan against the vault’s frontmatter index, and returns rows.
- Write support in v1 is limited to creating/updating the YAML file; row data lives in the linked notes’ frontmatter.
5. Indexing & search
5.1 Schema (SQLite)

    -- One row per file in the vault
    CREATE TABLE notes (
      path        TEXT PRIMARY KEY,  -- vault-relative, posix-style
      kind        TEXT NOT NULL,     -- 'md' | 'canvas' | 'base' | 'attachment'
      size        INTEGER NOT NULL,
      mtime_ms    INTEGER NOT NULL,
      hash        TEXT NOT NULL,     -- xxhash64 of bytes
      frontmatter JSON,              -- parsed YAML or null
      outline     JSON,              -- [{level, text, line}, ...]
      body_text   TEXT               -- plain-text projection for FTS
    );

    CREATE VIRTUAL TABLE notes_fts USING fts5(
      path UNINDEXED, title, body, tags,
      content='notes_view', tokenize='unicode61'
    );

    CREATE TABLE links (
      src_path TEXT NOT NULL,
      dst_path TEXT,              -- null if unresolved
      dst_raw  TEXT NOT NULL,     -- original [[target]] string
      anchor   TEXT,              -- heading or block id
      kind     TEXT NOT NULL,     -- 'wiki' | 'embed' | 'md'
      line     INTEGER NOT NULL,
      PRIMARY KEY (src_path, line, dst_raw)
    );
    CREATE INDEX links_dst ON links(dst_path);

    CREATE TABLE tags (
      path TEXT NOT NULL,
      tag  TEXT NOT NULL,
      PRIMARY KEY (path, tag)
    );
    CREATE INDEX tags_tag ON tags(tag);

    CREATE TABLE blocks (
      path     TEXT NOT NULL,
      block_id TEXT NOT NULL,
      line     INTEGER NOT NULL,
      PRIMARY KEY (path, block_id)
    );

    -- Optional, only if vss enabled
    CREATE VIRTUAL TABLE notes_vss USING vss0(
      embedding(1536)
    );

5.2 Search modes
- lexical: FTS5 BM25 over title/body/tags (default).
- semantic: KNN over `notes_vss` (only if embeddings configured).
- hybrid: reciprocal rank fusion over lexical + semantic.
- structural filters: frontmatter equality/range/contains, `tag:foo`, `path:Projects/**`, `linksTo:[[X]]`, `linkedFrom:[[Y]]`. Composable with all modes.
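
The hybrid mode above names reciprocal rank fusion. A minimal sketch of that technique follows: each result scores the sum of 1 / (k + rank) across the rankings it appears in. The function name is hypothetical, and `k = 60` is a commonly used constant assumed here; the spec does not fix a value.

```typescript
// rankings: one ordered list of note paths per search mode, best match first.
// k = 60 is an assumed default, not a value taken from the spec.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((path, i) => {
      // Ranks are 1-based, so the top hit contributes 1 / (k + 1).
      scores.set(path, (scores.get(path) ?? 0) + 1 / (k + i + 1));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0])) // ties broken by path
    .map(([path]) => path);
}
```

Because only ranks are used, BM25 scores and vector distances never need to be put on a common scale.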
5.3 Indexing pipeline
- Walk vault on startup; ignore `.git`, `.indx`, `.obsidian` (the latter is read separately for config only).
- Parse each file into AST; extract frontmatter, outline, links, tags, blocks, body text projection.
- Diff against `notes` table by hash; upsert changes only.
- Watch with chokidar; debounce 200 ms; same diff/upsert pipeline per change.
- Embed (optional, async) any changed body texts; write to `notes_vss`.
A full reindex of a 10 k-note vault should complete in under 30 s on a laptop; an incremental reindex after common edit patterns should take under 5 s.
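
The diff step of the pipeline can be sketched as a comparison between the hashes found on disk and the hashes already stored in the `notes` table. The `ScannedFile` and `IndexDiff` shapes and the `diffByHash` name are hypothetical; the sketch also assumes that files missing from the scan should be removed from the index.

```typescript
interface ScannedFile {
  path: string; // vault-relative, posix-style
  hash: string; // content hash of the file bytes
}

interface IndexDiff {
  upsert: string[]; // new or changed paths to (re)parse
  remove: string[]; // paths in the index that no longer exist on disk
}

// indexed maps path -> hash as currently stored in the index.
function diffByHash(scanned: ScannedFile[], indexed: Map<string, string>): IndexDiff {
  const seen = new Set<string>();
  const upsert: string[] = [];
  for (const file of scanned) {
    seen.add(file.path);
    if (indexed.get(file.path) !== file.hash) upsert.push(file.path);
  }
  const remove = Array.from(indexed.keys()).filter((p) => !seen.has(p));
  return { upsert, remove };
}
```

Unchanged files cost one map lookup each, which is what keeps the incremental path cheap.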
6. Editing model
6.1 Atomic write
All writes go through `core.vault.writeNote(path, content, { ifMatch?: etag })`. The function:
- Validates the new content (parse must succeed; warnings allowed, errors rejected).
- If `ifMatch` is provided, compares against the current hash and rejects with `409 Conflict` on mismatch.
- Writes to a temp file and atomically `rename()`s into place.
- Updates the index synchronously before returning.
- Emits a `note.updated` event.
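
The ETag check and the temp-file rename from the steps above can be sketched with Node's standard library alone. This is not the actual `core.vault.writeNote`: validation, the index update, and event emission are omitted, SHA-256 stands in for xxhash64 (which is not in Node's standard library), and the `writeNoteAtomic`, `etagOf`, and `ConflictError` names are hypothetical.

```typescript
import * as crypto from "node:crypto";
import * as fs from "node:fs";
import * as path from "node:path";

// Carries the HTTP status a route handler would map this to.
class ConflictError extends Error {
  readonly status = 409;
}

// Stand-in hash; the spec uses xxhash64 of the file bytes.
function etagOf(bytes: Buffer | string): string {
  return crypto.createHash("sha256").update(bytes).digest("hex");
}

// Writes content to absPath and returns the new ETag.
function writeNoteAtomic(absPath: string, content: string, ifMatch?: string): string {
  if (ifMatch !== undefined) {
    const current = fs.existsSync(absPath) ? etagOf(fs.readFileSync(absPath)) : null;
    if (current !== ifMatch) throw new ConflictError(`etag mismatch for ${absPath}`);
  }
  // The temp file sits next to the target so rename() stays on one filesystem.
  const tmp = path.join(path.dirname(absPath), `.${path.basename(absPath)}.${process.pid}.tmp`);
  fs.writeFileSync(tmp, content, "utf8");
  fs.renameSync(tmp, absPath);
  return etagOf(content);
}
```

A reader never observes a half-written note: it sees either the old file or the new one, because `rename()` within one filesystem replaces the target in a single step.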
6.2 Patch operations
The editor exposes a small, AST-aware patch language so agents don’t ship whole files for tiny edits:

    type Patch =
      | { op: "set_frontmatter"; key: string; value: unknown }
      | { op: "delete_frontmatter"; key: string }
      | { op: "append_body"; markdown: string }
      | { op: "prepend_body"; markdown: string }
      | { op: "insert_before_heading"; heading: string; markdown: string }
      | { op: "insert_after_heading"; heading: string; markdown: string }
      | { op: "replace_section"; heading: string; markdown: string }
      | { op: "replace_block"; block_id: string; markdown: string }
      | { op: "rename_heading"; from: string; to: string };

Patches are applied against the AST, not against text positions, so they’re robust to formatting changes. The result is re-serialized through the same remark printer that produced the file (configurable via .indx/config.json to honor a vault’s quote/list-marker conventions).
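
To show how a patch list is folded over a note, the sketch below applies four of the simpler operations to a deliberately simplified model (a frontmatter map plus a body string). It is not the AST-based editor described above; `SimplePatch`, `NoteModel`, and `applyPatches` are hypothetical names, and the heading and block operations are left out because they need the real AST.

```typescript
// Subset of the Patch union, restricted to ops that need no AST.
type SimplePatch =
  | { op: "set_frontmatter"; key: string; value: unknown }
  | { op: "delete_frontmatter"; key: string }
  | { op: "append_body"; markdown: string }
  | { op: "prepend_body"; markdown: string };

interface NoteModel {
  frontmatter: Record<string, unknown>;
  body: string;
}

// Joins two markdown chunks with exactly one blank line between them.
function joinBody(first: string, second: string): string {
  const parts = [first, second]
    .map((s) => s.replace(/^\n+|\n+$/g, ""))
    .filter((s) => s !== "");
  return parts.join("\n\n") + "\n";
}

// Returns a new note; the input is not mutated.
function applyPatches(note: NoteModel, patches: SimplePatch[]): NoteModel {
  const frontmatter = { ...note.frontmatter };
  let body = note.body;
  for (const p of patches) {
    switch (p.op) {
      case "set_frontmatter":
        frontmatter[p.key] = p.value;
        break;
      case "delete_frontmatter":
        delete frontmatter[p.key];
        break;
      case "append_body":
        body = joinBody(body, p.markdown);
        break;
      case "prepend_body":
        body = joinBody(p.markdown, body);
        break;
    }
  }
  return { frontmatter, body };
}
```

An agent changing one property sends a single small object instead of the whole file, and the server decides how the result is serialized.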
6.3 Move / rename
`note.move(from, to)` renames the file and rewrites incoming wikilinks to keep the graph intact. The rewrite is controlled by the `update_links` option (default `true`); an agent can set it to `false` for advanced workflows.
6.4 Path normalization
- All API/CLI/MCP paths are vault-relative, posix-style (`Projects/Foo.md`).
- indx normalizes user-supplied paths: strip leading `/`, collapse `..`, reject paths escaping the vault root.
- Matching is case-sensitive on Linux. The optional `case_insensitive_match` setting will fall back to a case-folded lookup when an exact match fails.
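
The normalization rules above can be sketched as a pure string function, with the case-folded fallback as a second step. Both function names are hypothetical, and rejecting an empty result is an assumption the spec does not state.

```typescript
// Returns a vault-relative posix path, or throws if the input escapes the root.
function normalizeVaultPath(input: string): string {
  const parts: string[] = [];
  for (const segment of input.split("/")) {
    if (segment === "" || segment === ".") continue; // drops leading "/" and "//"
    if (segment === "..") {
      if (parts.length === 0) throw new Error(`path escapes vault root: ${input}`);
      parts.pop(); // collapse ".." against the previous segment
      continue;
    }
    parts.push(segment);
  }
  // Assumption: a path that normalizes to nothing is rejected.
  if (parts.length === 0) throw new Error(`empty path: ${input}`);
  return parts.join("/");
}

// Exact match first; case-folded lookup only when the setting is enabled.
function matchPath(requested: string, known: string[], caseInsensitive = false): string | undefined {
  if (known.indexOf(requested) !== -1) return requested;
  if (!caseInsensitive) return undefined;
  const folded = requested.toLowerCase();
  return known.find((p) => p.toLowerCase() === folded);
}
```

For example, `normalizeVaultPath("/Projects/Old/../Foo.md")` yields `Projects/Foo.md`, while `normalizeVaultPath("../etc/passwd")` throws.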
7. Events
A single in-process pub/sub emits VaultEvent objects:

    type VaultEvent =
      | { type: "note.created"; path: string; etag: string; at: string; actor: Actor }
      | { type: "note.updated"; path: string; etag: string; at: string; actor: Actor }
      | { type: "note.deleted"; path: string; at: string; actor: Actor }
      | { type: "note.moved"; from: string; to: string; at: string; actor: Actor }
      | { type: "index.reindexed"; reason: string; at: string };

    type Actor = { kind: "ui" | "api" | "cli" | "mcp" | "fs"; id?: string };

The same stream feeds:
- `GET /v1/events` (SSE)
- The web UI activity panel
- The rolling `.indx/events.log` (for audit; rotated at 10 MB)
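
A minimal sketch of how one bus could feed the consumers listed above: subscribers register a callback, and the SSE route serializes each event into a frame. The `EventBus` class, the loose `BusEvent` shape, and `toSseFrame` are hypothetical; only the `event:` / `data:` framing follows the Server-Sent Events wire format.

```typescript
// Loose event shape for the sketch; the spec's VaultEvent union is stricter.
interface BusEvent {
  type: string;
  at: string;
  [field: string]: unknown;
}

type Listener = (event: BusEvent) => void;

class EventBus {
  private listeners = new Set<Listener>();

  // Returns an unsubscribe function, e.g. for when an SSE client disconnects.
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  publish(event: BusEvent): void {
    this.listeners.forEach((listener) => listener(event));
  }
}

// One SSE frame per event: an "event:" line, a "data:" line, then a blank line.
function toSseFrame(event: BusEvent): string {
  return `event: ${event.type}\ndata: ${JSON.stringify(event)}\n\n`;
}
```

The event log could subscribe in the same way and append `JSON.stringify(event)` plus a newline to produce NDJSON.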
8. Auth
8.1 Tokens
- One or more bearer tokens are configured via `INDX_TOKEN` (single) or `INDX_TOKENS_FILE` (JSON array with optional scopes).
- Format: `indx_<32 hex chars>`.
- Scopes (optional): `vault:read`, `vault:write`, `vault:admin`, `path:<glob>`.
- A request without a valid token gets 401; insufficient scope gets 403.
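
The 401/403 split above could be decided along these lines. The `TokenRecord` shape follows the `[{token, scopes}]` file format from §9.1, but the `authorize` name, the rule that a token without a scope list is unrestricted, and the omission of `path:<glob>` scopes are assumptions made for this sketch.

```typescript
interface TokenRecord {
  token: string;
  scopes?: string[];
}

// indx_ followed by 32 hex characters.
const TOKEN_RE = /^indx_[0-9a-fA-F]{32}$/;

// Returns the HTTP status the request would receive.
function authorize(
  authHeader: string | undefined,
  tokens: TokenRecord[],
  requiredScope: string,
): 200 | 401 | 403 {
  const m = /^Bearer (\S+)$/.exec(authHeader ?? "");
  const presented = m ? m[1] ?? "" : "";
  if (!TOKEN_RE.test(presented)) return 401;
  const record = tokens.find((t) => t.token === presented);
  if (!record) return 401;
  // Assumption: a token with no scope list is unrestricted.
  if (record.scopes === undefined) return 200;
  return record.scopes.indexOf(requiredScope) !== -1 ? 200 : 403;
}
```

A production check would likely compare tokens in constant time; the plain equality here keeps the sketch short.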
8.2 CSRF / origin
- Web UI uses session cookies tied to the bearer token; `SameSite=Strict`.
- Cross-origin API requests require `Authorization: Bearer …`.
8.3 Future
- v1.1+: OIDC adapter with per-user vaults; not in v1 scope.
9. Configuration
9.1 Environment variables

| Var | Default | Notes |
|---|---|---|
| `INDX_VAULT` | `/vault` | Path to vault directory |
| `INDX_PORT` | `3000` | HTTP port |
| `INDX_TOKEN` | (required) | Single static bearer token |
| `INDX_TOKENS_FILE` | unset | Path to JSON `[{token, scopes}]` |
| `INDX_LOG_LEVEL` | `info` | `error`, `warn`, `info`, or `debug` |
| `INDX_EMBEDDINGS_PROVIDER` | unset | `vercel-ai-gateway`, `openai`, or `ollama` |
| `INDX_EMBEDDINGS_MODEL` | unset | Provider-specific model id |
| `INDX_AI_GATEWAY_KEY` | unset | Required if provider = `vercel-ai-gateway` |
| `INDX_OPENAI_BASE_URL` | unset | For OpenAI-compatible endpoints |
| `INDX_OPENAI_API_KEY` | unset | |
| `INDX_VECTOR_STORE` | `none` | `none` or `sqlite-vss` |
| `INDX_AI_PROVIDER` | inherits `INDX_EMBEDDINGS_PROVIDER` | Chat-model provider for the AI runtime |
| `INDX_AI_MODEL` | unset | Provider-specific chat model id |
| `INDX_AI_MAX_INPUT_TOKENS` | `64000` | Hard cap on prompt context |
| `INDX_AI_MAX_OUTPUT_TOKENS` | `2048` | Hard cap on generated output |
| `INDX_AI_TEMPERATURE` | `0` | Deterministic by default |
| `INDX_AI_DAILY_COST_USD` | unset | Optional per-day spend ceiling |
| `INDX_AI_ALLOW_GLOBS` | unset | Comma-list of globs eligible for AI ops |
| `INDX_AI_DENY_GLOBS` | unset | Comma-list of globs excluded from AI ops |
| `INDX_AI_CACHE` | `on` | `on` or `off` |
| `INDX_AI_TIMEOUT_MS` | `30000` | Per-call upstream timeout |
| `INDX_READONLY` | `false` | Hard-block writes |
| `INDX_TRUST_PROXY` | `false` | Honor `X-Forwarded-*` |
9.2 Vault config (.indx/config.json)

    {
      "version": 1,
      "tokens": [{ "id": "default", "scopes": ["vault:read", "vault:write"] }],
      "markdown": { "list_marker": "-", "thematic_break": "---" },
      "embeddings": { "enabled": false },
      "ai": {
        "enabled": true,
        "provider": null,
        "model": null,
        "temperature": 0,
        "max_input_tokens": 64000,
        "max_output_tokens": 2048,
        "allow_globs": ["**/*.md"],
        "deny_globs": ["Private/**"],
        "cache": { "enabled": true, "ttl_hours": 24 },
        "daily_cost_usd": null
      },
      "ignore": [".git/**", ".obsidian/plugins/**"]
    }

`ai.*` fields are described in AI.md §3.3. Vault config wins for tuning knobs; env wins for safety knobs (provider credentials, `enabled`).
10. Deployment
10.1 Docker image
Multi-stage build:
- `node:24-alpine` builder: installs deps, builds Next.js standalone, builds CLI.
- `gcr.io/distroless/nodejs24-debian12` runtime: copies `apps/web/.next/standalone`, `node_modules`, the `indx` CLI, and a tiny init script.
Final image target: <100 MB compressed. Healthcheck: GET /v1/health.
10.2 Volumes
- `/vault`: required, the user’s vault.
- `/data`: optional, only if the user wants `.indx/` outside the vault (`INDX_INDEX_DIR=/data`).
10.3 Compose

    services:
      indx:
        image: ghcr.io/indx/indx:latest
        ports: ["3000:3000"]
        volumes: ["./vault:/vault"]
        environment:
          - INDX_TOKEN=${INDX_TOKEN}
        restart: unless-stopped

10.4 Sizing

| Vault size | Index size | RAM idle | RAM peak (reindex) |
|---|---|---|---|
| 1 k notes | ~5 MB | 200 MB | 350 MB |
| 10 k notes | ~50 MB | 240 MB | 500 MB |
| 100 k notes | ~500 MB | 400 MB | 1.5 GB |
11. Observability
- Structured JSON logs on stdout (`pino`).
- `/v1/health` returns `{ status, vault, indexed_notes, last_reindex_at, uptime_s }`.
- Optional OpenTelemetry exporter via `OTEL_EXPORTER_OTLP_ENDPOINT` env (off by default).
12. Security model
- Trust boundary: the host filesystem. Anyone with disk access trumps indx auth.
- What indx defends against: unauthenticated network callers, scope-escalation by tokens, path-traversal via API/CLI inputs.
- What indx does not defend against: a malicious agent with a valid `vault:write` token (use scoped tokens), the host root account, or attacks on the embeddings provider.
- Path safety: all user-supplied paths are normalized and resolved under `INDX_VAULT`; `..` and absolute paths are rejected.
INDX_VAULT;..and absolute paths are rejected. - Rate limiting: v1 ships a per-token sliding-window limiter (default 100 rps, 1000 burst). Configurable.
- Bot defense: if exposed publicly, recommend Vercel BotID / a reverse proxy with WAF — out of scope for the container.
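
The per-token limiter mentioned above could be built as a sliding-window log, sketched below with an injectable clock. The class name and constructor parameters are hypothetical, and the sketch models only the steady rate (default 100 requests per 1000 ms window); how the 1000-request burst allowance is applied is not specified here and is left out.

```typescript
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 100, windowMs = 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed; `now` is injectable for tests.
  allow(tokenId: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Keep only timestamps still inside the window ending at `now`.
    const recent = (this.hits.get(tokenId) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.hits.set(tokenId, recent);
    return allowed;
  }
}
```

Rejected requests are not recorded, so a client that keeps hammering the API does not extend its own lockout.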
13. Testing strategy
- Unit: AST patch ops, link resolution, frontmatter round-trip.
- Property: “open → write-back unchanged → byte-identical” against a corpus of 1 k+ public Obsidian sample vaults.
- Integration: Docker image boot + `/v1/health` + sample CRUD via API/CLI/MCP.
- Agent eval: 50-task benchmark (CRUD, search, link-graph, canvas) run via Claude/GPT/local model; pass rate gates release.
14. Schema-first contract
All public surfaces are generated from a single Zod tree in packages/shared:

    Zod schemas
      ├─→ TypeScript types  (used by core, web, cli)
      ├─→ JSON Schema       (used by MCP tool definitions)
      └─→ OpenAPI 3.1       (served at /openapi.json, used by CLI for codegen)

Adding a new capability is one PR that:
- Adds the Zod schema in `@indx/shared`.
- Implements the operation in `@indx/core`.
- Wires it into all three surfaces (API route, CLI verb, MCP tool) via thin adapters.
Backwards-compatible additions never bump /v1. Breaking changes wait for /v2.
15. AI runtime
The AI runtime adds four built-in operations on top of the existing surfaces (summarize, ask, toc, relate), accessible identically from the HTTP API (`/v1/ai/*`), the CLI (`indx ai *`), and MCP (`ai_*` tools). The full specification is in AI.md. Key invariants relevant to this document:
- Provider plumbing reuses §9.1. A new `INDX_AI_PROVIDER` / `INDX_AI_MODEL` pair carries the chat model role; embeddings env vars are unchanged. If `INDX_AI_PROVIDER` is unset it falls back to `INDX_EMBEDDINGS_PROVIDER`. With neither configured the AI runtime is off, AI tools are not advertised, and `/v1/ai/*` returns `503 ai_unavailable`.
- No new outbound surface. Per SPEC §12, the only outbound calls indx ever makes are to a configured AI provider. AI features tighten this stance: no provider configured ⇒ no outbound call possible.
- Same write pipeline. AI ops that materialize content (`ai_toc --moc --write …`) go through `core.vault.writeNote` (§6.1): atomic, ETagged, idempotent, audited. There is no “AI-only” write path.
- Schema-first. AI inputs and outputs are Zod schemas in `@indx/shared` exposed via `/openapi.json` and the MCP tool catalog exactly like every other op (§14).
- Audit. Every AI invocation emits an `ai.invocation` event into the pub/sub bus (§7) and the rolling event log; payload is metadata only (paths, counts, costs, durations), and no prompts or outputs are persisted.
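
The provider fallback in the first invariant can be sketched as a small resolver over the environment. The `AiRuntime` shape and the `resolveAiRuntime` name are hypothetical; the variable names, the fallback order, and the `503 ai_unavailable` outcome come from the text above.

```typescript
type AiRuntime =
  | { enabled: true; provider: string; model: string | undefined }
  | { enabled: false; status: 503; code: "ai_unavailable" };

// env is passed in (rather than read from process.env) so the rule is easy to test.
function resolveAiRuntime(env: Record<string, string | undefined>): AiRuntime {
  // INDX_AI_PROVIDER wins; otherwise inherit the embeddings provider.
  const provider = env["INDX_AI_PROVIDER"] || env["INDX_EMBEDDINGS_PROVIDER"];
  if (!provider) return { enabled: false, status: 503, code: "ai_unavailable" };
  return { enabled: true, provider, model: env["INDX_AI_MODEL"] };
}
```

With the disabled branch, the surfaces would skip advertising `ai_*` tools and answer `/v1/ai/*` with the status and code carried in the result.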