AI Runtime
Built-in AI runtime — summarize, ask, toc, relate, tag, metadata, extract — over CLI, API, MCP.
Status: Draft v0.1 · 2026-05-03
Companion docs: PRD.md, SPEC.md, API.md, CLI.md, MCP.md
Audience: AI agents (primary), self-hosters, library authors.
The vault is the truth. AI is a derived capability layered on top. Indx ships a small, opinionated AI runtime so agents can reach for summary, search-grounded answers, table-of-contents creation, and relationship detection without wiring their own retrieval pipeline. Every operation is reachable identically from the CLI, the HTTP API, and MCP.
1. Why a built-in AI runtime
Two failure modes shaped this design:
- Roll-your-own RAG against a vault is brittle. Every new agent reinvents retrieval, snippet extraction, citation formatting, error fallback, and provider plumbing. The vault is right there; indx already indexes it. The runtime should expose ready-made AI ops the same way it exposes note_read — one tool call, one structured result.
- AI features that aren’t optional are a tax. A user with an air-gapped self-host or no AI key SHALL still get a fully functional vault. AI ops are opt-in: they are advertised iff a provider is configured, and they degrade explicitly (never silently) when retrieval is the only thing available.
The runtime adds seven operations on top of the existing surfaces:
| Op | What it does | Read-only by default? |
|---|---|---|
ai_summarize | Summarize one note, many notes, or a query/tag scope. | yes |
ai_ask | Natural-language question answered from vault content with citations. | yes |
ai_toc | Build a table of contents (single note) or a Map-of-Content (folder/glob/tag). | yes — write is opt-in |
ai_relate | Detect relationships between notes (related, extends, cites, contradicts, …). | yes — propose_links returns drafts only |
ai_tag | Suggest tags for notes, biased toward the existing vault vocabulary. | yes — apply is opt-in |
ai_metadata | Fill or refine typed frontmatter fields (title, aliases, dates, custom keys). | yes — apply is opt-in |
ai_extract | Pull structured entities/facts into a caller-supplied JSON Schema, optionally onto frontmatter. | yes — apply is opt-in |
Every op is grounded — outputs include the paths (and where applicable,
block ids or headings) that the model used. No “trust the LLM” without
verifiable sources. Every op that can mutate the vault (ai_toc.write,
ai_tag.apply, ai_metadata.apply, ai_extract.apply) flows through the
standard atomic write pipeline (ETag, idempotency, audit) — there is no
AI-only side channel.
2. Compatibility with the existing configuration
This runtime is additive. It does not change any existing surface; it reuses the same conventions that already govern vault_search and the embeddings provider:
- Same provider plumbing. The chat/completion model uses the same provider vocabulary as the embeddings provider (Vercel AI Gateway, OpenAI-compatible endpoints including Ollama). See SPEC §9.1.
- Same response envelope. API responses use the existing { ok, data | error } envelope; CLI uses the existing JSON-on-stdout contract; MCP returns structuredContent + a deterministic text summary.
- Same scope model. vault:read is sufficient for AI reads. AI write paths (e.g., materializing a TOC note) require vault:write — the AI runtime never writes through a side channel.
- Same error codes. Errors map to existing codes plus three additions: ai_unavailable, ai_quota_exceeded, ai_provider_error. See §9.
- Same atomic-write pipeline. Materializing AI output as a note goes through core.vault.writeNote like any other write — atomic, ETagged, idempotent. There is no “AI-only” write path.
- Schema-first. All AI inputs/outputs are Zod schemas in @indx/shared, mirrored to OpenAPI 3.1 and to MCP JSON Schema. See SPEC §14.
- Zero outbound calls without consent. When no AI provider is configured the runtime serves 503 ai_unavailable rather than ever calling out. See PRD §9, NFR-PRIV-1.
If you have an indx server today, you don’t have to touch a single env var, file, or token to keep it working: AI tools simply won’t appear in the catalog until a provider is configured.
3. Provider model
3.1 Two model roles
| Role | Used by | Required for |
|---|---|---|
embeddings | Retrieval (semantic + hybrid) | semantic / hybrid search; ai_relate neighbor discovery; ai_summarize of multi-note scopes |
chat | Generation (summary text, answers, classifications) | ai_summarize, ai_ask, ai_toc, ai_relate (relationship typing) |
Either role can be unset. Behavior:
- No embeddings, no chat → AI runtime is off; tools are not advertised.
- embeddings only → ai_relate (vector neighbors) is advertised but classification falls back to a deterministic heuristic; ai_summarize / ai_ask / ai_toc return 503 ai_unavailable.
- chat only → ai_summarize / ai_ask / ai_toc work for explicit-path scopes; query/tag scopes that need semantic recall fall back to lexical.
- Both → all ops advertised at full fidelity.
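Read as pseudocode for the matrix above — a hypothetical advertisement check, not engine code; the helper name and shape are illustrative:

```ts
// Which AI tools a server might advertise given the two roles from §3.1.
// Illustrative only: ai_relate appears under both roles because it works
// (degraded) with embeddings alone and gains classification once chat exists.
function advertisedAiTools(hasEmbeddings: boolean, hasChat: boolean): string[] {
  if (!hasEmbeddings && !hasChat) return [];            // runtime off, nothing advertised
  const tools: string[] = [];
  if (hasEmbeddings) tools.push("ai_relate");           // vector neighbors, heuristic labels
  if (hasChat) {
    tools.push("ai_summarize", "ai_ask", "ai_toc", "ai_relate",
               "ai_tag", "ai_metadata", "ai_extract");
  }
  return [...new Set(tools)];                           // de-duplicate ai_relate
}
```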
3.2 Environment variables (additive only)
The existing embeddings vars are unchanged. The chat-model vars are new and
purely additive. If INDX_AI_PROVIDER is unset, indx falls back to
INDX_EMBEDDINGS_PROVIDER for the chat role too — most users will set one
provider and be done.
| Var | Default | Notes |
|---|---|---|
| INDX_AI_PROVIDER | INDX_EMBEDDINGS_PROVIDER if set, else unset | vercel-ai-gateway \| openai \| ollama |
| INDX_AI_MODEL | unset | Provider-specific id, e.g. anthropic/claude-haiku-4-5 |
| INDX_AI_MAX_INPUT_TOKENS | 64000 | Hard cap on prompt context size |
| INDX_AI_MAX_OUTPUT_TOKENS | 2048 | Hard cap on generated output |
| INDX_AI_TEMPERATURE | 0 | Deterministic by default |
| INDX_AI_DAILY_COST_USD | unset | Optional cost ceiling per day; over-cap → ai_quota_exceeded |
| INDX_AI_ALLOW_GLOBS | unset | Comma-list of globs eligible for AI ops; default = all |
| INDX_AI_DENY_GLOBS | unset | Takes precedence over allow; matches → exclude |
| INDX_AI_CACHE | on | on \| off; cache layer in .indx/ai-cache.db |
| INDX_AI_TIMEOUT_MS | 30000 | Per-call upstream timeout |
Re-using INDX_AI_GATEWAY_KEY / INDX_OPENAI_BASE_URL / INDX_OPENAI_API_KEY
from SPEC §9.1 means the provider
credentials carry over from any existing embeddings setup.
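A minimal sketch of the fallback rule above, assuming nothing beyond the variables in the table (the helper itself is hypothetical):

```ts
// INDX_AI_PROVIDER falls back to INDX_EMBEDDINGS_PROVIDER for the chat role;
// numeric knobs default to the values listed in §3.2.
function resolveChatProviderEnv(env: NodeJS.ProcessEnv = process.env) {
  return {
    provider: env.INDX_AI_PROVIDER ?? env.INDX_EMBEDDINGS_PROVIDER ?? null, // null → AI runtime off
    model: env.INDX_AI_MODEL ?? null,
    maxInputTokens: Number(env.INDX_AI_MAX_INPUT_TOKENS ?? 64000),
    maxOutputTokens: Number(env.INDX_AI_MAX_OUTPUT_TOKENS ?? 2048),
    temperature: Number(env.INDX_AI_TEMPERATURE ?? 0),
    timeoutMs: Number(env.INDX_AI_TIMEOUT_MS ?? 30000),
  };
}
```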
3.3 Vault config (.indx/config.json)
```jsonc
{
  "ai": {
    "enabled": true,                       // false hides AI tools regardless of env
    "provider": "vercel-ai-gateway",       // override env if needed
    "model": "anthropic/claude-haiku-4-5",
    "max_input_tokens": 64000,
    "max_output_tokens": 2048,
    "temperature": 0,
    "allow_globs": ["**/*.md"],
    "deny_globs": ["Private/**", "Journal/**"],
    "cache": { "enabled": true, "ttl_hours": 24 },
    "daily_cost_usd": null
  }
}
```
Vault config wins over env for tuning knobs (temperature,
max_*_tokens, allow_globs, deny_globs); env wins for safety knobs
(enabled, provider credentials) so a sysadmin can lock the runtime down
even if the file says otherwise.
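The precedence rule reads naturally as a merge function — sketched here under the assumption that env supplies the safety knobs and the vault file supplies tuning; this helper is illustrative, not the engine’s actual config loader:

```ts
function effectiveAiConfig(
  env: { enabled: boolean; provider: string | null },   // safety knobs: env wins
  file: { enabled?: boolean; temperature?: number; max_output_tokens?: number; deny_globs?: string[] },
) {
  return {
    enabled: env.enabled && (file.enabled ?? true),   // either side can switch the runtime off
    provider: env.provider,                           // credentials never come from the vault file
    temperature: file.temperature ?? 0,               // tuning knobs: vault config wins
    maxOutputTokens: file.max_output_tokens ?? 2048,
    denyGlobs: file.deny_globs ?? [],
  };
}
```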
3.4 Provider abstraction (engine)
```ts
interface ChatProvider {
  readonly model: string;
  generate(req: {
    system: string;
    messages: ChatMessage[];
    json_schema?: JsonSchema;        // structured output mode
    max_output_tokens?: number;
    temperature?: number;
    seed?: number;
    signal?: AbortSignal;
  }): Promise<{
    text: string;
    structured?: unknown;
    usage: { input_tokens: number; output_tokens: number; cost_usd: number };
    finish_reason: "stop" | "length" | "content_filter" | "error";
  }>;
}
```
The embeddings provider stays as defined in
IMPLEMENTATION §3 src/embeddings/provider.ts.
The factory returns null if not configured; AI ops branch on null exactly
the way search already does.
4. Common request shape
Every AI op accepts a scope selecting what content to operate on. Scopes
are intersected with INDX_AI_ALLOW_GLOBS / INDX_AI_DENY_GLOBS; paths
excluded by globs are silently dropped from the candidate set and reported in
warnings.
```ts
type AiScope =
  | { kind: "paths"; paths: string[] }                 // explicit list
  | { kind: "glob"; path_glob: string }                // path glob
  | { kind: "tag"; tag: string }                       // any note tagged
  | { kind: "query"; q: string; mode?: SearchMode; limit?: number; filters?: SearchFilters }
  | { kind: "note"; path: string; include_linked?: number };  // hop graph N steps
```
include_linked: N follows wikilinks N hops out from a single note — useful
for “summarize this note and its immediate neighbors”.
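A sketch of that hop expansion, assuming only that the index can answer “which notes does this note link to” (the `outgoingLinks` callback is a stand-in, not a real indx API):

```ts
// Breadth-first walk: follow wikilinks N hops out from the seed note.
function expandLinkedScope(
  seed: string,
  hops: number,
  outgoingLinks: (path: string) => string[],
): string[] {
  const seen = new Set<string>([seed]);
  let frontier = [seed];
  for (let hop = 0; hop < hops; hop++) {
    const next: string[] = [];
    for (const path of frontier) {
      for (const dst of outgoingLinks(path)) {
        if (!seen.has(dst)) { seen.add(dst); next.push(dst); }
      }
    }
    frontier = next;
  }
  return [...seen].sort();   // deterministic ordering for scope_resolved.paths
}
```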
Common shared options:
| Field | Default | Notes |
|---|---|---|
| language | inferred | Output language (en, ja, …); inferred from the dominant input language. |
| style | neutral | neutral \| bullets \| executive \| technical |
| max_tokens | INDX_AI_MAX_OUTPUT_TOKENS | Per-call override (clamped to env max). |
| seed | 0 | Same input + same seed → same output (best-effort; provider-dependent). |
| cache | true | Disable per-call to bypass the AI cache. |
| idempotency_key | random uuid | Same key + same body → cached response within 24 h. |
| cite | true | Include citations[] in the output (always recommended). |
5. Operations
5.1 ai_summarize
Generates a structured summary of the scope.
Input
```ts
type SummarizeInput = {
  scope: AiScope;
  style?: "neutral" | "bullets" | "executive" | "technical";
  length?: "short" | "medium" | "long";   // ~50 / 200 / 600 tokens target
  language?: string;
  audience?: string;                      // e.g. "incident responder"
  include?: ("tldr" | "key_points" | "open_questions" | "outline")[];
  max_tokens?: number;
  cite?: boolean;
  seed?: number;
  cache?: boolean;
};
```
Output
```ts
type SummarizeOutput = {
  scope_resolved: { paths: string[]; total_tokens: number };
  summary: {
    tldr?: string;                 // 1–2 sentences
    text: string;                  // formatted markdown
    key_points?: string[];
    open_questions?: string[];
    outline?: { heading: string; line: number; path: string }[];
  };
  citations: Citation[];
  usage: AiUsage;
  warnings: AiWarning[];
};

type Citation = {
  path: string;
  etag: string;              // pin the cited revision
  anchor?: string;           // heading or block-id when available
  line?: number;
  span?: [number, number];   // [startLine, endLine]
  snippet?: string;
};
```
Behavior:
- If the scope expands to more notes than will fit in the prompt, the runtime performs map-reduce summarization: per-chunk summary → final summary. This is internal — callers see one structured result. Per-chunk usage is rolled up.
- cite: true (default) emits at least one citation per key_point. If the model fails to ground a key point, the runtime drops that point and adds an ungrounded_dropped warning rather than emitting unsupported claims.
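The map-reduce pass above could be as small as this — a sketch, assuming a single `summarizeOnce` chat call and pre-chunked scope content; real chunking, usage roll-up, and citation handling are omitted:

```ts
async function mapReduceSummarize(
  chunks: string[],                                   // scope content, pre-split to fit the prompt budget
  summarizeOnce: (text: string) => Promise<string>,   // one chat-model call
): Promise<string> {
  if (chunks.length === 1) return summarizeOnce(chunks[0]);
  const partials: string[] = [];
  for (const chunk of chunks) partials.push(await summarizeOnce(chunk));  // map: per-chunk summaries
  return summarizeOnce(partials.join("\n\n"));                            // reduce: summary of summaries
}
```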
5.2 ai_ask
RAG-style natural-language question answering grounded in the vault.
Input
```ts
type AskInput = {
  question: string;
  scope?: AiScope;                 // optional; default = whole vault
  retrieve?: {
    mode?: SearchMode;             // 'lexical' | 'semantic' | 'hybrid'
    top_k?: number;                // default 8
    filters?: SearchFilters;
  };
  style?: "neutral" | "bullets" | "executive" | "technical";
  language?: string;
  max_tokens?: number;
  cite?: boolean;                  // default true
  conversation_id?: string;        // for stateless multi-turn (server stores nothing)
  history?: { role: "user" | "assistant"; content: string }[];
  stream?: boolean;                // SSE/NDJSON token stream; see §7
  seed?: number;
  cache?: boolean;
};
```
Output
```ts
type AskOutput = {
  answer: string;                  // markdown
  confidence: "high" | "medium" | "low" | "unknown";
  citations: Citation[];
  retrieved: {
    path: string;
    score: number;
    matched_in: ("title" | "body" | "tags" | "frontmatter")[];
  }[];
  followups?: string[];            // suggested next questions
  usage: AiUsage;
  warnings: AiWarning[];
};
```
Behavior:
- Retrieval defaults to mode: "hybrid" if embeddings are configured, else mode: "lexical" with an embeddings_unavailable warning (mirrors FR-S-4).
- The model is instructed to answer only from the retrieved set. If the retrieved evidence does not support an answer, output is confidence: "low" with answer: "I don't know based on the indexed vault." plus an unsupported_question warning. The runtime does not synthesize content outside the corpus.
- conversation_id is opaque to the server — it is only used as a cache partition key. The server keeps no chat memory; agents pass history.
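From the caller’s side, the whole contract is one POST and one envelope. A minimal client sketch (host and token are placeholders; only the `/v1/ai/ask` route, bearer auth, and the `{ ok, data | error }` envelope are taken from this spec):

```ts
const res = await fetch("https://indx.example/v1/ai/ask", {   // hypothetical host
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.INDX_TOKEN}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    question: "what does the etag scheme guarantee?",
    retrieve: { mode: "hybrid", top_k: 8 },
    cite: true,
  }),
});
const envelope = await res.json();
if (!envelope.ok) throw new Error(envelope.error.code);       // e.g. ai_unavailable
console.log(envelope.data.answer, envelope.data.citations);
```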
5.3 ai_toc
Two modes share one tool.
Input
```ts
type TocInput =
  | { mode: "note";
      path: string;
      depth?: 1 | 2 | 3 | 4 | 5 | 6;       // default 3
      include_descriptions?: boolean;       // default false
      include_links?: boolean;              // default true
      style?: "compact" | "expanded";
      max_tokens?: number;
      seed?: number;
      cache?: boolean }
  | { mode: "moc";
      scope: AiScope;                       // Map-of-Content for many notes
      group_by?: "tag" | "folder" | "topic" | "frontmatter";
      group_by_key?: string;                // when "frontmatter"
      max_groups?: number;
      max_per_group?: number;
      include_summaries?: boolean;          // 1-line per item
      title?: string;
      write?: { path: string;               // optional materialization
                if_not_exists?: boolean;
                if_match?: string;
                frontmatter?: Record<string, unknown> } };
```
Output
```ts
type TocOutput = {
  toc_markdown: string;            // ready to drop in a note
  toc_tree: TocNode[];             // structured form
  scope_resolved: { paths: string[]; total_tokens: number };
  written?: { path: string; etag: string };   // present only when write succeeded
  citations: Citation[];
  usage: AiUsage;
  warnings: AiWarning[];
};

type TocNode = {
  title: string;
  path?: string;          // present in MOC mode
  anchor?: string;        // heading slug for note mode
  description?: string;
  children?: TocNode[];
};
```
Behavior:
- mode: "note" is deterministic without AI when include_descriptions: false — it walks the note’s outline (already in the index). The chat model is only invoked to generate descriptions. This makes the common case (indx ai toc Spec.md) free of LLM cost.
- mode: "moc" clusters notes by group_by and asks the model to title each group and (optionally) annotate each item. Items per group are capped to keep output bounded.
- write: { path } materializes the output through the standard core.notes.write pipeline (atomic, ETagged, emits note.created / note.updated). Without write, the op is read-only and requires only vault:read.
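The no-LLM path for mode: "note" is just a fold over the indexed outline into the TocNode shape above — sketched here with a hypothetical `Outline` record standing in for whatever the index stores per note:

```ts
type Outline = { heading: string; level: number }[];

function outlineToToc(outline: Outline, maxDepth = 3): TocNode[] {
  const root: TocNode[] = [];
  const stack: { level: number; children: TocNode[] }[] = [{ level: 0, children: root }];
  for (const { heading, level } of outline) {
    if (level > maxDepth) continue;
    // pop back up until we find the parent of this heading level
    while (stack.length > 1 && stack[stack.length - 1].level >= level) stack.pop();
    const node: TocNode = { title: heading, children: [] };
    stack[stack.length - 1].children.push(node);
    stack.push({ level, children: node.children! });
  }
  return root;
}
```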
5.4 ai_relate
Detects related notes and types the relationships.
Input
```ts
type RelateInput = {
  source:
    | { kind: "path"; path: string }
    | { kind: "paths"; paths: string[] }
    | { kind: "scope"; scope: AiScope };
  candidates?: AiScope;              // restrict the candidate pool
  top_k?: number;                    // default 10 per source note
  retrieve_mode?: "semantic" | "lexical" | "hybrid";   // default 'hybrid'
  classify?: boolean;                // default true if chat available
  relations?: ("extends" | "summarizes" | "cites" | "contradicts" | "same_topic" |
               "precedes" | "depends_on" | "duplicates")[];   // restrict the label set
  threshold?: number;                // 0..1 confidence cutoff
  propose_links?: boolean;           // default false
  max_tokens?: number;
  seed?: number;
  cache?: boolean;
};
```
Output
```ts
type RelateOutput = {
  edges: RelatedEdge[];
  proposed_patches?: ProposedPatch[];   // only when propose_links: true
  scope_resolved: { sources: string[]; candidates_considered: number };
  usage: AiUsage;
  warnings: AiWarning[];
};

type RelatedEdge = {
  src_path: string;
  dst_path: string;
  similarity: number;        // 0..1, vector similarity
  relation?: RelationLabel;  // null if classify=false
  confidence?: number;       // 0..1, present iff relation set
  rationale?: string;        // 1–2 sentence justification
  evidence: Citation[];
};

type ProposedPatch = {
  path: string;
  ops: PatchOp[];            // matches SPEC §6.2
  reason: string;
  related_to: string[];
};
```
Behavior:
- Neighbor discovery uses the existing vector + lexical infra.
- classify: true calls the chat model in batches keyed by src_path so rationale tokens stay deterministic per source.
- propose_links: true returns draft patches. The runtime never applies them; the agent inspects and forwards to note_patch. This keeps a human (or supervising agent) in the loop for graph mutations.
- Edges are de-duplicated and sorted by (confidence DESC, similarity DESC, dst_path ASC) for determinism.
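That deterministic ordering is easy to pin down as a comparator (illustrative, using the RelatedEdge fields above):

```ts
function compareEdges(a: RelatedEdge, b: RelatedEdge): number {
  return (b.confidence ?? 0) - (a.confidence ?? 0)   // confidence DESC
      || b.similarity - a.similarity                  // similarity DESC
      || a.dst_path.localeCompare(b.dst_path);        // dst_path ASC, stable tie-break
}
// edges.sort(compareEdges)
```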
5.5 ai_tag
Suggests tags for one or more notes, biased toward the vault’s existing vocabulary so taxonomies don’t drift. Optionally applies the suggestions through the standard frontmatter write path.
Input
```ts
type TagInput = {
  scope: AiScope;                    // notes to tag
  vocabulary?: {
    use_existing?: boolean;          // default true: load tag_list, prefer it
    allow_new?: boolean;             // default true: model may invent tags
    candidates?: string[];           // restrict the model to this set
    forbid?: string[];               // never propose these
    namespace?: string;              // operate only within #ns/...
    max_per_note?: number;           // default 5
    min_confidence?: number;         // default 0.6 (drop below)
  };
  apply?: boolean;                   // default false (suggest-only)
  apply_mode?: "merge" | "replace";  // default "merge"
  if_match?: Record<string, string>; // path → etag, applied as If-Match
  language?: string;
  max_tokens?: number;
  cite?: boolean;                    // default true
  seed?: number;
  cache?: boolean;
  idempotency_key?: string;
};
```
Output
```ts
type TagOutput = {
  suggestions: TagSuggestion[];
  applied?: AppliedTagWrite[];        // present iff apply: true
  proposed_patches?: ProposedPatch[]; // present iff apply: false
  scope_resolved: { paths: string[]; total_tokens: number };
  vocabulary: { existing: string[]; new: string[] };
  citations: Citation[];
  usage: AiUsage;
  warnings: AiWarning[];
};

type TagSuggestion = {
  path: string;
  etag: string;              // pinned read revision
  tags: {
    tag: string;             // normalized, no leading '#'
    confidence: number;      // 0..1
    is_new: boolean;         // not seen before in this vault
    rationale?: string;
    evidence: Citation[];
  }[];
};

type AppliedTagWrite = {
  path: string;
  etag: string;              // new etag after the write
  tags_added: string[];
  tags_removed: string[];    // only non-empty under apply_mode: "replace"
};
```
Behavior:
- vocabulary.use_existing: true (default) loads the result of tag_list into the prompt as the preferred set. Tags below min_confidence are dropped before the model returns; if the user supplies candidates, the output schema is constrained to that exact set (no new tags possible).
- Tags are normalized: leading # stripped, lowercased except where the vault’s existing tag is mixed-case (preserve canonical form), validated against the SPEC §4.5 charset. Invalid candidates are dropped with a tag_invalid warning.
- apply: true writes via core.notes.patch(actor, { ops: [{ op: "set_frontmatter", key: "tags", value: <merged_or_replaced> }] }) per path. Each path’s write honors its own if_match[path] if supplied. apply_mode: "merge" (default) dedup-unions with existing tags; "replace" overwrites. Inline #tag mentions in note bodies are never rewritten — body text is the user’s; only frontmatter is managed.
- Required scope: vault:read for suggest mode, vault:write for apply: true.
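A sketch of the normalization and merge rules, assuming a `canonical` map of lowercased tag → the vault’s casing (helper names are hypothetical; SPEC §4.5 charset validation is elided):

```ts
// Strip the leading '#', prefer the vault's canonical casing, else lowercase.
function normalizeTag(raw: string, canonical: Map<string, string>): string {
  const bare = raw.replace(/^#/, "");
  return canonical.get(bare.toLowerCase()) ?? bare.toLowerCase();
}

// apply_mode: "merge" = dedup-union with whatever is already in frontmatter.
function mergeTags(existing: string[], suggested: string[]): string[] {
  return [...new Set([...existing, ...suggested])];
}
```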
5.6 ai_metadata
Generates or refines typed frontmatter fields. Each requested field has a schema; the model is constrained via JSON Schema mode and the runtime re-validates before returning or applying.
Input
```ts
type MetadataInput = {
  scope: AiScope;
  fields: MetadataFieldSpec[];       // which keys to fill, with types
  apply?: boolean;                   // default false
  apply_mode?: "set_missing" | "overwrite" | "merge";   // default "set_missing"
  if_match?: Record<string, string>;
  language?: string;
  max_tokens?: number;
  cite?: boolean;                    // default true
  seed?: number;
  cache?: boolean;
  idempotency_key?: string;
};

type MetadataFieldSpec = {
  key: string;                       // frontmatter key (e.g. "title", "due_date", "status")
  type: "string" | "number" | "boolean" | "date" | "datetime" | "list" | "enum";
  description?: string;              // hint to the model
  required?: boolean;                // when true, missing field → metadata_missing warning
  enum?: string[];                   // for type === "enum"
  list_item_type?: MetadataFieldSpec["type"];
  pattern?: string;                  // regex (string only)
  min?: number; max?: number;        // numeric / list length bounds
  default?: unknown;                 // applied when model returns null
};
```
Output
```ts
type MetadataOutput = {
  results: MetadataResult[];
  applied?: AppliedMetadataWrite[];
  proposed_patches?: ProposedPatch[];
  scope_resolved: { paths: string[]; total_tokens: number };
  citations: Citation[];
  usage: AiUsage;
  warnings: AiWarning[];
};

type MetadataResult = {
  path: string;
  etag: string;
  fields: Record<string, {
    value: unknown;          // schema-validated; null when not extractable
    confidence: number;      // 0..1
    rationale?: string;
    existing?: unknown;      // current frontmatter value (for diffs)
    evidence: Citation[];
  }>;
};

type AppliedMetadataWrite = {
  path: string;
  etag: string;
  fields_set: string[];
  fields_skipped: { key: string; reason: "existing_under_set_missing" | "validation_failed" }[];
};
```
Behavior:
- The model is constrained to a json_schema derived from fields[]. After generation the runtime re-validates: anything failing is dropped with a metadata_invalid warning carrying { path, key, reason }.
- apply_mode: set_missing (default) — only writes keys whose existing value is null/missing; protects user-curated fields. overwrite — replaces values regardless of current state. merge — for type: "list" only, dedup-unions with the existing value; for scalars, behaves like set_missing.
- Type semantics: date / datetime outputs are normalized to ISO 8601 (YYYY-MM-DD or YYYY-MM-DDTHH:mm:ssZ). Model output that is prose rather than an unambiguous date (“yesterday”) fails validation rather than guessing. enum is exact-match only; near-misses → metadata_invalid. pattern is enforced server-side after generation, not relayed via JSON Schema (provider compatibility is uneven).
- Special keys recognized for SPEC §4.1 conventions: writes to tags, aliases, cssclasses use the Obsidian-conformant list shape regardless of the field’s declared type.
- Required scope: vault:read for suggest mode, vault:write for apply: true.
5.7 ai_extract
Generic extraction into a caller-supplied JSON Schema. Use this when neither
“a few typed frontmatter fields” (ai_metadata) nor “tags” (ai_tag) fits —
e.g., pulling action items, decisions, contacts, or domain entities out of a
note for downstream processing.
Input
```ts
type ExtractInput = {
  scope: AiScope;
  schema: JsonSchema;                // JSON Schema 2020-12 subset
  schema_id?: string;                // optional human label, included in cache key
  destination?: "json" | "frontmatter";   // default "json" — return only
  destination_key?: string;          // required when destination === "frontmatter"
  apply?: boolean;                   // default false
  apply_mode?: "set_missing" | "overwrite" | "merge";   // for destination: "frontmatter"
  if_match?: Record<string, string>;
  language?: string;
  max_tokens?: number;
  cite?: boolean;                    // default true
  seed?: number;
  cache?: boolean;
  idempotency_key?: string;
};
```
Output
```ts
type ExtractOutput = {
  results: ExtractResult[];
  applied?: AppliedExtractWrite[];
  proposed_patches?: ProposedPatch[];
  scope_resolved: { paths: string[]; total_tokens: number };
  usage: AiUsage;
  warnings: AiWarning[];
};

type ExtractResult = {
  path: string;
  etag: string;
  data: unknown;             // validated against `schema`
  confidence: number;        // 0..1
  citations: Citation[];     // per-field where the model can attribute
};

type AppliedExtractWrite = {
  path: string;
  etag: string;
  key: string;               // frontmatter key written
};
```
Behavior:
- The runtime constrains generation with schema (JSON Schema mode); after generation it re-validates with the same schema and drops items that fail with extract_invalid.
- destination: "json" (default) is read-only — the agent gets data back and decides what to do with it. destination: "frontmatter" requires destination_key and apply: true to actually write; the data is set as the value of that key through notes.patch. Without apply: true, drafts come back via proposed_patches exactly like ai_relate.
- The full schema (or schema_id if supplied) participates in the cache key, so changing the requested shape invalidates cleanly.
- Required scope: vault:read for suggest mode, vault:write for apply: true with destination: "frontmatter".
5.8 Apply semantics for multi-note ops (tag, metadata, extract)
ai_tag, ai_metadata, and ai_extract operate on potentially many notes
at once. Their apply: true semantics are deliberately conservative:
- Per-note atomicity. Each note is written through one core.notes.patch(actor, …) call; that call is itself atomic (SPEC §6.1). A crash between notes leaves the already-written notes consistent and the rest untouched.
- Best-effort batch, never partial-mid-note. The runtime applies notes in path-sorted order. A failure on one note does not roll back earlier notes (we cannot atomically transact across files). The response reports each path’s outcome via applied[] and any failures via warnings[] with codes (etag_mismatch, forbidden, validation_failed).
- Per-path optimistic concurrency. if_match: { "<path>": "<etag>" } is honored per note; mismatches surface as ai_apply_conflict with details.failed[] and the op stops at the first mismatch unless if_match_mode: "skip" is set (the runtime skips conflicting paths and continues).
- Idempotency-by-content. Repeating the same apply request after a successful run is a safe no-op: under apply_mode: "set_missing" (and merge for lists), nothing changes; the response carries apply_skipped_no_change per path. Combine with Idempotency-Key to short-circuit to the cached response within 24 h.
- Read-after-write. Each applied[] entry includes the new etag of the note — agents do not need a follow-up note_read.
- Audit fan-out. Every per-note write emits a normal note.updated event with the existing Actor (api/cli/mcp); a single roll-up ai.invocation event lists all applied_paths for the call.
- Glob gating still wins. If INDX_AI_DENY_GLOBS covers a path, that note is excluded before the chat call and reported via globs_excluded — no provider sees its content, no write happens.
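Put together, the batch-apply contract is roughly the loop below — pseudocode only; `patchNote` stands in for core.notes.patch, and the single `etag_mismatch` catch is a simplification of the real per-path failure codes:

```ts
async function applyPerNote(
  paths: string[],
  ifMatch: Record<string, string>,
  ifMatchMode: "stop" | "skip",
  patchNote: (path: string, expectedEtag?: string) => Promise<{ etag: string }>,
) {
  const applied: { path: string; etag: string }[] = [];
  const warnings: { path: string; code: string }[] = [];
  for (const path of [...paths].sort()) {               // path-sorted; each patch is atomic on its own
    try {
      applied.push({ path, ...(await patchNote(path, ifMatch[path])) });
    } catch {
      warnings.push({ path, code: "etag_mismatch" });   // real codes: etag_mismatch / forbidden / validation_failed
      if (ifMatchMode === "stop") break;                // default: stop at the first conflict
    }
  }
  return { applied, warnings };                         // earlier writes are never rolled back
}
```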
6. Citations and grounding
Every AI output that references vault content carries a citations[] array
of Citation records:
- path is always vault-relative POSIX.
- etag pins the content version the model saw — agents can detect drift by comparing against note_read({ path }).etag.
- anchor is a heading or block id when one applies; otherwise omitted.
- line / span give the exact location for snippet pull.
Grounding rules:
- The chat model is constrained to JSON Schema output (json_schema mode) wherever supported. The schema mandates a citations field.
- After the model returns, the runtime verifies every cited path exists, the etag matches, and (if anchor or line is given) that the location resolves. Failed citations are dropped with a citation_drift or citation_invalid warning per item.
- When cite: true and zero citations survive, the op fails with ai_grounding_failed rather than returning ungrounded prose.
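Verification is a plain post-pass over the model’s citations — a sketch, with `readNote` standing in for the engine’s note_read:

```ts
async function verifyCitations(
  citations: Citation[],
  readNote: (path: string) => Promise<{ etag: string } | null>,
): Promise<{ kept: Citation[]; warnings: string[] }> {
  const kept: Citation[] = [];
  const warnings: string[] = [];
  for (const c of citations) {
    const note = await readNote(c.path);
    if (!note) { warnings.push("citation_invalid"); continue; }              // path does not resolve
    if (note.etag !== c.etag) { warnings.push("citation_drift"); continue; } // content moved on
    kept.push(c);
  }
  return { kept, warnings };   // zero survivors under cite: true → ai_grounding_failed
}
```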
7. Streaming
Long-running ops (notably ai_ask and large ai_summarize) support
streaming. Transport-specific:
| Surface | Streaming form |
|---|---|
| HTTP API | Accept: text/event-stream. Events: ai.partial (token chunk), ai.citation (incremental), ai.usage (running total), ai.complete (final structured payload). |
| CLI | --stream writes NDJSON deltas to stdout, matching the SSE event names; final line is {"type":"ai.complete", ...}. |
| MCP | Tool calls emit progress notifications (notifications/progress) with partial payloads; the final tool_result carries the full structuredContent. |
Streaming is best-effort. If the provider does not support streaming, the
runtime falls back to a single non-streamed response and emits
stream_unsupported in warnings (no error).
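On the CLI side, consuming the NDJSON stream is one readline loop — a sketch for Node, assuming only the event names listed above (the question text is just an example):

```ts
import { spawn } from "node:child_process";
import { createInterface } from "node:readline";

const child = spawn("indx", ["ai", "ask", "what does the etag scheme guarantee?", "--stream"]);
for await (const line of createInterface({ input: child.stdout! })) {
  const event = JSON.parse(line);
  if (event.type === "ai.partial") process.stdout.write(event.delta);      // token chunks as they arrive
  if (event.type === "ai.complete") console.log("\n", event.data.citations);
}
```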
8. Caching, idempotency, cost
8.1 AI cache
When INDX_AI_CACHE=on (default), every successful AI op result is cached in
.indx/ai-cache.db keyed by:
```
sha256( op | provider | model | temperature | seed | max_output_tokens |
        json_schema | normalized_input | content_fingerprint(scope) )
```
- normalized_input strips fields that don’t affect output (e.g. idempotency_key, cache, stream).
- content_fingerprint(scope) is the sorted list of (path, etag) pairs in the resolved scope. Any vault edit invalidates only the entries that read the changed file.
- TTL: 24 h (configurable per call via cache_ttl_seconds).
- Cache hits return with usage: { input_tokens: 0, output_tokens: 0, cost_usd: 0 } and warnings: ["ai_cache_hit"].
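The key itself is a single digest over those parts — sketched with Node’s crypto; the field list mirrors the expression above, everything else (names, types) is illustrative:

```ts
import { createHash } from "node:crypto";

function aiCacheKey(parts: {
  op: string; provider: string; model: string; temperature: number; seed: number;
  max_output_tokens: number; json_schema: unknown; normalized_input: unknown;
  content_fingerprint: [path: string, etag: string][];   // sorted (path, etag) pairs of the resolved scope
}): string {
  return createHash("sha256").update(JSON.stringify(parts)).digest("hex");
}
```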
8.2 Idempotency
The standard Idempotency-Key header / --idempotency-key flag /
idempotency_key field works as on every other write — but for AI it also
applies to read-shaped ops, so an agent retrying a /v1/ai/ask call after
a network blip does not pay for two generations.
Same key + same body within 24 h → cached response with
X-Indx-Idempotent-Replay: true. Same key + different body → 409 idempotency_key_reused.
8.3 Cost ceilings
- INDX_AI_DAILY_COST_USD clamps total spend per UTC day across all tokens. Over-cap → 429 ai_quota_exceeded with Retry-After: <seconds-until-midnight>.
- Per-token rate limits from the existing API limiter still apply.
- Each AI call’s usage.cost_usd is a best-effort number derived from the provider’s published rates — emitted on the response, written to events.log as an ai.invocation event, and aggregated for vault_status.ai.
9. Errors
AI ops use the standard error envelope (API §4).
Three new codes layer on the existing list:
| HTTP | code | When |
|---|---|---|
| 503 | ai_unavailable | No suitable provider configured (or enabled: false in config) for the requested op. details.required_role is "chat" or "embeddings". |
| 429 | ai_quota_exceeded | Daily cost ceiling reached. details.reset_at is an ISO 8601 instant. |
| 502 | ai_provider_error | Upstream call failed (timeout, 5xx, parse). details.provider, details.upstream_status, details.upstream_message. |
| 422 | ai_grounding_failed | All model citations failed verification under cite: true. details.dropped[]. |
| 422 | ai_input_too_large | Resolved scope exceeds INDX_AI_MAX_INPUT_TOKENS even after map-reduce. details.tokens. |
| 422 | ai_schema_invalid | The caller-supplied ai_extract.schema is not a valid JSON Schema. details.errors[]. |
| 409 | ai_apply_conflict | One or more per-path if_match checks failed during apply: true; details.failed[{ path, expected, actual }]. No partial writes leak — see §5.5/§5.6/§5.7. |
Existing codes also apply: validation_failed, forbidden,
unauthorized, rate_limited, parse_failed, etag_mismatch (when
ai_toc.write collides), etc.
AiWarning (non-fatal) values:
| Warning | Meaning |
|---|---|
embeddings_unavailable | Hybrid/semantic retrieval downgraded to lexical. |
chat_unavailable | Chat-dependent fields omitted (e.g. ai_relate without classification). |
ai_cache_hit | Result served from cache; usage reflects zero cost. |
citation_drift | A cited path exists but its etag changed since model generation. |
citation_invalid | A cited path or anchor does not resolve and was dropped. |
ungrounded_dropped | A model claim without citation was dropped under cite: true. |
truncated | Output hit max_output_tokens; finish_reason: "length". |
stream_unsupported | Streaming requested but provider does not support it. |
globs_excluded | Some scope candidates were excluded by INDX_AI_*_GLOBS. |
unsupported_question | ai_ask could not ground an answer; returned low-confidence stub. |
tag_invalid | A proposed tag failed SPEC §4.5 charset/shape rules and was dropped. |
tag_below_threshold | A proposed tag scored under vocabulary.min_confidence and was dropped. |
metadata_invalid | A field value failed schema validation (pattern / enum / min/max / type) and was dropped. |
metadata_missing | A required: true field could not be extracted and was omitted (still an op success). |
extract_invalid | An extracted record failed schema validation and was dropped. |
apply_skipped_no_change | apply: true was a no-op for a path because nothing changed (e.g. all suggested tags already present under apply_mode: "merge"). |
10. Surface mapping
```
AI op          ↔ HTTP                    ↔ CLI                 ↔ MCP tool
──────────────────────────────────────────────────────────────────────────
ai_summarize     POST /v1/ai/summarize     indx ai summarize     ai_summarize
ai_ask           POST /v1/ai/ask           indx ai ask           ai_ask
ai_toc           POST /v1/ai/toc           indx ai toc           ai_toc
ai_relate        POST /v1/ai/relate        indx ai relate        ai_relate
ai_tag           POST /v1/ai/tag           indx ai tag           ai_tag
ai_metadata      POST /v1/ai/metadata      indx ai metadata      ai_metadata
ai_extract       POST /v1/ai/extract       indx ai extract       ai_extract
ai_status        GET  /v1/ai/status        indx ai status        ai_status
```
ai_status is a cheap probe returning { enabled, provider, model, embeddings: { provider, model, dim }, cache: { enabled, hits_24h, size_bytes }, spend_today_usd } — the agent’s discovery handshake for “what AI can I expect from this vault?”
10.1 HTTP details
- All AI endpoints are
POST (request bodies are non-trivial; cursors and Idempotency-Key belong on writes).
- All AI endpoints stream when Accept: text/event-stream; otherwise return a single JSON envelope.
- ai_status is GET and behaves like any other lightweight read.
10.2 CLI details
```
indx ai summarize --scope-paths Spec.md,API.md [--style bullets] [--length short]
indx ai ask "what does the etag scheme guarantee?" [--top-k 8] [--mode hybrid]
indx ai toc --note Architecture.md --depth 3 [--descriptions]
indx ai toc --moc --scope-glob 'Projects/**' --group-by tag --write Index.md
indx ai relate --note Spec.md [--top-k 10] [--classify] [--propose-links]
indx ai status
```
Global flags (CLI §2.1) apply: --ndjson,
--idempotency-key, --if-match, --dry-run. --stream opts into NDJSON
streaming (one event per line, terminated by ai.complete).
indx ai toc --moc --write <path> requires --yes if the path exists and
will be replaced; --if-not-exists makes it a safe create.
10.3 MCP details
Tools are registered iff a provider is configured and the connecting token
has vault:read (and vault:write for the write paths of ai_toc).
Tool names are stable; they appear in
MCP §3 tables (read tools, except ai_toc with
write which is a write tool).
Each AI tool result includes a deterministic content[].text summary,
exactly the contract from MCP §3: an agent that wants
to display progress without re-rendering the structured payload can quote
the text block.
Resources gain two read-only synthetic URIs when AI is enabled:
```
ai://summary/<scope-spec>   a live summary; refreshes when scope content changes
ai://moc/<scope-spec>       a live MOC for a glob/tag scope
```
<scope-spec> is a URL-encoded JSON AiScope. Subscribing to either URI
(resources/subscribe) re-derives when an underlying note’s etag changes —
the runtime triggers a re-summary, and the agent receives
notifications/resources/updated.
11. Worked examples
11.1 Summarize a folder, bullets, short
```
POST /v1/ai/summarize
Authorization: Bearer indx_…
Content-Type: application/json
```
{ "scope": { "kind": "glob", "path_glob": "Specs/**/*.md" }, "style": "bullets", "length": "short", "include": ["tldr", "key_points", "open_questions"], "cite": true}Response (abridged):
{ "ok": true, "data": { "scope_resolved": { "paths": ["Specs/Auth.md", "Specs/Search.md"], "total_tokens": 4123 }, "summary": { "tldr": "Auth uses static bearer tokens with scopes; search defaults to lexical.", "key_points": [ "Tokens are validated by hashed comparison; secrets never persisted in config.json.", "Hybrid search requires an embeddings provider and degrades silently to lexical otherwise." ], "open_questions": ["Should per-user OIDC ship in v1.1 or v2?"] }, "citations": [ { "path": "Specs/Auth.md", "etag": "ab12cd34ef567890", "line": 42 }, { "path": "Specs/Search.md", "etag": "9fbb12…", "anchor": "5.2 Search modes" } ], "usage": { "input_tokens": 4123, "output_tokens": 218, "cost_usd": 0.0007 }, "warnings": [] }}11.2 Ask, streaming
```
indx ai ask "what does the etag scheme guarantee for two concurrent writers?" \
  --top-k 8 --stream
```
Stdout, one NDJSON event per line:
```
{"type":"ai.partial","delta":"The ETag is the first 16 hex of"}
{"type":"ai.partial","delta":" xxhash64(bytes-on-disk), so it changes…"}
{"type":"ai.citation","citation":{"path":"SPEC.md","etag":"…","anchor":"6.1 Atomic write"}}
{"type":"ai.usage","usage":{"input_tokens":3211,"output_tokens":104,"cost_usd":0.00041}}
{"type":"ai.complete","data":{ /* full AskOutput */ }}
```
Exit 0 on a normal complete. Exit 7 on ai_quota_exceeded. Exit 8 on
ai_provider_error.
11.3 TOC for a single long note (no LLM cost)
```
indx ai toc --note Architecture.md --depth 3
```
include_descriptions: false (default) → outline-only, skips the chat
model entirely. usage.cost_usd is 0.
11.4 MOC for “Projects/”, grouped by tag, materialized
```
indx ai toc --moc \
  --scope-glob 'Projects/**' \
  --group-by tag \
  --write Index/Projects.md \
  --idempotency-key $(uuidgen)
```
Response includes written: { path: "Index/Projects.md", etag: "…" }. The
write goes through the standard atomic pipeline; a subsequent
note_read sees the materialized note. Re-running the same command with
the same idempotency_key is a no-op replay.
11.5 Relate + propose links
Section titled “11.5 Relate + propose links”POST /v1/ai/relate{ "source": { "kind": "path", "path": "Architecture.md" }, "candidates": { "kind": "glob", "path_glob": "**/*.md" }, "top_k": 8, "classify": true, "threshold": 0.55, "propose_links": true}{ "ok": true, "data": { "edges": [ { "src_path": "Architecture.md", "dst_path": "SPEC.md", "similarity": 0.81, "relation": "extends", "confidence": 0.78, "rationale": "SPEC.md sets the contracts that Architecture.md implements.", "evidence": [{ "path": "SPEC.md", "etag": "…", "anchor": "6.2 Patch operations" }] } ], "proposed_patches": [ { "path": "Architecture.md", "ops": [{ "op": "insert_after_heading", "heading": "## See also", "markdown": "- [[SPEC]] — patch grammar reference\n" }], "reason": "Strong same-topic + extends relation; no existing wikilink found.", "related_to": ["SPEC.md"] } ], "scope_resolved": { "sources": ["Architecture.md"], "candidates_considered": 184 }, "usage": { "input_tokens": 6420, "output_tokens": 312, "cost_usd": 0.0011 }, "warnings": [] }}The agent decides whether to apply the proposed patches by calling
note_patch — the runtime never auto-edits.
11.6 Tag a recently-edited folder, biased to existing vocabulary
Section titled “11.6 Tag a recently-edited folder, biased to existing vocabulary”indx ai tag \ --scope-glob 'Inbox/**/*.md' \ --use-existing --max-per-note 4 --min-confidence 0.7 \ --apply --apply-mode merge \ --idempotency-key $(uuidgen){ "ok": true, "data": { "suggestions": [ { "path": "Inbox/2026-05-02-meeting.md", "etag": "ab12…", "tags": [ { "tag": "ops", "confidence": 0.92, "is_new": false, "rationale": "incident review under #ops cadence", "evidence": [{ "path": "Inbox/2026-05-02-meeting.md", "etag": "ab12…", "anchor": "## Agenda" }] }, { "tag": "oncall/postmortem", "confidence": 0.81, "is_new": true, "rationale": "explicit postmortem section present", "evidence": [{ "path": "Inbox/2026-05-02-meeting.md", "etag": "ab12…", "anchor": "## Postmortem" }] } ] } ], "applied": [ { "path": "Inbox/2026-05-02-meeting.md", "etag": "cd34…", "tags_added": ["ops", "oncall/postmortem"], "tags_removed": [] } ], "vocabulary": { "existing": ["ops"], "new": ["oncall/postmortem"] }, "citations": [/* … */], "usage": { "input_tokens": 1820, "output_tokens": 64, "cost_usd": 0.0002 }, "warnings": [] }}11.7 Fill missing typed frontmatter fields
Section titled “11.7 Fill missing typed frontmatter fields”POST /v1/ai/metadata{ "scope": { "kind": "paths", "paths": ["Projects/Indx-App.md"] }, "fields": [ { "key": "title", "type": "string", "required": true }, { "key": "status", "type": "enum", "enum": ["idea","active","blocked","done"], "required": true }, { "key": "due_date", "type": "date" }, { "key": "stakeholders","type": "list", "list_item_type": "string", "max": 5 } ], "apply": true, "apply_mode": "set_missing"}{ "ok": true, "data": { "results": [{ "path": "Projects/Indx-App.md", "etag": "ab12…", "fields": { "title": { "value": "Indx-App build plan", "confidence": 0.97, "evidence": [/* … */] }, "status": { "value": "active", "confidence": 0.86, "evidence": [/* … */] }, "due_date": { "value": "2026-09-01", "confidence": 0.71, "evidence": [/* … */] }, "stakeholders": { "value": ["Alice","Bob"], "confidence": 0.79, "evidence": [/* … */] } } }], "applied": [{ "path": "Projects/Indx-App.md", "etag": "cd34…", "fields_set": ["title","status","due_date","stakeholders"], "fields_skipped": [] }], "usage": { "input_tokens": 2410, "output_tokens": 88, "cost_usd": 0.0003 }, "warnings": [] }}A second run with the same body is a no-op replay (everything already set
under set_missing); the response carries apply_skipped_no_change.
11.8 Extract action items into a custom JSON Schema
```
indx ai extract \
  --scope-glob 'Meetings/2026-Q2/**' \
  --schema-file action-items.schema.json \
  --destination frontmatter \
  --destination-key action_items \
  --apply --apply-mode merge
```
action-items.schema.json:
```json
{
  "type": "array",
  "items": {
    "type": "object",
    "required": ["task", "owner"],
    "properties": {
      "task": { "type": "string", "minLength": 3 },
      "owner": { "type": "string" },
      "due_date": { "type": "string", "format": "date" },
      "blocked_by": { "type": "array", "items": { "type": "string" } }
    },
    "additionalProperties": false
  }
}
```
Each meeting note ends up with a typed action_items: array in
frontmatter — usable directly by Bases queries (SPEC §4.9) without
further parsing.
12. Telemetry, audit, and the events stream
Every AI invocation emits a VaultEvent of type ai.invocation (extends
SPEC §7):
```ts
type AiInvocationEvent = {
  type: "ai.invocation";
  op: "summarize" | "ask" | "toc" | "relate" | "tag" | "metadata" | "extract";
  actor: Actor;              // ui/api/cli/mcp/fs
  scope_paths: string[];     // resolved (truncated to first 50 for log)
  provider: string;
  model: string;
  usage: AiUsage;
  cache_hit: boolean;
  duration_ms: number;
  ok: boolean;
  error_code?: string;
  applied_paths?: string[];  // present iff apply: true succeeded; truncated to first 50
  at: string;                // ISO 8601
};
```
These events flow through:
- GET /v1/events (SSE) — filterable with kinds=ai.invocation.
- .indx/events.log — the rolling NDJSON audit log.
- vault_status.ai — counts, hits/misses, total spend today.
Privacy: prompts and outputs are not written to the event log. Paths,
counts, durations, costs, and error codes only — same redaction stance as
NFR-PRIV-2.
13. Determinism and reproducibility
AI is non-deterministic by nature; indx narrows the gap:
- Defaults: temperature: 0, seed: 0, structured-output JSON Schema mode.
- The shape of every AI output is fixed by Zod schemas; freeform prose is contained to specific fields (summary.text, answer, rationale).
- Scope resolution and citation verification are deterministic given the vault state — agents that rely on scope_resolved.paths for downstream branching get stable inputs.
- The cache (§8.1) makes “same input + same vault state” a guaranteed exact replay across restarts.
This is enough for snapshot-style tests in CI: the runtime ships an
AI_TEST_RECORD=1 mode that records cache fixtures, and AI_TEST_REPLAY=1
serves them — no provider call required for unit/integration tests of
adapters.
14. Security & privacy considerations
- No prompt leakage in logs. Same redaction rules as the rest of the system; only paths, counts, and durations are persisted.
- Glob-gated content. INDX_AI_DENY_GLOBS lets the operator keep sensitive paths (Private/**, Journal/**, …) out of any AI op even when an agent specifies them in scope. Filtered candidates surface as globs_excluded warnings — never silently included.
- Outbound calls only with consent. AI features tighten, not loosen, NFR-PRIV-1: if neither chat nor embeddings provider is configured, no call is ever made.
- Scopes still gate writes. ai_toc with write requires vault:write; ai_relate with propose_links: true returns drafts only — applying requires note_patch, which requires vault:write. There is no AI bypass.
- Threat model. A malicious agent with vault:read can exfiltrate vault content via the configured provider. This is the same trust posture as any embeddings call today; document it clearly and let users choose providers (or ai.enabled: false) accordingly. See SPEC §12.
15. Versioning
Section titled “15. Versioning”- AI op surface is additive within
/v1. New optional fields and warnings are non-breaking; new ops appear in/openapi.jsononce configured. - Breaking shape changes to existing ops follow the standard
API §15policy:/v2cut,/v1retained for one minor. - Provider plumbing is engine-internal — swapping a provider does not change the public surface.
16. What this design buys the agent
Section titled “16. What this design buys the agent”- One handshake for AI capability discovery —
ai_status(ortools/listfiltered forai_*) tells the agent exactly what it can ask for on this vault. - Citations by default — every model claim is verifiable against the vault, with etag-pinned references the agent can re-fetch.
- Deterministic shapes — even when the model varies the prose, the structured payload is contractual.
- Safe writes — the runtime never auto-edits; AI write paths reuse the atomic pipeline with the same ETag/idempotency invariants.
- Cost-aware — usage is reported per call, capped per day, and cached between identical requests so retries are free.
- Drop-in compatibility — existing tokens, scopes, configs, and clients keep working; AI is one env var away.