AI Runtime
Built-in AI runtime — summarize, ask, toc, relate, tag, metadata, extract — over CLI, API, MCP.
Status: Draft v0.1 · 2026-05-03
Companion docs: PRD.md, SPEC.md, API.md, CLI.md, MCP.md
Audience: AI agents (primary), self-hosters, library authors.
The vault is the truth. AI is a derived capability layered on top. Indx ships a small, opinionated AI runtime so agents can reach for summary, search-grounded answers, table-of-contents creation, and relationship detection without wiring their own retrieval pipeline. Every operation is reachable identically from the CLI, the HTTP API, and MCP.
1. Why a built-in AI runtime
Two failure modes shaped this design:
- Roll-your-own RAG against a vault is brittle. Every new agent reinvents retrieval, snippet extraction, citation formatting, error fallback, and provider plumbing. The vault is right there; indx already indexes it. The runtime should expose ready-made AI ops the same way it exposes note_read — one tool call, one structured result.
- AI features that aren’t optional are a tax. A user with an air-gapped self-host or no AI key SHALL still get a fully functional vault. AI ops are opt-in: they are advertised iff a provider is configured, and they degrade explicitly (never silently) when retrieval is the only thing available.
The runtime adds seven operations on top of the existing surfaces:
| Op | What it does | Read-only by default? |
|---|---|---|
ai_summarize | Summarize one note, many notes, or a query/tag scope. | yes |
ai_ask | Natural-language question answered from vault content with citations. | yes |
ai_toc | Build a table of contents (single note) or a Map-of-Content (folder/glob/tag). | yes — write is opt-in |
ai_relate | Detect relationships between notes (related, extends, cites, contradicts, …). | yes — propose_links returns drafts only |
ai_tag | Suggest tags for notes, biased toward the existing vault vocabulary. | yes — apply is opt-in |
ai_metadata | Fill or refine typed frontmatter fields (title, aliases, dates, custom keys). | yes — apply is opt-in |
ai_extract | Pull structured entities/facts into a caller-supplied JSON Schema, optionally onto frontmatter. | yes — apply is opt-in |
Every op is grounded — outputs include the paths (and where applicable,
block ids or headings) that the model used. No “trust the LLM” without
verifiable sources. Every op that can mutate the vault (ai_toc.write,
ai_tag.apply, ai_metadata.apply, ai_extract.apply) flows through the
standard atomic write pipeline (ETag, idempotency, audit) — there is no
AI-only side channel.
2. Compatibility with the existing configuration
This runtime is additive. It does not change any existing surface; it reuses the same conventions that already govern vault_search and the embeddings provider:
- Same provider plumbing. The chat/completion model uses the same provider vocabulary as the embeddings provider (Vercel AI Gateway, OpenAI-compatible endpoints including Ollama). See SPEC §9.1.
- Same response envelope. API responses use the existing { ok, data | error } envelope; CLI uses the existing JSON-on-stdout contract; MCP returns structuredContent + a deterministic text summary.
- Same scope model. vault:read is sufficient for AI reads. AI write paths (e.g., materializing a TOC note) require vault:write — the AI runtime never writes through a side channel.
- Same error codes. Errors map to existing codes plus three additions: ai_unavailable, ai_quota_exceeded, ai_provider_error. See §9.
- Same atomic-write pipeline. Materializing AI output as a note goes through core.vault.writeNote like any other write — atomic, ETagged, idempotent. There is no “AI-only” write path.
- Schema-first. All AI inputs/outputs are Zod schemas in @indx/shared, mirrored to OpenAPI 3.1 and to MCP JSON Schema. See SPEC §14.
- Zero outbound calls without consent. When no AI provider is configured the runtime serves 503 ai_unavailable rather than ever calling out. See PRD §9, NFR-PRIV-1.
If you have an indx server today, you don’t have to touch a single env var, file, or token to keep it working: AI tools simply won’t appear in the catalog until a provider is configured.
3. Provider model
3.1 Two model roles
| Role | Used by | Required for |
|---|---|---|
embeddings | Retrieval (semantic + hybrid) | semantic / hybrid search; ai_relate neighbor discovery; ai_summarize of multi-note scopes |
chat | Generation (summary text, answers, classifications) | ai_summarize, ai_ask, ai_toc, ai_relate (relationship typing) |
Either role can be unset. Behavior:
- No embeddings, no chat → AI runtime is off; tools are not advertised.
- embeddings only → ai_relate (vector neighbors) is advertised but classification falls back to a deterministic heuristic; ai_summarize / ai_ask / ai_toc return 503 ai_unavailable.
- chat only → ai_summarize / ai_ask / ai_toc work for explicit-path scopes; query/tag scopes that need semantic recall fall back to lexical.
- Both → all ops advertised at full fidelity.
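Read as pseudocode for the matrix above — a hypothetical advertisement check, not engine code; the helper name and shape are illustrative:

```ts
// Which AI tools a server might advertise given the two roles from §3.1.
// Illustrative only: ai_relate appears under both roles because it works
// (degraded) with embeddings alone and gains classification once chat exists.
function advertisedAiTools(hasEmbeddings: boolean, hasChat: boolean): string[] {
  if (!hasEmbeddings && !hasChat) return [];            // runtime off, nothing advertised
  const tools: string[] = [];
  if (hasEmbeddings) tools.push("ai_relate");           // vector neighbors, heuristic labels
  if (hasChat) {
    tools.push("ai_summarize", "ai_ask", "ai_toc", "ai_relate",
               "ai_tag", "ai_metadata", "ai_extract");
  }
  return [...new Set(tools)];                           // de-duplicate ai_relate
}
```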
3.2 Environment variables (additive only)
The existing embeddings vars are unchanged. The chat-model vars are new and
purely additive. If INDX_AI_PROVIDER is unset, indx falls back to
INDX_EMBEDDINGS_PROVIDER for the chat role too — most users will set one
provider and be done.
| Var | Default | Notes |
|---|---|---|
| INDX_AI_PROVIDER | INDX_EMBEDDINGS_PROVIDER if set, else unset | vercel-ai-gateway \| openai \| ollama |
| INDX_AI_MODEL | unset | Provider-specific id, e.g. anthropic/claude-haiku-4-5 |
| INDX_AI_MAX_INPUT_TOKENS | 64000 | Hard cap on prompt context size |
| INDX_AI_MAX_OUTPUT_TOKENS | 2048 | Hard cap on generated output |
| INDX_AI_TEMPERATURE | 0 | Deterministic by default |
| INDX_AI_DAILY_COST_USD | unset | Optional cost ceiling per day; over-cap → ai_quota_exceeded |
| INDX_AI_ALLOW_GLOBS | unset | Comma-list of globs eligible for AI ops; default = all |
| INDX_AI_DENY_GLOBS | unset | Takes precedence over allow; matches → exclude |
| INDX_AI_CACHE | on | on \| off; cache layer in .indx/ai-cache.db |
| INDX_AI_TIMEOUT_MS | 30000 | Per-call upstream timeout |
Re-using INDX_AI_GATEWAY_KEY / INDX_OPENAI_BASE_URL / INDX_OPENAI_API_KEY
from SPEC §9.1 means the provider
credentials carry over from any existing embeddings setup.
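A minimal sketch of the fallback rule above, assuming nothing beyond the variables in the table (the helper itself is hypothetical):

```ts
// INDX_AI_PROVIDER falls back to INDX_EMBEDDINGS_PROVIDER for the chat role;
// numeric knobs default to the values listed in §3.2.
function resolveChatProviderEnv(env: NodeJS.ProcessEnv = process.env) {
  return {
    provider: env.INDX_AI_PROVIDER ?? env.INDX_EMBEDDINGS_PROVIDER ?? null, // null → AI runtime off
    model: env.INDX_AI_MODEL ?? null,
    maxInputTokens: Number(env.INDX_AI_MAX_INPUT_TOKENS ?? 64000),
    maxOutputTokens: Number(env.INDX_AI_MAX_OUTPUT_TOKENS ?? 2048),
    temperature: Number(env.INDX_AI_TEMPERATURE ?? 0),
    timeoutMs: Number(env.INDX_AI_TIMEOUT_MS ?? 30000),
  };
}
```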
3.3 Vault config (.indx/config.json)
```jsonc
{
  "ai": {
    "enabled": true,                       // false hides AI tools regardless of env
    "provider": "vercel-ai-gateway",       // override env if needed
    "model": "anthropic/claude-haiku-4-5",
    "max_input_tokens": 64000,
    "max_output_tokens": 2048,
    "temperature": 0,
    "allow_globs": ["**/*.md"],
    "deny_globs": ["Private/**", "Journal/**"],
    "cache": { "enabled": true, "ttl_hours": 24 },
    "daily_cost_usd": null
  }
}
```
Vault config wins over env for tuning knobs (temperature,
max_*_tokens, allow_globs, deny_globs); env wins for safety knobs
(enabled, provider credentials) so a sysadmin can lock the runtime down
even if the file says otherwise.
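The precedence rule reads naturally as a merge function — sketched here under the assumption that env supplies the safety knobs and the vault file supplies tuning; this helper is illustrative, not the engine’s actual config loader:

```ts
function effectiveAiConfig(
  env: { enabled: boolean; provider: string | null },   // safety knobs: env wins
  file: { enabled?: boolean; temperature?: number; max_output_tokens?: number; deny_globs?: string[] },
) {
  return {
    enabled: env.enabled && (file.enabled ?? true),   // either side can switch the runtime off
    provider: env.provider,                           // credentials never come from the vault file
    temperature: file.temperature ?? 0,               // tuning knobs: vault config wins
    maxOutputTokens: file.max_output_tokens ?? 2048,
    denyGlobs: file.deny_globs ?? [],
  };
}
```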
3.4 Provider abstraction (engine)
```ts
interface ChatProvider {
  readonly model: string;
  generate(req: {
    system: string;
    messages: ChatMessage[];
    json_schema?: JsonSchema;        // structured output mode
    max_output_tokens?: number;
    temperature?: number;
    seed?: number;
    signal?: AbortSignal;
  }): Promise<{
    text: string;
    structured?: unknown;
    usage: { input_tokens: number; output_tokens: number; cost_usd: number };
    finish_reason: "stop" | "length" | "content_filter" | "error";
  }>;
}
```
The embeddings provider stays as defined in
IMPLEMENTATION §3 src/embeddings/provider.ts.
The factory returns null if not configured; AI ops branch on null exactly
the way search already does.
4. Common request shape
Every AI op accepts a scope selecting what content to operate on. Scopes
are intersected with INDX_AI_ALLOW_GLOBS / INDX_AI_DENY_GLOBS; paths
excluded by globs are silently dropped from the candidate set and reported in
warnings.
```ts
type AiScope =
  | { kind: "paths"; paths: string[] }                 // explicit list
  | { kind: "glob"; path_glob: string }                // path glob
  | { kind: "tag"; tag: string }                       // any note tagged
  | { kind: "query"; q: string; mode?: SearchMode; limit?: number; filters?: SearchFilters }
  | { kind: "note"; path: string; include_linked?: number };  // hop graph N steps
```
include_linked: N follows wikilinks N hops out from a single note — useful
for “summarize this note and its immediate neighbors”.
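A sketch of that hop expansion, assuming only that the index can answer “which notes does this note link to” (the `outgoingLinks` callback is a stand-in, not a real indx API):

```ts
// Breadth-first walk: follow wikilinks N hops out from the seed note.
function expandLinkedScope(
  seed: string,
  hops: number,
  outgoingLinks: (path: string) => string[],
): string[] {
  const seen = new Set<string>([seed]);
  let frontier = [seed];
  for (let hop = 0; hop < hops; hop++) {
    const next: string[] = [];
    for (const path of frontier) {
      for (const dst of outgoingLinks(path)) {
        if (!seen.has(dst)) { seen.add(dst); next.push(dst); }
      }
    }
    frontier = next;
  }
  return [...seen].sort();   // deterministic ordering for scope_resolved.paths
}
```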
Common shared options:
| Field | Default | Notes |
|---|---|---|
| language | inferred | Output language (en, ja, …); inferred from the dominant input language. |
| style | neutral | neutral \| bullets \| executive \| technical |
| max_tokens | INDX_AI_MAX_OUTPUT_TOKENS | Per-call override (clamped to env max). |
| seed | 0 | Same input + same seed → same output (best-effort; provider-dependent). |
| cache | true | Disable per-call to bypass the AI cache. |
| idempotency_key | random uuid | Same key + same body → cached response within 24 h. |
| cite | true | Include citations[] in the output (always recommended). |
5. Operations
5.1 ai_summarize
Generates a structured summary of the scope.
Input
```ts
type SummarizeInput = {
  scope: AiScope;
  style?: "neutral" | "bullets" | "executive" | "technical";
  length?: "short" | "medium" | "long";   // ~50 / 200 / 600 tokens target
  language?: string;
  audience?: string;                      // e.g. "incident responder"
  include?: ("tldr" | "key_points" | "open_questions" | "outline")[];
  max_tokens?: number;
  cite?: boolean;
  seed?: number;
  cache?: boolean;
};
```
Output
```ts
type SummarizeOutput = {
  scope_resolved: { paths: string[]; total_tokens: number };
  summary: {
    tldr?: string;                 // 1–2 sentences
    text: string;                  // formatted markdown
    key_points?: string[];
    open_questions?: string[];
    outline?: { heading: string; line: number; path: string }[];
  };
  citations: Citation[];
  usage: AiUsage;
  warnings: AiWarning[];
};

type Citation = {
  path: string;
  etag: string;              // pin the cited revision
  anchor?: string;           // heading or block-id when available
  line?: number;
  span?: [number, number];   // [startLine, endLine]
  snippet?: string;
};
```
Behavior:
- If the scope expands to more notes than will fit in the prompt, the runtime performs map-reduce summarization: per-chunk summary → final summary. This is internal — callers see one structured result. Per-chunk usage is rolled up.
- cite: true (default) emits at least one citation per key_point. If the model fails to ground a key point, the runtime drops that point and adds an ungrounded_dropped warning rather than emitting unsupported claims.
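The map-reduce pass above could be as small as this — a sketch, assuming a single `summarizeOnce` chat call and pre-chunked scope content; real chunking, usage roll-up, and citation handling are omitted:

```ts
async function mapReduceSummarize(
  chunks: string[],                                   // scope content, pre-split to fit the prompt budget
  summarizeOnce: (text: string) => Promise<string>,   // one chat-model call
): Promise<string> {
  if (chunks.length === 1) return summarizeOnce(chunks[0]);
  const partials: string[] = [];
  for (const chunk of chunks) partials.push(await summarizeOnce(chunk));  // map: per-chunk summaries
  return summarizeOnce(partials.join("\n\n"));                            // reduce: summary of summaries
}
```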
5.2 ai_ask
RAG-style natural-language question answering grounded in the vault.
Input
```ts
type AskInput = {
  question: string;
  scope?: AiScope;                 // optional; default = whole vault
  retrieve?: {
    mode?: SearchMode;             // 'lexical' | 'semantic' | 'hybrid'
    top_k?: number;                // default 8
    filters?: SearchFilters;
  };
  style?: "neutral" | "bullets" | "executive" | "technical";
  language?: string;
  max_tokens?: number;
  cite?: boolean;                  // default true
  conversation_id?: string;        // for stateless multi-turn (server stores nothing)
  history?: { role: "user" | "assistant"; content: string }[];
  stream?: boolean;                // SSE/NDJSON token stream; see §7
  seed?: number;
  cache?: boolean;
};
```
Output
```ts
type AskOutput = {
  answer: string;                  // markdown
  confidence: "high" | "medium" | "low" | "unknown";
  citations: Citation[];
  retrieved: {
    path: string;
    score: number;
    matched_in: ("title" | "body" | "tags" | "frontmatter")[];
  }[];
  followups?: string[];            // suggested next questions
  usage: AiUsage;
  warnings: AiWarning[];
};
```
Behavior:
- Retrieval defaults to mode: "hybrid" if embeddings are configured, else mode: "lexical" with an embeddings_unavailable warning (mirrors FR-S-4).
- The model is instructed to answer only from the retrieved set. If the retrieved evidence does not support an answer, output is confidence: "low" with answer: "I don't know based on the indexed vault." plus an unsupported_question warning. The runtime does not synthesize content outside the corpus.
- conversation_id is opaque to the server — it is only used as a cache partition key. The server keeps no chat memory; agents pass history.
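From the caller’s side, the whole contract is one POST and one envelope. A minimal client sketch (host and token are placeholders; only the `/v1/ai/ask` route, bearer auth, and the `{ ok, data | error }` envelope are taken from this spec):

```ts
const res = await fetch("https://indx.example/v1/ai/ask", {   // hypothetical host
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.INDX_TOKEN}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    question: "what does the etag scheme guarantee?",
    retrieve: { mode: "hybrid", top_k: 8 },
    cite: true,
  }),
});
const envelope = await res.json();
if (!envelope.ok) throw new Error(envelope.error.code);       // e.g. ai_unavailable
console.log(envelope.data.answer, envelope.data.citations);
```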
5.3 ai_toc
Two modes share one tool.
Input
```ts
type TocInput =
  | { mode: "note";
      path: string;
      depth?: 1 | 2 | 3 | 4 | 5 | 6;       // default 3
      include_descriptions?: boolean;       // default false
      include_links?: boolean;              // default true
      style?: "compact" | "expanded";
      max_tokens?: number;
      seed?: number;
      cache?: boolean }
  | { mode: "moc";
      scope: AiScope;                       // Map-of-Content for many notes
      group_by?: "tag" | "folder" | "topic" | "frontmatter";
      group_by_key?: string;                // when "frontmatter"
      max_groups?: number;
      max_per_group?: number;
      include_summaries?: boolean;          // 1-line per item
      title?: string;
      write?: { path: string;               // optional materialization
                if_not_exists?: boolean;
                if_match?: string;
                frontmatter?: Record<string, unknown> } };
```
Output
```ts
type TocOutput = {
  toc_markdown: string;            // ready to drop in a note
  toc_tree: TocNode[];             // structured form
  scope_resolved: { paths: string[]; total_tokens: number };
  written?: { path: string; etag: string };   // present only when write succeeded
  citations: Citation[];
  usage: AiUsage;
  warnings: AiWarning[];
};

type TocNode = {
  title: string;
  path?: string;          // present in MOC mode
  anchor?: string;        // heading slug for note mode
  description?: string;
  children?: TocNode[];
};
```
Behavior:
- mode: "note" is deterministic without AI when include_descriptions: false — it walks the note’s outline (already in the index). The chat model is only invoked to generate descriptions. This makes the common case (indx ai toc Spec.md) free of LLM cost.
- mode: "moc" clusters notes by group_by and asks the model to title each group and (optionally) annotate each item. Items per group are capped to keep output bounded.
- write: { path } materializes the output through the standard core.notes.write pipeline (atomic, ETagged, emits note.created / note.updated). Without write, the op is read-only and requires only vault:read.
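The no-LLM path for mode: "note" is just a fold over the indexed outline into the TocNode shape above — sketched here with a hypothetical `Outline` record standing in for whatever the index stores per note:

```ts
type Outline = { heading: string; level: number }[];

function outlineToToc(outline: Outline, maxDepth = 3): TocNode[] {
  const root: TocNode[] = [];
  const stack: { level: number; children: TocNode[] }[] = [{ level: 0, children: root }];
  for (const { heading, level } of outline) {
    if (level > maxDepth) continue;
    // pop back up until we find the parent of this heading level
    while (stack.length > 1 && stack[stack.length - 1].level >= level) stack.pop();
    const node: TocNode = { title: heading, children: [] };
    stack[stack.length - 1].children.push(node);
    stack.push({ level, children: node.children! });
  }
  return root;
}
```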
5.4 ai_relate
Detects related notes and types the relationships.
Input
```ts
type RelateInput = {
  source:
    | { kind: "path"; path: string }
    | { kind: "paths"; paths: string[] }
    | { kind: "scope"; scope: AiScope };
  candidates?: AiScope;              // restrict the candidate pool
  top_k?: number;                    // default 10 per source note
  retrieve_mode?: "semantic" | "lexical" | "hybrid";   // default 'hybrid'
  classify?: boolean;                // default true if chat available
  relations?: ("extends" | "summarizes" | "cites" | "contradicts" | "same_topic" |
               "precedes" | "depends_on" | "duplicates")[];   // restrict the label set
  threshold?: number;                // 0..1 confidence cutoff
  propose_links?: boolean;           // default false
  max_tokens?: number;
  seed?: number;
  cache?: boolean;
};
```
Output
```ts
type RelateOutput = {
  edges: RelatedEdge[];
  proposed_patches?: ProposedPatch[];   // only when propose_links: true
  scope_resolved: { sources: string[]; candidates_considered: number };
  usage: AiUsage;
  warnings: AiWarning[];
};

type RelatedEdge = {
  src_path: string;
  dst_path: string;
  similarity: number;        // 0..1, vector similarity
  relation?: RelationLabel;  // null if classify=false
  confidence?: number;       // 0..1, present iff relation set
  rationale?: string;        // 1–2 sentence justification
  evidence: Citation[];
};

type ProposedPatch = {
  path: string;
  ops: PatchOp[];            // matches SPEC §6.2
  reason: string;
  related_to: string[];
};
```
Behavior:
- Neighbor discovery uses the existing vector + lexical infra.
- classify: true calls the chat model in batches keyed by src_path so rationale tokens stay deterministic per source.
- propose_links: true returns draft patches. The runtime never applies them; the agent inspects and forwards to note_patch. This keeps a human (or supervising agent) in the loop for graph mutations.
- Edges are de-duplicated and sorted by (confidence DESC, similarity DESC, dst_path ASC) for determinism.
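That deterministic ordering is easy to pin down as a comparator (illustrative, using the RelatedEdge fields above):

```ts
function compareEdges(a: RelatedEdge, b: RelatedEdge): number {
  return (b.confidence ?? 0) - (a.confidence ?? 0)   // confidence DESC
      || b.similarity - a.similarity                  // similarity DESC
      || a.dst_path.localeCompare(b.dst_path);        // dst_path ASC, stable tie-break
}
// edges.sort(compareEdges)
```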
5.5 ai_tag
Suggests tags for one or more notes, biased toward the vault’s existing vocabulary so taxonomies don’t drift. Optionally applies the suggestions through the standard frontmatter write path.
Input
```ts
type TagInput = {
  scope: AiScope;                    // notes to tag
  vocabulary?: {
    use_existing?: boolean;          // default true: load tag_list, prefer it
    allow_new?: boolean;             // default true: model may invent tags
    candidates?: string[];           // restrict the model to this set
    forbid?: string[];               // never propose these
    namespace?: string;              // operate only within #ns/...
    max_per_note?: number;           // default 5
    min_confidence?: number;         // default 0.6 (drop below)
  };
  apply?: boolean;                   // default false (suggest-only)
  apply_mode?: "merge" | "replace";  // default "merge"
  if_match?: Record<string, string>; // path → etag, applied as If-Match
  language?: string;
  max_tokens?: number;
  cite?: boolean;                    // default true
  seed?: number;
  cache?: boolean;
  idempotency_key?: string;
};
```
Output
```ts
type TagOutput = {
  suggestions: TagSuggestion[];
  applied?: AppliedTagWrite[];        // present iff apply: true
  proposed_patches?: ProposedPatch[]; // present iff apply: false
  scope_resolved: { paths: string[]; total_tokens: number };
  vocabulary: { existing: string[]; new: string[] };
  citations: Citation[];
  usage: AiUsage;
  warnings: AiWarning[];
};

type TagSuggestion = {
  path: string;
  etag: string;              // pinned read revision
  tags: {
    tag: string;             // normalized, no leading '#'
    confidence: number;      // 0..1
    is_new: boolean;         // not seen before in this vault
    rationale?: string;
    evidence: Citation[];
  }[];
};

type AppliedTagWrite = {
  path: string;
  etag: string;              // new etag after the write
  tags_added: string[];
  tags_removed: string[];    // only non-empty under apply_mode: "replace"
};
```
Behavior:
- vocabulary.use_existing: true (default) loads the result of tag_list into the prompt as the preferred set. Tags below min_confidence are dropped before the model returns; if the user supplies candidates, the output schema is constrained to that exact set (no new tags possible).
- Tags are normalized: leading # stripped, lowercased except where the vault’s existing tag is mixed-case (preserve canonical form), validated against the SPEC §4.5 charset. Invalid candidates are dropped with a tag_invalid warning.
- apply: true writes via core.notes.patch(actor, { ops: [{ op: "set_frontmatter", key: "tags", value: <merged_or_replaced> }] }) per path. Each path’s write honors its own if_match[path] if supplied. apply_mode: "merge" (default) dedup-unions with existing tags; "replace" overwrites. Inline #tag mentions in note bodies are never rewritten — body text is the user’s; only frontmatter is managed.
- Required scope: vault:read for suggest mode, vault:write for apply: true.
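A sketch of the normalization and merge rules, assuming a `canonical` map of lowercased tag → the vault’s casing (helper names are hypothetical; SPEC §4.5 charset validation is elided):

```ts
// Strip the leading '#', prefer the vault's canonical casing, else lowercase.
function normalizeTag(raw: string, canonical: Map<string, string>): string {
  const bare = raw.replace(/^#/, "");
  return canonical.get(bare.toLowerCase()) ?? bare.toLowerCase();
}

// apply_mode: "merge" = dedup-union with whatever is already in frontmatter.
function mergeTags(existing: string[], suggested: string[]): string[] {
  return [...new Set([...existing, ...suggested])];
}
```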
5.6 ai_metadata
Generates or refines typed frontmatter fields. Each requested field has a schema; the model is constrained via JSON Schema mode and the runtime re-validates before returning or applying.
Input
```ts
type MetadataInput = {
  scope: AiScope;
  fields: MetadataFieldSpec[];       // which keys to fill, with types
  apply?: boolean;                   // default false
  apply_mode?: "set_missing" | "overwrite" | "merge";   // default "set_missing"
  if_match?: Record<string, string>;
  language?: string;
  max_tokens?: number;
  cite?: boolean;                    // default true
  seed?: number;
  cache?: boolean;
  idempotency_key?: string;
};

type MetadataFieldSpec = {
  key: string;                       // frontmatter key (e.g. "title", "due_date", "status")
  type: "string" | "number" | "boolean" | "date" | "datetime" | "list" | "enum";
  description?: string;              // hint to the model
  required?: boolean;                // when true, missing field → metadata_missing warning
  enum?: string[];                   // for type === "enum"
  list_item_type?: MetadataFieldSpec["type"];
  pattern?: string;                  // regex (string only)
  min?: number; max?: number;        // numeric / list length bounds
  default?: unknown;                 // applied when model returns null
};
```
Output
```ts
type MetadataOutput = {
  results: MetadataResult[];
  applied?: AppliedMetadataWrite[];
  proposed_patches?: ProposedPatch[];
  scope_resolved: { paths: string[]; total_tokens: number };
  citations: Citation[];
  usage: AiUsage;
  warnings: AiWarning[];
};

type MetadataResult = {
  path: string;
  etag: string;
  fields: Record<string, {
    value: unknown;          // schema-validated; null when not extractable
    confidence: number;      // 0..1
    rationale?: string;
    existing?: unknown;      // current frontmatter value (for diffs)
    evidence: Citation[];
  }>;
};

type AppliedMetadataWrite = {
  path: string;
  etag: string;
  fields_set: string[];
  fields_skipped: { key: string; reason: "existing_under_set_missing" | "validation_failed" }[];
};
```
Behavior:
- The model is constrained to a json_schema derived from fields[]. After generation the runtime re-validates: anything failing is dropped with a metadata_invalid warning carrying { path, key, reason }.
- apply_mode: set_missing (default) — only writes keys whose existing value is null/missing; protects user-curated fields. overwrite — replaces values regardless of current state. merge — for type: "list" only, dedup-unions with the existing value; for scalars, behaves like set_missing.
- Type semantics: date / datetime outputs are normalized to ISO 8601 (YYYY-MM-DD or YYYY-MM-DDTHH:mm:ssZ). Model output that is prose rather than an unambiguous date (“yesterday”) fails validation rather than guessing. enum is exact-match only; near-misses → metadata_invalid. pattern is enforced server-side after generation, not relayed via JSON Schema (provider compatibility is uneven).
- Special keys recognized for SPEC §4.1 conventions: writes to tags, aliases, cssclasses use the Obsidian-conformant list shape regardless of the field’s declared type.
- Required scope: vault:read for suggest mode, vault:write for apply: true.
5.7 ai_extract
Generic extraction into a caller-supplied JSON Schema. Use this when neither
“a few typed frontmatter fields” (ai_metadata) nor “tags” (ai_tag) fits —
e.g., pulling action items, decisions, contacts, or domain entities out of a
note for downstream processing.
Input
```ts
type ExtractInput = {
  scope: AiScope;
  schema: JsonSchema;                // JSON Schema 2020-12 subset
  schema_id?: string;                // optional human label, included in cache key
  destination?: "json" | "frontmatter";   // default "json" — return only
  destination_key?: string;          // required when destination === "frontmatter"
  apply?: boolean;                   // default false
  apply_mode?: "set_missing" | "overwrite" | "merge";   // for destination: "frontmatter"
  if_match?: Record<string, string>;
  language?: string;
  max_tokens?: number;
  cite?: boolean;                    // default true
  seed?: number;
  cache?: boolean;
  idempotency_key?: string;
};
```
Output
```ts
type ExtractOutput = {
  results: ExtractResult[];
  applied?: AppliedExtractWrite[];
  proposed_patches?: ProposedPatch[];
  scope_resolved: { paths: string[]; total_tokens: number };
  usage: AiUsage;
  warnings: AiWarning[];
};

type ExtractResult = {
  path: string;
  etag: string;
  data: unknown;             // validated against `schema`
  confidence: number;        // 0..1
  citations: Citation[];     // per-field where the model can attribute
};

type AppliedExtractWrite = {
  path: string;
  etag: string;
  key: string;               // frontmatter key written
};
```
Behavior:
- The runtime constrains generation with schema (JSON Schema mode); after generation it re-validates with the same schema and drops items that fail with extract_invalid.
- destination: "json" (default) is read-only — the agent gets data back and decides what to do with it. destination: "frontmatter" requires destination_key and apply: true to actually write; the data is set as the value of that key through notes.patch. Without apply: true, drafts come back via proposed_patches exactly like ai_relate.
- The full schema (or schema_id if supplied) participates in the cache key, so changing the requested shape invalidates cleanly.
- Required scope: vault:read for suggest mode, vault:write for apply: true with destination: "frontmatter".
5.8 Apply semantics for multi-note ops (tag, metadata, extract)
ai_tag, ai_metadata, and ai_extract operate on potentially many notes
at once. Their apply: true semantics are deliberately conservative:
- Per-note atomicity. Each note is written through one core.notes.patch(actor, …) call; that call is itself atomic (SPEC §6.1). A crash between notes leaves the already-written notes consistent and the rest untouched.
- Best-effort batch, never partial-mid-note. The runtime applies notes in path-sorted order. A failure on one note does not roll back earlier notes (we cannot atomically transact across files). The response reports each path’s outcome via applied[] and any failures via warnings[] with codes (etag_mismatch, forbidden, validation_failed).
- Per-path optimistic concurrency. if_match: { "<path>": "<etag>" } is honored per note; mismatches surface as ai_apply_conflict with details.failed[] and the op stops at the first mismatch unless if_match_mode: "skip" is set (the runtime skips conflicting paths and continues).
- Idempotency-by-content. Repeating the same apply request after a successful run is a safe no-op: under apply_mode: "set_missing" (and merge for lists), nothing changes; the response carries apply_skipped_no_change per path. Combine with Idempotency-Key to short-circuit to the cached response within 24 h.
- Read-after-write. Each applied[] entry includes the new etag of the note — agents do not need a follow-up note_read.
- Audit fan-out. Every per-note write emits a normal note.updated event with the existing Actor (api/cli/mcp); a single roll-up ai.invocation event lists all applied_paths for the call.
- Glob gating still wins. If INDX_AI_DENY_GLOBS covers a path, that note is excluded before the chat call and reported via globs_excluded — no provider sees its content, no write happens.
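Put together, the batch-apply contract is roughly the loop below — pseudocode only; `patchNote` stands in for core.notes.patch, and the single `etag_mismatch` catch is a simplification of the real per-path failure codes:

```ts
async function applyPerNote(
  paths: string[],
  ifMatch: Record<string, string>,
  ifMatchMode: "stop" | "skip",
  patchNote: (path: string, expectedEtag?: string) => Promise<{ etag: string }>,
) {
  const applied: { path: string; etag: string }[] = [];
  const warnings: { path: string; code: string }[] = [];
  for (const path of [...paths].sort()) {               // path-sorted; each patch is atomic on its own
    try {
      applied.push({ path, ...(await patchNote(path, ifMatch[path])) });
    } catch {
      warnings.push({ path, code: "etag_mismatch" });   // real codes: etag_mismatch / forbidden / validation_failed
      if (ifMatchMode === "stop") break;                // default: stop at the first conflict
    }
  }
  return { applied, warnings };                         // earlier writes are never rolled back
}
```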
6. Citations and grounding
Every AI output that references vault content carries a citations[] array
of Citation records:
- path is always vault-relative POSIX.
- etag pins the content version the model saw — agents can detect drift by comparing against note_read({ path }).etag.
- anchor is a heading or block id when one applies; otherwise omitted.
- line / span give the exact location for snippet pull.
Grounding rules:
- The chat model is constrained to JSON Schema output (json_schema mode) wherever supported. The schema mandates a citations field.
- After the model returns, the runtime verifies every cited path exists, the etag matches, and (if anchor or line is given) that the location resolves. Failed citations are dropped with a citation_drift or citation_invalid warning per item.
- When cite: true and zero citations survive, the op fails with ai_grounding_failed rather than returning ungrounded prose.
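Verification is a plain post-pass over the model’s citations — a sketch, with `readNote` standing in for the engine’s note_read:

```ts
async function verifyCitations(
  citations: Citation[],
  readNote: (path: string) => Promise<{ etag: string } | null>,
): Promise<{ kept: Citation[]; warnings: string[] }> {
  const kept: Citation[] = [];
  const warnings: string[] = [];
  for (const c of citations) {
    const note = await readNote(c.path);
    if (!note) { warnings.push("citation_invalid"); continue; }              // path does not resolve
    if (note.etag !== c.etag) { warnings.push("citation_drift"); continue; } // content moved on
    kept.push(c);
  }
  return { kept, warnings };   // zero survivors under cite: true → ai_grounding_failed
}
```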
7. Streaming
Long-running ops (notably ai_ask and large ai_summarize) support
streaming. Transport-specific:
| Surface | Streaming form |
|---|---|
| HTTP API | Accept: text/event-stream. Events: ai.partial (token chunk), ai.citation (incremental), ai.usage (running total), ai.complete (final structured payload). |
| CLI | --stream writes NDJSON deltas to stdout, matching the SSE event names; final line is {"type":"ai.complete", ...}. |
| MCP | Tool calls emit progress notifications (notifications/progress) with partial payloads; the final tool_result carries the full structuredContent. |
Streaming is best-effort. If the provider does not support streaming, the
runtime falls back to a single non-streamed response and emits
stream_unsupported in warnings (no error).
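On the CLI side, consuming the NDJSON stream is one readline loop — a sketch for Node, assuming only the event names listed above (the question text is just an example):

```ts
import { spawn } from "node:child_process";
import { createInterface } from "node:readline";

const child = spawn("indx", ["ai", "ask", "what does the etag scheme guarantee?", "--stream"]);
for await (const line of createInterface({ input: child.stdout! })) {
  const event = JSON.parse(line);
  if (event.type === "ai.partial") process.stdout.write(event.delta);      // token chunks as they arrive
  if (event.type === "ai.complete") console.log("\n", event.data.citations);
}
```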
8. Caching, idempotency, cost
8.1 AI cache
When INDX_AI_CACHE=on (default), every successful AI op result is cached in
.indx/ai-cache.db keyed by:
```
sha256( op | provider | model | temperature | seed | max_output_tokens |
        json_schema | normalized_input | content_fingerprint(scope) )
```
- normalized_input strips fields that don’t affect output (e.g. idempotency_key, cache, stream).
- content_fingerprint(scope) is the sorted list of (path, etag) pairs in the resolved scope. Any vault edit invalidates only the entries that read the changed file.
- TTL: 24 h (configurable per call via cache_ttl_seconds).
- Cache hits return with usage: { input_tokens: 0, output_tokens: 0, cost_usd: 0 } and warnings: ["ai_cache_hit"].
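The key itself is a single digest over those parts — sketched with Node’s crypto; the field list mirrors the expression above, everything else (names, types) is illustrative:

```ts
import { createHash } from "node:crypto";

function aiCacheKey(parts: {
  op: string; provider: string; model: string; temperature: number; seed: number;
  max_output_tokens: number; json_schema: unknown; normalized_input: unknown;
  content_fingerprint: [path: string, etag: string][];   // sorted (path, etag) pairs of the resolved scope
}): string {
  return createHash("sha256").update(JSON.stringify(parts)).digest("hex");
}
```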
8.2 Idempotency
The standard Idempotency-Key header / --idempotency-key flag /
idempotency_key field works as on every other write — but for AI it also
applies to read-shaped ops, so an agent retrying a /v1/ai/ask call after
a network blip does not pay for two generations.
Same key + same body within 24 h → cached response with
X-Indx-Idempotent-Replay: true. Same key + different body → 409 idempotency_key_reused.
8.3 Cost ceilings
- INDX_AI_DAILY_COST_USD clamps total spend per UTC day across all tokens. Over-cap → 429 ai_quota_exceeded with Retry-After: <seconds-until-midnight>.
- Per-token rate limits from the existing API limiter still apply.
- Each AI call’s usage.cost_usd is a best-effort number derived from the provider’s published rates — emitted on the response, written to events.log as an ai.invocation event, and aggregated for vault_status.ai.
9. Errors
AI ops use the standard error envelope (API §4).
Three new codes layer on the existing list:
| HTTP | code | When |
|---|---|---|
| 503 | ai_unavailable | No suitable provider configured (or enabled: false in config) for the requested op. details.required_role is "chat" or "embeddings". |
| 429 | ai_quota_exceeded | Daily cost ceiling reached. details.reset_at is an ISO 8601 instant. |
| 502 | ai_provider_error | Upstream call failed (timeout, 5xx, parse). details.provider, details.upstream_status, details.upstream_message. |
| 422 | ai_grounding_failed | All model citations failed verification under cite: true. details.dropped[]. |
| 422 | ai_input_too_large | Resolved scope exceeds INDX_AI_MAX_INPUT_TOKENS even after map-reduce. details.tokens. |
| 422 | ai_schema_invalid | The caller-supplied ai_extract.schema is not a valid JSON Schema. details.errors[]. |
| 409 | ai_apply_conflict | One or more per-path if_match checks failed during apply: true; details.failed[{ path, expected, actual }]. No partial writes leak — see §5.5/§5.6/§5.7. |
Existing codes also apply: validation_failed, forbidden,
unauthorized, rate_limited, parse_failed, etag_mismatch (when
ai_toc.write collides), etc.
AiWarning (non-fatal) values:
| Warning | Meaning |
|---|---|
embeddings_unavailable | Hybrid/semantic retrieval downgraded to lexical. |
chat_unavailable | Chat-dependent fields omitted (e.g. ai_relate without classification). |
ai_cache_hit | Result served from cache; usage reflects zero cost. |
citation_drift | A cited path exists but its etag changed since model generation. |
citation_invalid | A cited path or anchor does not resolve and was dropped. |
ungrounded_dropped | A model claim without citation was dropped under cite: true. |
truncated | Output hit max_output_tokens; finish_reason: "length". |
stream_unsupported | Streaming requested but provider does not support it. |
globs_excluded | Some scope candidates were excluded by INDX_AI_*_GLOBS. |
unsupported_question | ai_ask could not ground an answer; returned low-confidence stub. |
tag_invalid | A proposed tag failed SPEC §4.5 charset/shape rules and was dropped. |
tag_below_threshold | A proposed tag scored under vocabulary.min_confidence and was dropped. |
metadata_invalid | A field value failed schema validation (pattern / enum / min/max / type) and was dropped. |
metadata_missing | A required: true field could not be extracted and was omitted (still an op success). |
extract_invalid | An extracted record failed schema validation and was dropped. |
apply_skipped_no_change | apply: true was a no-op for a path because nothing changed (e.g. all suggested tags already present under apply_mode: "merge"). |
10. Surface mapping
```
AI op          ↔ HTTP                    ↔ CLI                 ↔ MCP tool
──────────────────────────────────────────────────────────────────────────
ai_summarize     POST /v1/ai/summarize     indx ai summarize     ai_summarize
ai_ask           POST /v1/ai/ask           indx ai ask           ai_ask
ai_toc           POST /v1/ai/toc           indx ai toc           ai_toc
ai_relate        POST /v1/ai/relate        indx ai relate        ai_relate
ai_tag           POST /v1/ai/tag           indx ai tag           ai_tag
ai_metadata      POST /v1/ai/metadata      indx ai metadata      ai_metadata
ai_extract       POST /v1/ai/extract       indx ai extract       ai_extract
ai_status        GET  /v1/ai/status        indx ai status        ai_status
```
ai_status is a cheap probe returning { enabled, provider, model, embeddings: { provider, model, dim }, cache: { enabled, hits_24h, size_bytes }, spend_today_usd } — the agent’s discovery handshake for “what AI can I expect from this vault?”
10.1 HTTP details
- All AI endpoints are
POST (request bodies are non-trivial; cursors and Idempotency-Key belong on writes).
- All AI endpoints stream when Accept: text/event-stream; otherwise return a single JSON envelope.
- ai_status is GET and behaves like any other lightweight read.
10.2 CLI details
```
indx ai summarize --scope-paths Spec.md,API.md [--style bullets] [--length short]
indx ai ask "what does the etag scheme guarantee?" [--top-k 8] [--mode hybrid]
indx ai toc --note Architecture.md --depth 3 [--descriptions]
indx ai toc --moc --scope-glob 'Projects/**' --group-by tag --write Index.md
indx ai relate --note Spec.md [--top-k 10] [--classify] [--propose-links]
indx ai status
```
Global flags (CLI §2.1) apply: --ndjson,
--idempotency-key, --if-match, --dry-run. --stream opts into NDJSON
streaming (one event per line, terminated by ai.complete).
indx ai toc --moc --write <path> requires --yes if the path exists and
will be replaced; --if-not-exists makes it a safe create.
10.3 MCP details
Tools are registered iff a provider is configured and the connecting token
has vault:read (and vault:write for the write paths of ai_toc).
Tool names are stable; they appear in
MCP §3 tables (read tools, except ai_toc with
write which is a write tool).
Each AI tool result includes a deterministic content[].text summary,
exactly the contract from MCP §3: an agent that wants
to display progress without re-rendering the structured payload can quote
the text block.
Resources gain two read-only synthetic URIs when AI is enabled:
```
ai://summary/<scope-spec>   a live summary; refreshes when scope content changes
ai://moc/<scope-spec>       a live MOC for a glob/tag scope
```
<scope-spec> is a URL-encoded JSON AiScope. Subscribing to either URI
(resources/subscribe) re-derives when an underlying note’s etag changes —
the runtime triggers a re-summary, and the agent receives
notifications/resources/updated.
11. Worked examples
11.1 Summarize a folder, bullets, short
```
POST /v1/ai/summarize
Authorization: Bearer indx_…
Content-Type: application/json
```
{ "scope": { "kind": "glob", "path_glob": "Specs/**/*.md" }, "style": "bullets", "length": "short", "include": ["tldr", "key_points", "open_questions"], "cite": true}Response (abridged):
{ "ok": true, "data": { "scope_resolved": { "paths": ["Specs/Auth.md", "Specs/Search.md"], "total_tokens": 4123 }, "summary": { "tldr": "Auth uses static bearer tokens with scopes; search defaults to lexical.", "key_points": [ "Tokens are validated by hashed comparison; secrets never persisted in config.json.", "Hybrid search requires an embeddings provider and degrades silently to lexical otherwise." ], "open_questions": ["Should per-user OIDC ship in v1.1 or v2?"] }, "citations": [ { "path": "Specs/Auth.md", "etag": "ab12cd34ef567890", "line": 42 }, { "path": "Specs/Search.md", "etag": "9fbb12…", "anchor": "5.2 Search modes" } ], "usage": { "input_tokens": 4123, "output_tokens": 218, "cost_usd": 0.0007 }, "warnings": [] }}11.2 Ask, streaming
```
indx ai ask "what does the etag scheme guarantee for two concurrent writers?" \
  --top-k 8 --stream
```
Stdout, one NDJSON event per line:
```
{"type":"ai.partial","delta":"The ETag is the first 16 hex of"}
{"type":"ai.partial","delta":" xxhash64(bytes-on-disk), so it changes…"}
{"type":"ai.citation","citation":{"path":"SPEC.md","etag":"…","anchor":"6.1 Atomic write"}}
{"type":"ai.usage","usage":{"input_tokens":3211,"output_tokens":104,"cost_usd":0.00041}}
{"type":"ai.complete","data":{ /* full AskOutput */ }}
```
Exit 0 on a normal complete. Exit 7 on ai_quota_exceeded. Exit 8 on
ai_provider_error.
11.3 TOC for a single long note (no LLM cost)
```
indx ai toc --note Architecture.md --depth 3
```
include_descriptions: false (default) → outline-only, skips the chat
model entirely. usage.cost_usd is 0.
11.4 MOC for “Projects/”, grouped by tag, materialized
```
indx ai toc --moc \
  --scope-glob 'Projects/**' \
  --group-by tag \
  --write Index/Projects.md \
  --idempotency-key $(uuidgen)
```
Response includes written: { path: "Index/Projects.md", etag: "…" }. The
write goes through the standard atomic pipeline; a subsequent
note_read sees the materialized note. Re-running the same command with
the same idempotency_key is a no-op replay.
11.5 Relate + propose links
Section titled “11.5 Relate + propose links”POST /v1/ai/relate{ "source": { "kind": "path", "path": "Architecture.md" }, "candidates": { "kind": "glob", "path_glob": "**/*.md" }, "top_k": 8, "classify": true, "threshold": 0.55, "propose_links": true}{ "ok": true, "data": { "edges": [ { "src_path": "Architecture.md", "dst_path": "SPEC.md", "similarity": 0.81, "relation": "extends", "confidence": 0.78, "rationale": "SPEC.md sets the contracts that Architecture.md implements.", "evidence": [{ "path": "SPEC.md", "etag": "…", "anchor": "6.2 Patch operations" }] } ], "proposed_patches": [ { "path": "Architecture.md", "ops": [{ "op": "insert_after_heading", "heading": "## See also", "markdown": "- [[SPEC]] — patch grammar reference\n" }], "reason": "Strong same-topic + extends relation; no existing wikilink found.", "related_to": ["SPEC.md"] } ], "scope_resolved": { "sources": ["Architecture.md"], "candidates_considered": 184 }, "usage": { "input_tokens": 6420, "output_tokens": 312, "cost_usd": 0.0011 }, "warnings": [] }}The agent decides whether to apply the proposed patches by calling
note_patch — the runtime never auto-edits.
11.6 Tag a recently-edited folder, biased to existing vocabulary
Section titled “11.6 Tag a recently-edited folder, biased to existing vocabulary”indx ai tag \ --scope-glob 'Inbox/**/*.md' \ --use-existing --max-per-note 4 --min-confidence 0.7 \ --apply --apply-mode merge \ --idempotency-key $(uuidgen){ "ok": true, "data": { "suggestions": [ { "path": "Inbox/2026-05-02-meeting.md", "etag": "ab12…", "tags": [ { "tag": "ops", "confidence": 0.92, "is_new": false, "rationale": "incident review under #ops cadence", "evidence": [{ "path": "Inbox/2026-05-02-meeting.md", "etag": "ab12…", "anchor": "## Agenda" }] }, { "tag": "oncall/postmortem", "confidence": 0.81, "is_new": true, "rationale": "explicit postmortem section present", "evidence": [{ "path": "Inbox/2026-05-02-meeting.md", "etag": "ab12…", "anchor": "## Postmortem" }] } ] } ], "applied": [ { "path": "Inbox/2026-05-02-meeting.md", "etag": "cd34…", "tags_added": ["ops", "oncall/postmortem"], "tags_removed": [] } ], "vocabulary": { "existing": ["ops"], "new": ["oncall/postmortem"] }, "citations": [/* … */], "usage": { "input_tokens": 1820, "output_tokens": 64, "cost_usd": 0.0002 }, "warnings": [] }}11.7 Fill missing typed frontmatter fields
Section titled “11.7 Fill missing typed frontmatter fields”POST /v1/ai/metadata{ "scope": { "kind": "paths", "paths": ["Projects/Indx-App.md"] }, "fields": [ { "key": "title", "type": "string", "required": true }, { "key": "status", "type": "enum", "enum": ["idea","active","blocked","done"], "required": true }, { "key": "due_date", "type": "date" }, { "key": "stakeholders","type": "list", "list_item_type": "string", "max": 5 } ], "apply": true, "apply_mode": "set_missing"}{ "ok": true, "data": { "results": [{ "path": "Projects/Indx-App.md", "etag": "ab12…", "fields": { "title": { "value": "Indx-App build plan", "confidence": 0.97, "evidence": [/* … */] }, "status": { "value": "active", "confidence": 0.86, "evidence": [/* … */] }, "due_date": { "value": "2026-09-01", "confidence": 0.71, "evidence": [/* … */] }, "stakeholders": { "value": ["Alice","Bob"], "confidence": 0.79, "evidence": [/* … */] } } }], "applied": [{ "path": "Projects/Indx-App.md", "etag": "cd34…", "fields_set": ["title","status","due_date","stakeholders"], "fields_skipped": [] }], "usage": { "input_tokens": 2410, "output_tokens": 88, "cost_usd": 0.0003 }, "warnings": [] }}A second run with the same body is a no-op replay (everything already set
under set_missing); the response carries apply_skipped_no_change.
11.8 Extract action items into a custom JSON Schema
```
indx ai extract \
  --scope-glob 'Meetings/2026-Q2/**' \
  --schema-file action-items.schema.json \
  --destination frontmatter \
  --destination-key action_items \
  --apply --apply-mode merge
```
action-items.schema.json:
```json
{
  "type": "array",
  "items": {
    "type": "object",
    "required": ["task", "owner"],
    "properties": {
      "task": { "type": "string", "minLength": 3 },
      "owner": { "type": "string" },
      "due_date": { "type": "string", "format": "date" },
      "blocked_by": { "type": "array", "items": { "type": "string" } }
    },
    "additionalProperties": false
  }
}
```
Each meeting note ends up with a typed action_items: array in
frontmatter — usable directly by Bases queries (SPEC §4.9) without
further parsing.
12. Telemetry, audit, and the events stream
Every AI invocation emits a VaultEvent of type ai.invocation (extends
SPEC §7):
```ts
type AiInvocationEvent = {
  type: "ai.invocation";
  op: "summarize" | "ask" | "toc" | "relate" | "tag" | "metadata" | "extract";
  actor: Actor;              // ui/api/cli/mcp/fs
  scope_paths: string[];     // resolved (truncated to first 50 for log)
  provider: string;
  model: string;
  usage: AiUsage;
  cache_hit: boolean;
  duration_ms: number;
  ok: boolean;
  error_code?: string;
  applied_paths?: string[];  // present iff apply: true succeeded; truncated to first 50
  at: string;                // ISO 8601
};
```
These events flow through:
- GET /v1/events (SSE) — filterable with kinds=ai.invocation.
- .indx/events.log — the rolling NDJSON audit log.
- vault_status.ai — counts, hits/misses, total spend today.
Privacy: prompts and outputs are not written to the event log. Paths,
counts, durations, costs, and error codes only — same redaction stance as
NFR-PRIV-2.
13. Determinism and reproducibility
AI is non-deterministic by nature; indx narrows the gap:
- Defaults: temperature: 0, seed: 0, structured-output JSON Schema mode.
- The shape of every AI output is fixed by Zod schemas; freeform prose is contained to specific fields (summary.text, answer, rationale).
- Scope resolution and citation verification are deterministic given the vault state — agents that rely on scope_resolved.paths for downstream branching get stable inputs.
- The cache (§8.1) makes “same input + same vault state” a guaranteed exact replay across restarts.
This is enough for snapshot-style tests in CI: the runtime ships an
AI_TEST_RECORD=1 mode that records cache fixtures, and AI_TEST_REPLAY=1
serves them — no provider call required for unit/integration tests of
adapters.
14. Security & privacy considerations
- No prompt leakage in logs. Same redaction rules as the rest of the system; only paths, counts, and durations are persisted.
- Glob-gated content. INDX_AI_DENY_GLOBS lets the operator keep sensitive paths (Private/**, Journal/**, …) out of any AI op even when an agent specifies them in scope. Filtered candidates surface as globs_excluded warnings — never silently included.
- Outbound calls only with consent. AI features tighten, not loosen, NFR-PRIV-1: if neither chat nor embeddings provider is configured, no call is ever made.
- Scopes still gate writes. ai_toc with write requires vault:write; ai_relate with propose_links: true returns drafts only — applying requires note_patch, which requires vault:write. There is no AI bypass.
- Threat model. A malicious agent with vault:read can exfiltrate vault content via the configured provider. This is the same trust posture as any embeddings call today; document it clearly and let users choose providers (or ai.enabled: false) accordingly. See SPEC §12.
15. Versioning
Section titled “15. Versioning”- AI op surface is additive within
/v1. New optional fields and warnings are non-breaking; new ops appear in/openapi.jsononce configured. - Breaking shape changes to existing ops follow the standard
API §15policy:/v2cut,/v1retained for one minor. - Provider plumbing is engine-internal — swapping a provider does not change the public surface.
16. What this design buys the agent
Section titled “16. What this design buys the agent”- One handshake for AI capability discovery —
ai_status(ortools/listfiltered forai_*) tells the agent exactly what it can ask for on this vault. - Citations by default — every model claim is verifiable against the vault, with etag-pinned references the agent can re-fetch.
- Deterministic shapes — even when the model varies the prose, the structured payload is contractual.
- Safe writes — the runtime never auto-edits; AI write paths reuse the atomic pipeline with the same ETag/idempotency invariants.
- Cost-aware — usage is reported per call, capped per day, and cached between identical requests so retries are free.
- Drop-in compatibility — existing tokens, scopes, configs, and clients keep working; AI is one env var away.