
AI Runtime

Built-in AI runtime — summarize, ask, toc, relate, tag, metadata, extract — over CLI, API, MCP.

Status: Draft v0.1 · 2026-05-03
Companion docs: PRD.md, SPEC.md, API.md, CLI.md, MCP.md
Audience: AI agents (primary), self-hosters, library authors.

The vault is the truth. AI is a derived capability layered on top. Indx ships a small, opinionated AI runtime so agents can reach for summary, search-grounded answers, table-of-contents creation, and relationship detection without wiring their own retrieval pipeline. Every operation is reachable identically from the CLI, the HTTP API, and MCP.


Two failure modes shaped this design:

  1. Roll-your-own RAG against a vault is brittle. Every new agent reinvents retrieval, snippet extraction, citation formatting, error fallback, and provider plumbing. The vault is right there; indx already indexes it. The runtime should expose ready-made AI ops the same way it exposes note_read — one tool call, one structured result.
  2. AI features that aren’t optional are a tax. A user with an air-gapped self-host or no AI key SHALL still get a fully functional vault. AI ops are opt-in: they are advertised iff a provider is configured, and they degrade explicitly (never silently) when retrieval is the only thing available.

The runtime adds seven operations on top of the existing surfaces:

| Op | What it does | Read-only by default? |
| --- | --- | --- |
| `ai_summarize` | Summarize one note, many notes, or a query/tag scope. | yes |
| `ai_ask` | Natural-language question answered from vault content with citations. | yes |
| `ai_toc` | Build a table of contents (single note) or a Map-of-Content (folder/glob/tag). | yes — write is opt-in |
| `ai_relate` | Detect relationships between notes (related, extends, cites, contradicts, …). | yes — propose_links returns drafts only |
| `ai_tag` | Suggest tags for notes, biased toward the existing vault vocabulary. | yes — apply is opt-in |
| `ai_metadata` | Fill or refine typed frontmatter fields (title, aliases, dates, custom keys). | yes — apply is opt-in |
| `ai_extract` | Pull structured entities/facts into a caller-supplied JSON Schema, optionally onto frontmatter. | yes — apply is opt-in |

Every op is grounded — outputs include the paths (and where applicable, block ids or headings) that the model used. No “trust the LLM” without verifiable sources. Every op that can mutate the vault (ai_toc.write, ai_tag.apply, ai_metadata.apply, ai_extract.apply) flows through the standard atomic write pipeline (ETag, idempotency, audit) — there is no AI-only side channel.

2. Compatibility with the existing configuration


This runtime is additive. It does not change any existing surface; it reuses the same conventions that already govern vault_search and the embeddings provider:

  • Same provider plumbing. The chat/completion model uses the same provider vocabulary as the embeddings provider (Vercel AI Gateway, OpenAI-compatible endpoints including Ollama). See SPEC §9.1.
  • Same response envelope. API responses use the existing { ok, data | error } envelope; CLI uses the existing JSON-on-stdout contract; MCP returns structuredContent + a deterministic text summary.
  • Same scope model. vault:read is sufficient for AI reads. AI write paths (e.g., materializing a TOC note) require vault:write — the AI runtime never writes through a side channel.
  • Same error codes. Errors map to existing codes plus three additions: ai_unavailable, ai_quota_exceeded, ai_provider_error. See §9.
  • Same atomic-write pipeline. Materializing AI output as a note goes through core.vault.writeNote like any other write — atomic, ETagged, idempotent. There is no “AI-only” write path.
  • Schema-first. All AI inputs/outputs are Zod schemas in @indx/shared, mirrored to OpenAPI 3.1 and to MCP JSON Schema. See SPEC §14.
  • Zero outbound calls without consent. When no AI provider is configured the runtime serves 503 ai_unavailable rather than ever calling out. See PRD §9, NFR-PRIV-1.

If you run an indx server today, nothing needs to change: no env var, file, or token has to be touched, and AI tools simply won’t appear in the catalog until a provider is configured.

| Role | Used by | Required for |
| --- | --- | --- |
| embeddings | Retrieval (semantic + hybrid) | semantic / hybrid search; ai_relate neighbor discovery; ai_summarize of multi-note scopes |
| chat | Generation (summary text, answers, classifications) | ai_summarize, ai_ask, ai_toc, ai_relate (relationship typing) |

Either role can be unset. Behavior:

  • No embeddings, no chat → AI runtime is off; tools are not advertised.
  • embeddings only → ai_relate (vector neighbors) is advertised but classification falls back to a deterministic heuristic; ai_summarize / ai_ask / ai_toc return 503 ai_unavailable.
  • chat only → ai_summarize / ai_ask / ai_toc work for explicit-path scopes; query/tag scopes that need semantic recall fall back to lexical.
  • Both → all ops advertised at full fidelity.

The existing embeddings vars are unchanged. The chat-model vars are new and purely additive. If INDX_AI_PROVIDER is unset, indx falls back to INDX_EMBEDDINGS_PROVIDER for the chat role too — most users will set one provider and be done.
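As a minimal sketch, the resolution order reads like this (the function and env-record shape are illustrative, not the runtime's internals):

```typescript
// Illustrative sketch of the documented fallback: INDX_AI_PROVIDER wins,
// else INDX_EMBEDDINGS_PROVIDER, else the AI runtime stays off.
type Provider = "vercel-ai-gateway" | "openai" | "ollama";

function resolveChatProvider(env: Record<string, string | undefined>): Provider | null {
  const raw = env.INDX_AI_PROVIDER ?? env.INDX_EMBEDDINGS_PROVIDER;
  if (raw === "vercel-ai-gateway" || raw === "openai" || raw === "ollama") return raw;
  return null; // unset or unknown value: AI tools are not advertised
}
```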

| Var | Default | Notes |
| --- | --- | --- |
| `INDX_AI_PROVIDER` | `INDX_EMBEDDINGS_PROVIDER` if set, else unset | `vercel-ai-gateway` \| `openai` \| `ollama` |
| `INDX_AI_MODEL` | unset | Provider-specific id, e.g. `anthropic/claude-haiku-4-5` |
| `INDX_AI_MAX_INPUT_TOKENS` | `64000` | Hard cap on prompt context size |
| `INDX_AI_MAX_OUTPUT_TOKENS` | `2048` | Hard cap on generated output |
| `INDX_AI_TEMPERATURE` | `0` | Deterministic by default |
| `INDX_AI_DAILY_COST_USD` | unset | Optional cost ceiling per day; over-cap → `ai_quota_exceeded` |
| `INDX_AI_ALLOW_GLOBS` | unset | Comma-list of globs eligible for AI ops; default = all |
| `INDX_AI_DENY_GLOBS` | unset | Takes precedence over allow; matches → exclude |
| `INDX_AI_CACHE` | `on` | `on` \| `off`; cache layer in `.indx/ai-cache.db` |
| `INDX_AI_TIMEOUT_MS` | `30000` | Per-call upstream timeout |

Re-using INDX_AI_GATEWAY_KEY / INDX_OPENAI_BASE_URL / INDX_OPENAI_API_KEY from SPEC §9.1 means the provider credentials carry over from any existing embeddings setup.

```jsonc
{
  "ai": {
    "enabled": true,                   // false hides AI tools regardless of env
    "provider": "vercel-ai-gateway",   // override env if needed
    "model": "anthropic/claude-haiku-4-5",
    "max_input_tokens": 64000,
    "max_output_tokens": 2048,
    "temperature": 0,
    "allow_globs": ["**/*.md"],
    "deny_globs": ["Private/**", "Journal/**"],
    "cache": { "enabled": true, "ttl_hours": 24 },
    "daily_cost_usd": null
  }
}
```

Vault config wins over env for tuning knobs (temperature, max_*_tokens, allow_globs, deny_globs); env wins for safety knobs (enabled, provider credentials) so a sysadmin can lock the runtime down even if the file says otherwise.
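A sketch of that precedence as a merge function; the knob set is trimmed, and the names mirror the config above rather than the runtime's actual internals:

```typescript
// Env (sysadmin) wins for safety knobs; vault config wins for tuning knobs.
interface AiSettings { enabled: boolean; provider?: string; temperature: number; max_output_tokens: number }

function mergeAiSettings(
  env: { enabled?: boolean; provider?: string },
  file: { enabled?: boolean; provider?: string; temperature?: number; max_output_tokens?: number },
): AiSettings {
  return {
    enabled: env.enabled ?? file.enabled ?? false,      // safety: env first
    provider: env.provider ?? file.provider,            // credentials: env first
    temperature: file.temperature ?? 0,                 // tuning: file first
    max_output_tokens: file.max_output_tokens ?? 2048,  // tuning: file first
  };
}
```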

```ts
interface ChatProvider {
  readonly model: string;
  generate(req: {
    system: string;
    messages: ChatMessage[];
    json_schema?: JsonSchema;   // structured output mode
    max_output_tokens?: number;
    temperature?: number;
    seed?: number;
    signal?: AbortSignal;
  }): Promise<{
    text: string;
    structured?: unknown;
    usage: { input_tokens: number; output_tokens: number; cost_usd: number };
    finish_reason: "stop" | "length" | "content_filter" | "error";
  }>;
}
```

The embeddings provider stays as defined in IMPLEMENTATION §3 (src/embeddings/provider.ts). The factory returns null if not configured; AI ops branch on null exactly the way search already does.
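The null branch might look like this sketch (requireChat and the error shape are hypothetical; the real runtime surfaces this as the 503 ai_unavailable envelope):

```typescript
// Fail fast with ai_unavailable when no chat provider is configured,
// mirroring how search already handles a missing embeddings provider.
interface ChatProviderLike { model: string }

function requireChat(provider: ChatProviderLike | null): ChatProviderLike {
  if (provider === null) {
    const err = new Error("no chat provider configured") as Error & { code: string; status: number };
    err.code = "ai_unavailable";
    err.status = 503;
    throw err;
  }
  return provider;
}
```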

Every AI op accepts a scope selecting what content to operate on. Scopes are intersected with INDX_AI_ALLOW_GLOBS / INDX_AI_DENY_GLOBS; paths excluded by globs are dropped from the candidate set and reported in warnings.

```ts
type AiScope =
  | { kind: "paths"; paths: string[] }    // explicit list
  | { kind: "glob"; path_glob: string }   // path glob
  | { kind: "tag"; tag: string }          // any note tagged
  | { kind: "query"; q: string; mode?: SearchMode;
      limit?: number; filters?: SearchFilters }
  | { kind: "note"; path: string; include_linked?: number }; // hop graph N steps
```

include_linked: N follows wikilinks N hops out from a single note — useful for “summarize this note and its immediate neighbors”.
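As a sketch, include_linked resolution is a bounded breadth-first walk; the real runtime reads link adjacency from the index, and expandLinked here is a hypothetical helper:

```typescript
// Collect a note plus everything reachable within `hops` wikilink steps.
function expandLinked(start: string, links: Record<string, string[]>, hops: number): string[] {
  const seen = new Set<string>([start]);
  let frontier = [start];
  for (let i = 0; i < hops; i++) {
    const next: string[] = [];
    for (const src of frontier) {
      for (const dst of links[src] ?? []) {
        if (!seen.has(dst)) { seen.add(dst); next.push(dst); }
      }
    }
    frontier = next;
  }
  return [...seen].sort(); // sorted for deterministic scope resolution
}
```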

Common shared options:

| Field | Default | Notes |
| --- | --- | --- |
| `language` | inferred | Output language (en, ja, …); inferred from the dominant input language. |
| `style` | `neutral` | `neutral` \| `bullets` \| `executive` \| `technical` |
| `max_tokens` | `INDX_AI_MAX_OUTPUT_TOKENS` | Per-call override (clamped to the env max). |
| `seed` | `0` | Same input + same seed → same output (best-effort; provider-dependent). |
| `cache` | `true` | Disable per-call to bypass the AI cache. |
| `idempotency_key` | random uuid | Same key + same body → cached response within 24 h. |
| `cite` | `true` | Include `citations[]` in the output (always recommended). |

5.1 ai_summarize

Generates a structured summary of the scope.

Input

```ts
type SummarizeInput = {
  scope: AiScope;
  style?: "neutral" | "bullets" | "executive" | "technical";
  length?: "short" | "medium" | "long"; // ~50 / 200 / 600 tokens target
  language?: string;
  audience?: string;                    // e.g. "incident responder"
  include?: ("tldr" | "key_points" | "open_questions" | "outline")[];
  max_tokens?: number;
  cite?: boolean;
  seed?: number;
  cache?: boolean;
};
```

Output

```ts
type SummarizeOutput = {
  scope_resolved: { paths: string[]; total_tokens: number };
  summary: {
    tldr?: string;          // 1–2 sentences
    text: string;           // formatted markdown
    key_points?: string[];
    open_questions?: string[];
    outline?: { heading: string; line: number; path: string }[];
  };
  citations: Citation[];
  usage: AiUsage;
  warnings: AiWarning[];
};

type Citation = {
  path: string;
  etag: string;             // pin the cited revision
  anchor?: string;          // heading or block-id when available
  line?: number;
  span?: [number, number];  // [startLine, endLine]
  snippet?: string;
};
```

Behavior:

  • If the scope expands to more notes than will fit in the prompt, the runtime performs map-reduce summarization: per-chunk summary → final summary. This is internal — callers see one structured result. Per-chunk usage is rolled up.
  • cite: true (default) emits at least one citation per key_point. If the model fails to ground a key point, the runtime drops that point and adds an ungrounded_dropped warning rather than emitting unsupported claims.
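The map step's chunking can be sketched as packing notes under the input cap; the length/4 token estimate is a stand-in for the runtime's real tokenizer:

```typescript
// Pack notes into chunks whose estimated token count stays under the cap;
// each chunk gets its own summary before the final reduce pass.
function chunkByTokens(notes: { path: string; text: string }[], maxTokens: number) {
  const estimate = (s: string) => Math.ceil(s.length / 4); // crude stand-in
  const chunks: { path: string; text: string }[][] = [];
  let current: { path: string; text: string }[] = [];
  let used = 0;
  for (const note of notes) {
    const t = estimate(note.text);
    if (current.length > 0 && used + t > maxTokens) {
      chunks.push(current);
      current = [];
      used = 0;
    }
    current.push(note); // an oversized single note still gets its own chunk
    used += t;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```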

5.2 ai_ask

RAG-style natural-language question answering grounded in the vault.

Input

```ts
type AskInput = {
  question: string;
  scope?: AiScope;            // optional; default = whole vault
  retrieve?: {
    mode?: SearchMode;        // 'lexical' | 'semantic' | 'hybrid'
    top_k?: number;           // default 8
    filters?: SearchFilters;
  };
  style?: "neutral" | "bullets" | "executive" | "technical";
  language?: string;
  max_tokens?: number;
  cite?: boolean;             // default true
  conversation_id?: string;   // for stateless multi-turn (server stores nothing)
  history?: { role: "user" | "assistant"; content: string }[];
  stream?: boolean;           // SSE/NDJSON token stream; see §7
  seed?: number;
  cache?: boolean;
};
```

Output

```ts
type AskOutput = {
  answer: string;             // markdown
  confidence: "high" | "medium" | "low" | "unknown";
  citations: Citation[];
  retrieved: {
    path: string; score: number;
    matched_in: ("title" | "body" | "tags" | "frontmatter")[];
  }[];
  followups?: string[];       // suggested next questions
  usage: AiUsage;
  warnings: AiWarning[];
};
```

Behavior:

  • Retrieval defaults to mode: "hybrid" if embeddings are configured, else mode: "lexical" with an embeddings_unavailable warning (mirrors FR-S-4).
  • The model is instructed to answer only from the retrieved set. If the retrieved evidence does not support an answer, output is confidence: "low" with answer: "I don't know based on the indexed vault." plus an unsupported_question warning. The runtime does not synthesize content outside the corpus.
  • conversation_id is opaque to the server — it is only used as a cache partition key. The server keeps no chat memory; agents pass history.
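The grounding fallback can be sketched as a post-generation check; finalizeAnswer is hypothetical, and real confidence scoring is richer than mere citation presence:

```typescript
// If no citations survive verification, return the low-confidence stub
// instead of free prose; otherwise pass the grounded draft through.
function finalizeAnswer(draft: { answer: string; citations: unknown[] }) {
  if (draft.citations.length === 0) {
    return {
      answer: "I don't know based on the indexed vault.",
      confidence: "low" as const,
      citations: [] as unknown[],
      warnings: ["unsupported_question"],
    };
  }
  return { ...draft, confidence: "high" as const, warnings: [] as string[] };
}
```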

5.3 ai_toc

Two modes share one tool.

Input

```ts
type TocInput =
  | { mode: "note"; path: string;
      depth?: 1 | 2 | 3 | 4 | 5 | 6;   // default 3
      include_descriptions?: boolean;  // default false
      include_links?: boolean;         // default true
      style?: "compact" | "expanded";
      max_tokens?: number;
      seed?: number; cache?: boolean }
  | { mode: "moc"; scope: AiScope;     // Map-of-Content for many notes
      group_by?: "tag" | "folder" | "topic" | "frontmatter";
      group_by_key?: string;           // when "frontmatter"
      max_groups?: number; max_per_group?: number;
      include_summaries?: boolean;     // 1-line per item
      title?: string;
      write?: { path: string;          // optional materialization
        if_not_exists?: boolean;
        if_match?: string;
        frontmatter?: Record<string, unknown> } };
```

Output

```ts
type TocOutput = {
  toc_markdown: string;   // ready to drop in a note
  toc_tree: TocNode[];    // structured form
  scope_resolved: { paths: string[]; total_tokens: number };
  written?: { path: string; etag: string }; // present only when write succeeded
  citations: Citation[];
  usage: AiUsage;
  warnings: AiWarning[];
};

type TocNode = {
  title: string;
  path?: string;          // present in MOC mode
  anchor?: string;        // heading slug for note mode
  description?: string;
  children?: TocNode[];
};
```

Behavior:

  • mode: "note" is deterministic without AI when include_descriptions: false — it walks the note’s outline (already in the index). The chat model is only invoked to generate descriptions. This makes the common case (indx ai toc Spec.md) free of LLM cost.
  • mode: "moc" clusters notes by group_by and asks the model to title each group + (optionally) annotate each item. Items per group are capped to keep output bounded.
  • write: { path } materializes the output through the standard core.notes.write pipeline (atomic, ETagged, emits note.created / note.updated). Without write, the op is read-only and requires only vault:read.
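The free, deterministic note-mode path can be sketched as a pure outline walk; heading parsing and slugging here are simplified stand-ins for the index's outline data:

```typescript
// Build a markdown TOC from headings up to `depth`, no LLM involved.
function tocMarkdown(markdown: string, depth: number): string {
  const out: string[] = [];
  for (const line of markdown.split("\n")) {
    const m = /^(#{1,6})\s+(.*)$/.exec(line);
    if (!m || m[1].length > depth) continue;
    const indent = "  ".repeat(m[1].length - 1);
    const slug = m[2].toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-|-$/g, "");
    out.push(`${indent}- [${m[2]}](#${slug})`);
  }
  return out.join("\n");
}
```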

5.4 ai_relate

Detects related notes and types the relationships.

Input

```ts
type RelateInput = {
  source: { kind: "path"; path: string }
        | { kind: "paths"; paths: string[] }
        | { kind: "scope"; scope: AiScope };
  candidates?: AiScope;       // restrict the candidate pool
  top_k?: number;             // default 10 per source note
  retrieve_mode?: "semantic" | "lexical" | "hybrid"; // default 'hybrid'
  classify?: boolean;         // default true if chat available
  relations?: ("extends" | "summarizes" | "cites" | "contradicts"
             | "same_topic" | "precedes" | "depends_on" | "duplicates")[];
                              // restrict the label set
  threshold?: number;         // 0..1 confidence cutoff
  propose_links?: boolean;    // default false
  max_tokens?: number;
  seed?: number; cache?: boolean;
};
```

Output

```ts
type RelateOutput = {
  edges: RelatedEdge[];
  proposed_patches?: ProposedPatch[]; // only when propose_links: true
  scope_resolved: { sources: string[]; candidates_considered: number };
  usage: AiUsage;
  warnings: AiWarning[];
};

type RelatedEdge = {
  src_path: string;
  dst_path: string;
  similarity: number;       // 0..1, vector similarity
  relation?: RelationLabel; // null if classify=false
  confidence?: number;      // 0..1, present iff relation set
  rationale?: string;       // 1–2 sentence justification
  evidence: Citation[];
};

type ProposedPatch = {
  path: string;
  ops: PatchOp[];           // matches SPEC §6.2
  reason: string;
  related_to: string[];
};
```

Behavior:

  • Neighbor discovery uses the existing vector + lexical infra. classify: true calls the chat model in batches keyed by src_path so rationale tokens stay deterministic per source.
  • propose_links: true returns draft patches. The runtime never applies them; the agent inspects and forwards to note_patch. This keeps a human (or supervising agent) in the loop for graph mutations.
  • Edges are de-duplicated and sorted by (confidence DESC, similarity DESC, dst_path ASC) for determinism.
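That ordering can be written as a single comparator; the Edge shape below is trimmed to just the fields the sort uses:

```typescript
// Deterministic edge ordering: confidence DESC, similarity DESC, dst_path ASC.
type Edge = { dst_path: string; similarity: number; confidence?: number };

function sortEdges(edges: Edge[]): Edge[] {
  return [...edges].sort((a, b) =>
    (b.confidence ?? 0) - (a.confidence ?? 0) ||
    b.similarity - a.similarity ||
    a.dst_path.localeCompare(b.dst_path),
  );
}
```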

5.5 ai_tag

Suggests tags for one or more notes, biased toward the vault’s existing vocabulary so taxonomies don’t drift. Optionally applies the suggestions through the standard frontmatter write path.

Input

```ts
type TagInput = {
  scope: AiScope;               // notes to tag
  vocabulary?: {
    use_existing?: boolean;     // default true: load tag_list, prefer it
    allow_new?: boolean;        // default true: model may invent tags
    candidates?: string[];      // restrict the model to this set
    forbid?: string[];          // never propose these
    namespace?: string;         // operate only within #ns/...
    max_per_note?: number;      // default 5
    min_confidence?: number;    // default 0.6 (drop below)
  };
  apply?: boolean;              // default false (suggest-only)
  apply_mode?: "merge" | "replace";  // default "merge"
  if_match?: Record<string, string>; // path → etag, applied as If-Match
  language?: string;
  max_tokens?: number;
  cite?: boolean;               // default true
  seed?: number;
  cache?: boolean;
  idempotency_key?: string;
};
```

Output

```ts
type TagOutput = {
  suggestions: TagSuggestion[];
  applied?: AppliedTagWrite[];        // present iff apply: true
  proposed_patches?: ProposedPatch[]; // present iff apply: false
  scope_resolved: { paths: string[]; total_tokens: number };
  vocabulary: { existing: string[]; new: string[] };
  citations: Citation[];
  usage: AiUsage;
  warnings: AiWarning[];
};

type TagSuggestion = {
  path: string;
  etag: string;             // pinned read revision
  tags: {
    tag: string;            // normalized, no leading '#'
    confidence: number;     // 0..1
    is_new: boolean;        // not seen before in this vault
    rationale?: string;
    evidence: Citation[];
  }[];
};

type AppliedTagWrite = {
  path: string;
  etag: string;             // new etag after the write
  tags_added: string[];
  tags_removed: string[];   // only non-empty under apply_mode: "replace"
};
```

Behavior:

  • vocabulary.use_existing: true (default) loads the result of tag_list into the prompt as the preferred set. Tags below min_confidence are dropped before the op returns; if the user supplies candidates, the output schema is constrained to that exact set (no new tags possible).
  • Tags are normalized: leading # stripped, lowercased except where the vault’s existing tag is mixed-case (preserve canonical form), validated against the SPEC §4.5 charset. Invalid candidates are dropped with a tag_invalid warning.
  • apply: true writes via core.notes.patch(actor, { ops: [{ op: "set_frontmatter", key: "tags", value: <merged_or_replaced> }] }) per path. Each path’s write honors its own if_match[path] if supplied.
  • apply_mode: "merge" (default) dedup-unions with existing tags; "replace" overwrites. Inline #tag mentions in note bodies are never rewritten — body text is the user’s; only frontmatter is managed.
  • Required scope: vault:read for suggest mode, vault:write for apply: true.
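A sketch of the normalization pass; the charset regex is an illustrative stand-in for the SPEC §4.5 rules:

```typescript
// Strip '#', keep the vault's canonical casing when the tag already exists,
// lowercase otherwise; return null (→ tag_invalid warning) on bad charset.
function normalizeTag(raw: string, canonical: string[]): string | null {
  const stripped = raw.replace(/^#/, "");
  const existing = canonical.find((c) => c.toLowerCase() === stripped.toLowerCase());
  if (existing) return existing; // preserve mixed-case canonical form
  const tag = stripped.toLowerCase();
  return /^[a-z0-9][a-z0-9/_-]*$/.test(tag) ? tag : null; // stand-in charset
}
```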

5.6 ai_metadata

Generates or refines typed frontmatter fields. Each requested field has a schema; the model is constrained via JSON Schema mode and the runtime re-validates before returning or applying.

Input

```ts
type MetadataInput = {
  scope: AiScope;
  fields: MetadataFieldSpec[];  // which keys to fill, with types
  apply?: boolean;              // default false
  apply_mode?: "set_missing" | "overwrite" | "merge"; // default "set_missing"
  if_match?: Record<string, string>;
  language?: string;
  max_tokens?: number;
  cite?: boolean;               // default true
  seed?: number;
  cache?: boolean;
  idempotency_key?: string;
};

type MetadataFieldSpec = {
  key: string;                  // frontmatter key (e.g. "title", "due_date", "status")
  type: "string" | "number" | "boolean"
      | "date" | "datetime"
      | "list" | "enum";
  description?: string;         // hint to the model
  required?: boolean;           // when true, missing field → metadata_missing warning
  enum?: string[];              // for type === "enum"
  list_item_type?: MetadataFieldSpec["type"];
  pattern?: string;             // regex (string only)
  min?: number; max?: number;   // numeric / list length bounds
  default?: unknown;            // applied when model returns null
};
```

Output

```ts
type MetadataOutput = {
  results: MetadataResult[];
  applied?: AppliedMetadataWrite[];
  proposed_patches?: ProposedPatch[];
  scope_resolved: { paths: string[]; total_tokens: number };
  citations: Citation[];
  usage: AiUsage;
  warnings: AiWarning[];
};

type MetadataResult = {
  path: string;
  etag: string;
  fields: Record<string, {
    value: unknown;       // schema-validated; null when not extractable
    confidence: number;   // 0..1
    rationale?: string;
    existing?: unknown;   // current frontmatter value (for diffs)
    evidence: Citation[];
  }>;
};

type AppliedMetadataWrite = {
  path: string;
  etag: string;
  fields_set: string[];
  fields_skipped: { key: string; reason: "existing_under_set_missing" | "validation_failed" }[];
};
```

Behavior:

  • The model is constrained to a json_schema derived from fields[]. After generation the runtime re-validates: anything failing is dropped with a metadata_invalid warning carrying { path, key, reason }.
  • apply_mode:
    • set_missing (default) — only writes keys whose existing value is null/missing; protects user-curated fields.
    • overwrite — replaces values regardless of current state.
    • merge — for type: "list" only, dedup-unions with the existing value; for scalars, behaves like set_missing.
  • Type semantics:
    • date / datetime outputs are normalized to ISO 8601 (YYYY-MM-DD or YYYY-MM-DDTHH:mm:ssZ). Model output that is relative prose (“yesterday”) fails validation rather than being guessed at.
    • enum is exact-match only; near-misses → metadata_invalid.
    • pattern is enforced server-side after generation, not relayed via JSON Schema (provider compatibility is uneven).
  • Special keys recognized for SPEC §4.1 conventions: writes to tags, aliases, cssclasses use the Obsidian-conformant list shape regardless of the field’s declared type.
  • Required scope: vault:read for suggest mode, vault:write for apply: true.
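The date rule can be sketched as accept-only validation: values must already be in ISO shape and must actually parse, and nothing is ever guessed (normalizeDate is a hypothetical helper):

```typescript
// Accept YYYY-MM-DD that parses as a real calendar date; anything else
// (including relative prose like "yesterday") fails validation.
function normalizeDate(value: string): string | null {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(value)) return null;
  const parsed = Date.parse(value + "T00:00:00Z");
  return Number.isNaN(parsed) ? null : value;
}
```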

5.7 ai_extract

Generic extraction into a caller-supplied JSON Schema. Use this when neither “a few typed frontmatter fields” (ai_metadata) nor “tags” (ai_tag) fits — e.g., pulling action items, decisions, contacts, or domain entities out of a note for downstream processing.

Input

```ts
type ExtractInput = {
  scope: AiScope;
  schema: JsonSchema;           // JSON Schema 2020-12 subset
  schema_id?: string;           // optional human label, included in cache key
  destination?: "json" | "frontmatter"; // default "json" — return only
  destination_key?: string;     // required when destination === "frontmatter"
  apply?: boolean;              // default false
  apply_mode?: "set_missing" | "overwrite" | "merge"; // for destination: "frontmatter"
  if_match?: Record<string, string>;
  language?: string;
  max_tokens?: number;
  cite?: boolean;               // default true
  seed?: number;
  cache?: boolean;
  idempotency_key?: string;
};
```

Output

```ts
type ExtractOutput = {
  results: ExtractResult[];
  applied?: AppliedExtractWrite[];
  proposed_patches?: ProposedPatch[];
  scope_resolved: { paths: string[]; total_tokens: number };
  usage: AiUsage;
  warnings: AiWarning[];
};

type ExtractResult = {
  path: string;
  etag: string;
  data: unknown;          // validated against `schema`
  confidence: number;     // 0..1
  citations: Citation[];  // per-field where the model can attribute
};

type AppliedExtractWrite = {
  path: string;
  etag: string;
  key: string;            // frontmatter key written
};
```

Behavior:

  • The runtime constrains generation with schema (JSON Schema mode); after generation it re-validates with the same schema and drops items that fail with extract_invalid.
  • destination: "json" (default) is read-only — the agent gets data back and decides what to do with it.
  • destination: "frontmatter" requires destination_key and apply: true to actually write; the data is set as the value of that key through notes.patch. Without apply: true, drafts come back via proposed_patches exactly like ai_relate.
  • The full schema (or schema_id if supplied) participates in the cache key, so changing the requested shape invalidates cleanly.
  • Required scope: vault:read for suggest mode, vault:write for apply: true with destination: "frontmatter".

5.8 Apply semantics for multi-note ops (tag, metadata, extract)


ai_tag, ai_metadata, and ai_extract operate on potentially many notes at once. Their apply: true semantics are deliberately conservative:

  • Per-note atomicity. Each note is written through one core.notes.patch(actor, …) call; that call is itself atomic (SPEC §6.1). A crash between notes leaves the already-written notes consistent and the rest untouched.
  • Best-effort batch, never partial-mid-note. The runtime applies notes in path-sorted order. A failure on one note does not roll back earlier notes (we cannot atomically transact across files). The response reports each path’s outcome via applied[] and any failures via warnings[] with codes (etag_mismatch, forbidden, validation_failed).
  • Per-path optimistic concurrency. if_match: { "<path>": "<etag>" } is honored per note; mismatches surface as ai_apply_conflict with details.failed[] and the op stops at the first mismatch unless if_match_mode: "skip" is set (the runtime skips conflicting paths and continues).
  • Idempotency-by-content. Repeating the same apply request after a successful run is a safe no-op: under apply_mode: "set_missing" (and merge for lists), nothing changes; the response carries apply_skipped_no_change per path. Combine with Idempotency-Key to short-circuit to the cached response within 24 h.
  • Read-after-write. Each applied[] entry includes the new etag of the note — agents do not need a follow-up note_read.
  • Audit fan-out. Every per-note write emits a normal note.updated event with the existing Actor (api/cli/mcp); a single roll-up ai.invocation event lists all applied_paths for the call.
  • Glob gating still wins. If INDX_AI_DENY_GLOBS covers a path, that note is excluded before the chat call and reported via globs_excluded — no provider sees its content, no write happens.
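The per-note loop might be sketched like this; write stands in for core.notes.patch, and the skipConflicts flag plays the role of if_match_mode: "skip":

```typescript
// Apply in path-sorted order; honor per-path if_match; either stop at the
// first etag mismatch or skip conflicting paths and continue.
function applyAll(
  paths: string[],
  ifMatch: Record<string, string>,
  currentEtags: Record<string, string>,
  write: (path: string) => { etag: string },
  skipConflicts: boolean,
) {
  const applied: { path: string; etag: string }[] = [];
  const failed: { path: string; expected: string; actual: string }[] = [];
  for (const path of [...paths].sort()) {
    const expected = ifMatch[path];
    if (expected !== undefined && expected !== currentEtags[path]) {
      failed.push({ path, expected, actual: currentEtags[path] });
      if (!skipConflicts) break; // ai_apply_conflict: stop at first mismatch
      continue;
    }
    applied.push({ path, etag: write(path).etag }); // new etag for read-after-write
  }
  return { applied, failed };
}
```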

Every AI output that references vault content carries a citations[] array of Citation records:

  • path is always vault-relative POSIX.
  • etag pins the content version the model saw — agents can detect drift by comparing against note_read({ path }).etag.
  • anchor is a heading or block id when one applies; otherwise omitted.
  • line / span give exact location for snippet pull.

Grounding rules:

  1. The chat model is constrained to JSON Schema output (json_schema mode) wherever supported. The schema mandates a citations field.
  2. After the model returns, the runtime verifies every cited path exists, the etag matches, and (if anchor or line is given) that the location resolves. Failed citations are dropped with a citation_drift or citation_invalid warning per item.
  3. When cite: true and zero citations survive, the op fails with ai_grounding_failed rather than returning ungrounded prose.
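Rule 2 can be sketched as a filter over the model's citations; lookup stands in for a note_read against the index:

```typescript
// Keep only citations whose path resolves and whose etag still matches;
// drop the rest with the corresponding warning code.
type Cite = { path: string; etag: string };

function verifyCitations(cites: Cite[], lookup: (path: string) => { etag: string } | null) {
  const kept: Cite[] = [];
  const warnings: string[] = [];
  for (const c of cites) {
    const note = lookup(c.path);
    if (note === null) warnings.push("citation_invalid");
    else if (note.etag !== c.etag) warnings.push("citation_drift");
    else kept.push(c);
  }
  return { kept, warnings };
}
```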

Long-running ops (notably ai_ask and large ai_summarize) support streaming. Transport-specific:

| Surface | Streaming form |
| --- | --- |
| HTTP API | `Accept: text/event-stream`. Events: `ai.partial` (token chunk), `ai.citation` (incremental), `ai.usage` (running total), `ai.complete` (final structured payload). |
| CLI | `--stream` writes NDJSON deltas to stdout, matching the SSE event names; the final line is `{"type":"ai.complete", ...}`. |
| MCP | Tool calls emit progress notifications (`notifications/progress`) with partial payloads; the final `tool_result` carries the full `structuredContent`. |

Streaming is best-effort. If the provider does not support streaming, the runtime falls back to a single non-streamed response and emits stream_unsupported in warnings (no error).

When INDX_AI_CACHE=on (default), every successful AI op result is cached in .indx/ai-cache.db keyed by:

```
sha256(
    op
  | provider | model | temperature | seed | max_output_tokens | json_schema
  | normalized_input
  | content_fingerprint(scope)
)
```

  • normalized_input strips fields that don’t affect output (e.g. idempotency_key, cache, stream).
  • content_fingerprint(scope) is the sorted list of (path, etag) pairs in the resolved scope. Any vault edit invalidates only the entries that read the changed file.
  • TTL: 24 h (configurable per call via cache_ttl_seconds).
  • Cache hits return with usage: { input_tokens: 0, output_tokens: 0, cost_usd: 0 } and warnings: ["ai_cache_hit"].
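A sketch of the key composition with node:crypto; the exact field ordering and normalization here are illustrative:

```typescript
import { createHash } from "node:crypto";

// Hash op + model settings + normalized input + (path, etag) fingerprint.
function cacheKey(parts: {
  op: string; provider: string; model: string; temperature: number; seed: number;
  input: Record<string, unknown>;
  scope: { path: string; etag: string }[];
}): string {
  // strip fields that don't affect output
  const { idempotency_key: _i, cache: _c, stream: _s, ...normalized } = parts.input;
  const fingerprint = [...parts.scope]
    .sort((a, b) => a.path.localeCompare(b.path))
    .map((s) => `${s.path}@${s.etag}`);
  return createHash("sha256")
    .update(JSON.stringify([parts.op, parts.provider, parts.model, parts.temperature, parts.seed, normalized, fingerprint]))
    .digest("hex");
}
```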

The standard Idempotency-Key header / --idempotency-key flag / idempotency_key field works as on every other write — but for AI it also applies to read-shaped ops, so an agent retrying a /v1/ai/ask call after a network blip does not pay for two generations.

Same key + same body within 24 h → cached response with X-Indx-Idempotent-Replay: true. Same key + different body → 409 idempotency_key_reused.

  • INDX_AI_DAILY_COST_USD clamps total spend per UTC day across all tokens. Over-cap → 429 ai_quota_exceeded with Retry-After: <seconds-until-midnight>.
  • Per-token rate limits from the existing API limiter still apply.
  • Each AI call’s usage.cost_usd is a best-effort number derived from the provider’s published rates — emitted on the response, written to events.log as an ai.invocation event, and aggregated for vault_status.ai.

AI ops use the standard error envelope (API §4). The following codes layer on the existing list:

| HTTP | code | When |
| --- | --- | --- |
| 503 | `ai_unavailable` | No suitable provider configured (or `enabled: false` in config) for the requested op. `details.required_role` is `"chat"` or `"embeddings"`. |
| 429 | `ai_quota_exceeded` | Daily cost ceiling reached. `details.reset_at` is an ISO 8601 instant. |
| 502 | `ai_provider_error` | Upstream call failed (timeout, 5xx, parse). `details.provider`, `details.upstream_status`, `details.upstream_message`. |
| 422 | `ai_grounding_failed` | All model citations failed verification under `cite: true`. `details.dropped[]`. |
| 422 | `ai_input_too_large` | Resolved scope exceeds `INDX_AI_MAX_INPUT_TOKENS` even after map-reduce. `details.tokens`. |
| 422 | `ai_schema_invalid` | The caller-supplied `ai_extract.schema` is not a valid JSON Schema. `details.errors[]`. |
| 409 | `ai_apply_conflict` | One or more per-path `if_match` checks failed during `apply: true`; `details.failed[{ path, expected, actual }]`. No partial writes leak — see §5.5/§5.6/§5.7. |

Existing codes also apply: validation_failed, forbidden, unauthorized, rate_limited, parse_failed, etag_mismatch (when ai_toc.write collides), etc.

AiWarning (non-fatal) values:

| Warning | Meaning |
| --- | --- |
| `embeddings_unavailable` | Hybrid/semantic retrieval downgraded to lexical. |
| `chat_unavailable` | Chat-dependent fields omitted (e.g. `ai_relate` without classification). |
| `ai_cache_hit` | Result served from cache; usage reflects zero cost. |
| `citation_drift` | A cited path exists but its etag changed since model generation. |
| `citation_invalid` | A cited path or anchor does not resolve and was dropped. |
| `ungrounded_dropped` | A model claim without citation was dropped under `cite: true`. |
| `truncated` | Output hit `max_output_tokens`; `finish_reason: "length"`. |
| `stream_unsupported` | Streaming requested but provider does not support it. |
| `globs_excluded` | Some scope candidates were excluded by `INDX_AI_*_GLOBS`. |
| `unsupported_question` | `ai_ask` could not ground an answer; returned a low-confidence stub. |
| `tag_invalid` | A proposed tag failed SPEC §4.5 charset/shape rules and was dropped. |
| `tag_below_threshold` | A proposed tag scored under `vocabulary.min_confidence` and was dropped. |
| `metadata_invalid` | A field value failed schema validation (pattern / enum / min/max / type) and was dropped. |
| `metadata_missing` | A `required: true` field could not be extracted and was omitted (still an op success). |
| `extract_invalid` | An extracted record failed schema validation and was dropped. |
| `apply_skipped_no_change` | `apply: true` was a no-op for a path because nothing changed (e.g. all suggested tags already present under `apply_mode: "merge"`). |
| AI op | HTTP | CLI | MCP tool |
| --- | --- | --- | --- |
| `ai_summarize` | `POST /v1/ai/summarize` | `indx ai summarize` | `ai_summarize` |
| `ai_ask` | `POST /v1/ai/ask` | `indx ai ask` | `ai_ask` |
| `ai_toc` | `POST /v1/ai/toc` | `indx ai toc` | `ai_toc` |
| `ai_relate` | `POST /v1/ai/relate` | `indx ai relate` | `ai_relate` |
| `ai_tag` | `POST /v1/ai/tag` | `indx ai tag` | `ai_tag` |
| `ai_metadata` | `POST /v1/ai/metadata` | `indx ai metadata` | `ai_metadata` |
| `ai_extract` | `POST /v1/ai/extract` | `indx ai extract` | `ai_extract` |
| `ai_status` | `GET /v1/ai/status` | `indx ai status` | `ai_status` |

ai_status is a cheap probe returning { enabled, provider, model, embeddings: { provider, model, dim }, cache: { enabled, hits_24h, size_bytes }, spend_today_usd } — the agent’s discovery handshake for “what AI can I expect from this vault?”

  • All AI endpoints are POST (request bodies are non-trivial; cursors and Idempotency-Key belong on writes).
  • All AI endpoints stream when Accept: text/event-stream; otherwise return a single JSON envelope.
  • ai_status is GET and behaves like any other lightweight read.
```sh
indx ai summarize --scope-paths Spec.md,API.md [--style bullets] [--length short]
indx ai ask "what does the etag scheme guarantee?" [--top-k 8] [--mode hybrid]
indx ai toc --note Architecture.md --depth 3 [--descriptions]
indx ai toc --moc --scope-glob 'Projects/**' --group-by tag --write Index.md
indx ai relate --note Spec.md [--top-k 10] [--classify] [--propose-links]
indx ai status
```

Global flags (CLI §2.1) apply: --ndjson, --idempotency-key, --if-match, --dry-run. --stream opts into NDJSON streaming (one event per line, terminated by ai.complete).
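A minimal consumer sketch (not indx code): fold one NDJSON event per line into the assembled answer, stopping at the terminating ai.complete event. The event names match the stream transcript shown in the ask example; the fold itself is illustrative.

```typescript
// Hypothetical consumer of `--stream` output: one JSON event per line.
type StreamFold = { text: string; citations: unknown[]; complete: boolean };

function foldStream(lines: string[]): StreamFold {
  const out: StreamFold = { text: "", citations: [], complete: false };
  for (const line of lines) {
    if (!line.trim()) continue; // tolerate blank lines
    const ev = JSON.parse(line);
    switch (ev.type) {
      case "ai.partial":  out.text += ev.delta; break;          // accumulate deltas
      case "ai.citation": out.citations.push(ev.citation); break;
      case "ai.complete": out.complete = true; return out;      // stream terminator
    }
  }
  return out;
}
```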

indx ai toc --moc --write <path> requires --yes if the path exists and will be replaced; --if-not-exists makes it a safe create.

Tools are registered iff a provider is configured and the connecting token has vault:read (and vault:write for the write paths of ai_toc). Tool names are stable; they appear in MCP §3 tables (read tools, except ai_toc with write which is a write tool).

Each AI tool result includes a deterministic content[].text summary, exactly the contract from MCP §3: an agent that wants to display progress without re-rendering the structured payload can quote the text block.

Resources gain two read-only synthetic URIs when AI is enabled:

- `ai://summary/<scope-spec>`: a live summary; refreshes when scope content changes
- `ai://moc/<scope-spec>`: a live MOC for a glob/tag scope

<scope-spec> is a URL-encoded JSON AiScope. Subscribing to either URI (resources/subscribe) re-derives when an underlying note’s etag changes — the runtime triggers a re-summary, and the agent receives notifications/resources/updated.
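A sketch of constructing the synthetic resource URI: per the text above, `<scope-spec>` is the URL-encoded JSON AiScope. The AiScope fields mirror the request examples elsewhere in this doc; the helper itself is an illustration, not indx API.

```typescript
// Hypothetical URI builder for the ai://summary/ synthetic resource.
type AiScope = { kind: string; path_glob?: string; paths?: string[]; tag?: string };

function aiSummaryUri(scope: AiScope): string {
  // <scope-spec> = URL-encoded JSON serialization of the scope.
  return "ai://summary/" + encodeURIComponent(JSON.stringify(scope));
}
```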

```http
POST /v1/ai/summarize
Authorization: Bearer indx_…
Content-Type: application/json

{
  "scope": { "kind": "glob", "path_glob": "Specs/**/*.md" },
  "style": "bullets",
  "length": "short",
  "include": ["tldr", "key_points", "open_questions"],
  "cite": true
}
```

Response (abridged):

```json
{
  "ok": true,
  "data": {
    "scope_resolved": { "paths": ["Specs/Auth.md", "Specs/Search.md"], "total_tokens": 4123 },
    "summary": {
      "tldr": "Auth uses static bearer tokens with scopes; search defaults to lexical.",
      "key_points": [
        "Tokens are validated by hashed comparison; secrets never persisted in config.json.",
        "Hybrid search requires an embeddings provider and degrades silently to lexical otherwise."
      ],
      "open_questions": ["Should per-user OIDC ship in v1.1 or v2?"]
    },
    "citations": [
      { "path": "Specs/Auth.md", "etag": "ab12cd34ef567890", "line": 42 },
      { "path": "Specs/Search.md", "etag": "9fbb12…", "anchor": "5.2 Search modes" }
    ],
    "usage": { "input_tokens": 4123, "output_tokens": 218, "cost_usd": 0.0007 },
    "warnings": []
  }
}
```
```sh
indx ai ask "what does the etag scheme guarantee for two concurrent writers?" \
  --top-k 8 --stream
```

Stdout, one NDJSON event per line:

```
{"type":"ai.partial","delta":"The ETag is the first 16 hex of"}
{"type":"ai.partial","delta":" xxhash64(bytes-on-disk), so it changes…"}
{"type":"ai.citation","citation":{"path":"SPEC.md","etag":"…","anchor":"6.1 Atomic write"}}
{"type":"ai.usage","usage":{"input_tokens":3211,"output_tokens":104,"cost_usd":0.00041}}
{"type":"ai.complete","data":{ /* full AskOutput */ }}
```

Exit 0 on a normal complete. Exit 7 on ai_quota_exceeded. Exit 8 on ai_provider_error.
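The documented mapping can be sketched as a table lookup; only exit codes 0, 7, and 8 are stated above, so treating every other error as a generic exit 1 is an assumption, not documented behavior.

```typescript
// Hypothetical mapping from stream outcome to CLI exit code.
function exitCodeFor(errorCode: string | null): number {
  if (errorCode === null) return 0;                // normal ai.complete
  if (errorCode === "ai_quota_exceeded") return 7; // documented
  if (errorCode === "ai_provider_error") return 8; // documented
  return 1;                                        // ASSUMED generic failure code
}
```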

### 11.3 TOC for a single long note (no LLM cost)

```sh
indx ai toc --note Architecture.md --depth 3
```

With include_descriptions: false (the default), the output is outline-only and the chat model is skipped entirely; usage.cost_usd is 0.

### 11.4 MOC for “Projects/”, grouped by tag, materialized

```sh
indx ai toc --moc \
  --scope-glob 'Projects/**' \
  --group-by tag \
  --write Index/Projects.md \
  --idempotency-key $(uuidgen)
```

Response includes written: { path: "Index/Projects.md", etag: "…" }. The write goes through the standard atomic pipeline; a subsequent note_read sees the materialized note. Re-running the same command with the same idempotency_key is a no-op replay.

```http
POST /v1/ai/relate

{
  "source": { "kind": "path", "path": "Architecture.md" },
  "candidates": { "kind": "glob", "path_glob": "**/*.md" },
  "top_k": 8,
  "classify": true,
  "threshold": 0.55,
  "propose_links": true
}
```
Response (abridged):

```json
{
  "ok": true,
  "data": {
    "edges": [
      { "src_path": "Architecture.md", "dst_path": "SPEC.md",
        "similarity": 0.81, "relation": "extends", "confidence": 0.78,
        "rationale": "SPEC.md sets the contracts that Architecture.md implements.",
        "evidence": [{ "path": "SPEC.md", "etag": "…", "anchor": "6.2 Patch operations" }] }
    ],
    "proposed_patches": [
      { "path": "Architecture.md",
        "ops": [{ "op": "insert_after_heading",
                  "heading": "## See also",
                  "markdown": "- [[SPEC]] — patch grammar reference\n" }],
        "reason": "Strong same-topic + extends relation; no existing wikilink found.",
        "related_to": ["SPEC.md"] }
    ],
    "scope_resolved": { "sources": ["Architecture.md"], "candidates_considered": 184 },
    "usage": { "input_tokens": 6420, "output_tokens": 312, "cost_usd": 0.0011 },
    "warnings": []
  }
}
```

The agent decides whether to apply the proposed patches by calling note_patch — the runtime never auto-edits.
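The hand-off can be sketched as a pure transformation (not indx code): the agent filters the drafts it accepts and forwards each one's ops to note_patch. The body shape passed to note_patch here (path plus ops) is an assumption from the draft's naming; the ops themselves pass through unchanged.

```typescript
// Hypothetical agent-side step: choose which ai_relate drafts to apply.
type ProposedPatch = { path: string; ops: unknown[]; reason: string; related_to: string[] };

function toNotePatchBodies(
  proposed: ProposedPatch[],
  accept: (p: ProposedPatch) => boolean, // agent's own acceptance policy
): { path: string; ops: unknown[] }[] {
  // The runtime never auto-edits; only accepted drafts become patch calls.
  return proposed.filter(accept).map((p) => ({ path: p.path, ops: p.ops }));
}
```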

### 11.6 Tag a recently-edited folder, biased to existing vocabulary

```sh
indx ai tag \
  --scope-glob 'Inbox/**/*.md' \
  --use-existing --max-per-note 4 --min-confidence 0.7 \
  --apply --apply-mode merge \
  --idempotency-key $(uuidgen)
```
Response (abridged):

```json
{
  "ok": true,
  "data": {
    "suggestions": [
      { "path": "Inbox/2026-05-02-meeting.md", "etag": "ab12…",
        "tags": [
          { "tag": "ops", "confidence": 0.92, "is_new": false,
            "rationale": "incident review under #ops cadence",
            "evidence": [{ "path": "Inbox/2026-05-02-meeting.md", "etag": "ab12…", "anchor": "## Agenda" }] },
          { "tag": "oncall/postmortem", "confidence": 0.81, "is_new": true,
            "rationale": "explicit postmortem section present",
            "evidence": [{ "path": "Inbox/2026-05-02-meeting.md", "etag": "ab12…", "anchor": "## Postmortem" }] }
        ] }
    ],
    "applied": [
      { "path": "Inbox/2026-05-02-meeting.md", "etag": "cd34…",
        "tags_added": ["ops", "oncall/postmortem"], "tags_removed": [] }
    ],
    "vocabulary": { "existing": ["ops"], "new": ["oncall/postmortem"] },
    "citations": [/* … */],
    "usage": { "input_tokens": 1820, "output_tokens": 64, "cost_usd": 0.0002 },
    "warnings": []
  }
}
```

### 11.7 Fill missing typed frontmatter fields

```http
POST /v1/ai/metadata

{
  "scope": { "kind": "paths", "paths": ["Projects/Indx-App.md"] },
  "fields": [
    { "key": "title", "type": "string", "required": true },
    { "key": "status", "type": "enum", "enum": ["idea", "active", "blocked", "done"], "required": true },
    { "key": "due_date", "type": "date" },
    { "key": "stakeholders", "type": "list", "list_item_type": "string", "max": 5 }
  ],
  "apply": true,
  "apply_mode": "set_missing"
}
```
Response (abridged):

```json
{
  "ok": true,
  "data": {
    "results": [{
      "path": "Projects/Indx-App.md", "etag": "ab12…",
      "fields": {
        "title": { "value": "Indx-App build plan", "confidence": 0.97, "evidence": [/* … */] },
        "status": { "value": "active", "confidence": 0.86, "evidence": [/* … */] },
        "due_date": { "value": "2026-09-01", "confidence": 0.71, "evidence": [/* … */] },
        "stakeholders": { "value": ["Alice", "Bob"], "confidence": 0.79, "evidence": [/* … */] }
      }
    }],
    "applied": [{
      "path": "Projects/Indx-App.md", "etag": "cd34…",
      "fields_set": ["title", "status", "due_date", "stakeholders"],
      "fields_skipped": []
    }],
    "usage": { "input_tokens": 2410, "output_tokens": 88, "cost_usd": 0.0003 },
    "warnings": []
  }
}
```

A second run with the same body is a no-op replay (everything already set under set_missing); the response carries apply_skipped_no_change.

### 11.8 Extract action items into a custom JSON Schema

```sh
indx ai extract \
  --scope-glob 'Meetings/2026-Q2/**' \
  --schema-file action-items.schema.json \
  --destination frontmatter \
  --destination-key action_items \
  --apply --apply-mode merge
```

action-items.schema.json:

```json
{
  "type": "array",
  "items": {
    "type": "object",
    "required": ["task", "owner"],
    "properties": {
      "task": { "type": "string", "minLength": 3 },
      "owner": { "type": "string" },
      "due_date": { "type": "string", "format": "date" },
      "blocked_by": { "type": "array", "items": { "type": "string" } }
    },
    "additionalProperties": false
  }
}
```

Each meeting note ends up with a typed action_items: array in frontmatter — usable directly by Bases queries (SPEC §4.9) without further parsing.
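One plausible reading of apply_mode: "merge" for a frontmatter list destination, sketched here as an illustration: keep existing records and append extracted ones that are not already present. Deduplicating by serialized value is an assumption; the doc only establishes that merge does not discard what is already there.

```typescript
// Hypothetical merge of extracted records into an existing frontmatter list.
type ActionItem = { task: string; owner: string; due_date?: string; blocked_by?: string[] };

function mergeActionItems(existing: ActionItem[], extracted: ActionItem[]): ActionItem[] {
  // NOTE: JSON.stringify is key-order sensitive; acceptable for a sketch.
  const seen = new Set(existing.map((i) => JSON.stringify(i)));
  const merged = [...existing];
  for (const item of extracted) {
    const key = JSON.stringify(item);
    if (!seen.has(key)) {
      seen.add(key);
      merged.push(item); // only genuinely new records are appended
    }
  }
  return merged;
}
```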

## 12. Telemetry, audit, and the events stream


Every AI invocation emits a VaultEvent of type ai.invocation (extends SPEC §7):

```ts
type AiInvocationEvent = {
  type: "ai.invocation";
  op: "summarize" | "ask" | "toc" | "relate" | "tag" | "metadata" | "extract";
  actor: Actor;             // ui/api/cli/mcp/fs
  scope_paths: string[];    // resolved (truncated to first 50 for log)
  provider: string;
  model: string;
  usage: AiUsage;
  cache_hit: boolean;
  duration_ms: number;
  ok: boolean;
  error_code?: string;
  applied_paths?: string[]; // present iff apply: true succeeded; truncated to first 50
  at: string;               // ISO 8601
};
```

These events flow through:

- GET /v1/events (SSE) — filterable with kinds=ai.invocation.
- .indx/events.log — the rolling NDJSON audit log.
- vault_status.ai — counts, hits/misses, total spend today.
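As a sketch (not indx code), the "total spend today" figure can be derived from ai.invocation events alone, using the usage.cost_usd and at fields defined above:

```typescript
// Hypothetical reducer over the audit stream.
type InvocationLite = { type: string; usage: { cost_usd: number }; at: string };

function spendOn(dayIso: string, events: InvocationLite[]): number {
  return events
    .filter((e) => e.type === "ai.invocation" && e.at.startsWith(dayIso))
    .reduce((sum, e) => sum + e.usage.cost_usd, 0);
}
```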

Privacy: prompts and outputs are not written to the event log. Paths, counts, durations, costs, and error codes only — same redaction stance as NFR-PRIV-2.

AI is non-deterministic by nature; indx narrows the gap:

- Defaults: temperature: 0, seed: 0, structured-output JSON Schema mode.
- The shape of every AI output is fixed by Zod schemas; freeform prose is confined to specific fields (summary.text, answer, rationale).
- Scope resolution and citation verification are deterministic given the vault state — agents that rely on scope_resolved.paths for downstream branching get stable inputs.
- The cache (§8.1) makes “same input + same vault state” a guaranteed exact replay across restarts.
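One way to see why "same input + same vault state" replays exactly: a cache key over the request body plus the etag of every note in the resolved scope pins both sides. The concrete key format below is an assumption for illustration, not indx's actual scheme.

```typescript
// Hypothetical cache key: request body + etag-pinned scope state.
function cacheKey(op: string, body: unknown, scopeEtags: Record<string, string>): string {
  const etags = Object.keys(scopeEtags)
    .sort() // order-independent over scope paths
    .map((p) => `${p}@${scopeEtags[p]}`)
    .join(",");
  return `${op}|${JSON.stringify(body)}|${etags}`;
}
```

Any edit to a note in scope changes its etag, which changes the key, which forces a fresh provider call; an identical retry hits the cache for free.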

This is enough for snapshot-style tests in CI: the runtime ships an AI_TEST_RECORD=1 mode that records cache fixtures, and AI_TEST_REPLAY=1 serves them — no provider call is required for unit/integration tests of adapters.

- No prompt leakage in logs. Same redaction rules as the rest of the system; only paths, counts, and durations are persisted.
- Glob-gated content. INDX_AI_DENY_GLOBS lets the operator keep sensitive paths (Private/**, Journal/**, …) out of any AI op even when an agent specifies them in scope. Filtered candidates surface as globs_excluded warnings — never silently included.
- Outbound calls only with consent. AI features tighten, not loosen, NFR-PRIV-1: if neither a chat nor an embeddings provider is configured, no call is ever made.
- Scopes still gate writes. ai_toc with write requires vault:write; ai_relate with propose_links: true returns drafts only — applying requires note_patch, which requires vault:write. There is no AI bypass.
- Threat model. A malicious agent with vault:read can exfiltrate vault content via the configured provider. This is the same trust posture as any embeddings call today; document it clearly and let users choose providers (or ai.enabled: false) accordingly. See SPEC §12.
- The AI op surface is additive within /v1. New optional fields and warnings are non-breaking; new ops appear in /openapi.json once configured.
- Breaking shape changes to existing ops follow the standard API §15 policy: /v2 cut, /v1 retained for one minor.
- Provider plumbing is engine-internal — swapping a provider does not change the public surface.
- One handshake for AI capability discovery: ai_status (or tools/list filtered for ai_*) tells the agent exactly what it can ask for on this vault.
- Citations by default: every model claim is verifiable against the vault, with etag-pinned references the agent can re-fetch.
- Deterministic shapes: even when the model varies the prose, the structured payload is contractual.
- Safe writes: the runtime never auto-edits; AI write paths reuse the atomic pipeline with the same ETag/idempotency invariants.
- Cost-aware: usage is reported per call, capped per day, and cached between identical requests so retries are free.
- Drop-in compatibility: existing tokens, scopes, configs, and clients keep working; AI is one env var away.