Phase 6 — AI Second Brain

Status: Shipped ✅ · Owner: AI Lead · Duration: 4 weeks · Gate: G6

1. Overview

Phase 6 adds the AI Second Brain — a tenant-scoped intelligence layer that augments the platform with auto-categorisation, anomaly detection, natural-language query, document RAG, and agent-driven workflow automation. The Second Brain is provider-agnostic (OpenAI, Anthropic, Ollama / self-hosted via env config), safety-gated (no side effects without permission), traceable (every AI decision is logged with reasoning), and measurable (quality scored over time).

2. Objectives

  • O6.1 — Provider-agnostic LLM router with at least three drivers (OpenAI, Anthropic, local via Ollama).
  • O6.2 — Auto-categorisation of work orders, alerts, and assets with feedback loop.
  • O6.3 — Natural-language query: "show me assets in B2 with energy consumption above 80kWh today" → MongoDB aggregation + chart suggestion.
  • O6.4 — Document RAG over Atlas-stored docs (SOPs, runbooks, manuals) with citation.
  • O6.5 — Anomaly detection on selected telemetry channels (rolling-window statistical + isolation forest baseline).
  • O6.6 — Agent loop for safe automations (e.g., "if a chiller alarm fires three times in 30 minutes, create a P2 work order, notify the FM lead, attach the last 24h telemetry chart") — opt-in per tenant.
  • O6.7 — AI Chat widget (Phase 5) wired to the Second Brain.
  • O6.8 — Quality scoring: per-task quality dashboard (categorisation precision/recall, RAG hit rate, anomaly false-positive rate).

3. Scope

3.1 In-scope

  • LLM router with provider drivers (OpenAI, Anthropic, Ollama, plus a stub for Azure OpenAI).
  • Model selection per task (env + per-tenant overrides).
  • Embedding pipeline (compute + store in MongoDB Atlas Vector Search).
  • RAG over: documents (uploaded SOPs, runbooks), entity records (assets, work orders), and the platform's own help content.
  • Categorisation service: rule-engine + LLM hybrid, feedback loop, model card.
  • NL query → MongoDB aggregation pipeline (function-calling).
  • Anomaly engine: rolling statistical + isolation forest; tunable per data point.
  • Agent loop with tool registry (read-only tools by default; write tools require explicit permission grant).
  • Safety: prompt injection detection, output schema validation, rate limits, cost budgets per tenant.
  • AI audit: every prompt + completion + tool call + decision logged.
  • AI Chat UI: threaded, multimodal (image upload), citations, follow-up suggestions, tool transparency.
  • Quality metrics dashboard.
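
The "rolling statistical" half of the anomaly engine listed above could look roughly like the rolling z-score below. This is a minimal sketch, not the shipped detector: the function name `rollingZScoreAnomalies`, the default threshold of 3, and the `minStd` floor are all assumptions, and the isolation forest baseline is not shown.

```typescript
// Hypothetical sketch: flag points that deviate strongly from the preceding window.
function rollingZScoreAnomalies(
  values: number[],
  windowSize: number,
  threshold = 3,   // assumed default; the spec says this is tunable per data point
  minStd = 1e-9,   // floor so a flat window does not divide by zero
): number[] {
  if (windowSize < 2) throw new RangeError('windowSize must be at least 2');
  const flagged: number[] = [];
  for (let i = windowSize; i < values.length; i++) {
    // Statistics come from the windowSize points before index i (i itself excluded).
    const recent = values.slice(i - windowSize, i);
    const mean = recent.reduce((a, b) => a + b, 0) / windowSize;
    const variance =
      recent.reduce((a, b) => a + (b - mean) ** 2, 0) / (windowSize - 1);
    const std = Math.max(Math.sqrt(variance), minStd);
    if (Math.abs(values[i] - mean) / std > threshold) flagged.push(i);
  }
  return flagged;
}

// A steady channel with one injected spike at index 8.
const readings = [10, 11, 10, 11, 10, 11, 10, 11, 50, 10, 11];
console.log(rollingZScoreAnomalies(readings, 6)); // [ 8 ]
```

Because the spike enters the window for the next few points, it inflates the local standard deviation and the following normal readings are not flagged. A production detector may prefer a robust statistic (median and MAD) for that reason.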

3.2 Out-of-scope

  • Training or fine-tuning custom models (post-launch, opt-in service).
  • Image generation (post-launch).
  • Voice interface (post-launch).
  • Multi-tenant model sharing (every tenant operates in its own context; cross-tenant data sharing forbidden).

4. Dependencies

  • Phase 2 (auth, RBAC, audit).
  • Phase 3 (data + telemetry).
  • Phase 4 (entities for context).
  • Phase 5 (AI widgets).

5. Architecture & Design

5.1 LLM router

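// Contract that every provider driver implements.
export interface LlmProvider {
  name: 'openai' | 'anthropic' | 'ollama' | 'azure-openai';
  chat(req: ChatRequest): Promise<ChatResponse>;
  embed(texts: string[]): Promise<number[][]>;  // one embedding vector per input text
  tools?: ToolSupport;                          // optional tool-calling support
}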

export interface ChatRequest {
  model: string;
  messages: Message[];
  tools?: ToolDef[];        // function-calling style
  responseFormat?: 'text' | { type: 'json_schema'; schema: object };
  maxTokens?: number;
  temperature?: number;
}

Provider selection: each task has an env-configured default, which a tenant can override via Settings → AI. Cost and request rate are tracked per tenant.
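
The selection rule can be sketched as a small lookup: a per-tenant override wins when present, otherwise the per-task env default applies. The names `TaskName`, `ModelChoice`, and `resolveModel`, the task keys, and the model identifiers are illustrative assumptions, not the router's actual API.

```typescript
// Hypothetical sketch: pick the provider/model for a task, honouring tenant overrides.
type ProviderName = 'openai' | 'anthropic' | 'ollama' | 'azure-openai';

interface ModelChoice {
  provider: ProviderName;
  model: string; // placeholder identifiers below, not real model names
}

// Illustrative task keys; the real catalogue may name tasks differently.
type TaskName = 'categorise' | 'nl-query' | 'rag-answer' | 'chat';

type TaskDefaults = Record<TaskName, ModelChoice>;
type TenantOverrides = Partial<Record<TaskName, ModelChoice>>;

function resolveModel(
  task: TaskName,
  envDefaults: TaskDefaults,
  tenantOverrides: TenantOverrides = {},
): ModelChoice {
  // A tenant-level override wins; otherwise fall back to the env default.
  return tenantOverrides[task] ?? envDefaults[task];
}

// Example wiring with placeholder model identifiers.
const envDefaults: TaskDefaults = {
  categorise: { provider: 'anthropic', model: 'haiku-class' },
  'nl-query': { provider: 'anthropic', model: 'sonnet-class' },
  'rag-answer': { provider: 'anthropic', model: 'sonnet-class' },
  chat: { provider: 'anthropic', model: 'sonnet-class' },
};
const tenantOverrides: TenantOverrides = {
  chat: { provider: 'ollama', model: 'local-model' },
};

console.log(resolveModel('chat', envDefaults, tenantOverrides).provider);       // ollama
console.log(resolveModel('categorise', envDefaults, tenantOverrides).provider); // anthropic
```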

5.2 Tasks (catalogue)

| Task | Provider default | Output | Tool calls? |
| --- | --- | --- | --- |
| Categorise work order | Anthropic Haiku-class | JSON: {category, subcategory, confidence} | No |
| Categorise alert | Anthropic Haiku-class | JSON | No |
| Suggest tag(s) | Anthropic Haiku-class | JSON array | No |
| Summarise work order history (asset) | Anthropic Sonnet-class | Text | Read tools |
| NL → Mongo query | Anthropic Sonnet-class | JSON pipeline | No |
| Answer NL question (RAG) | Anthropic Sonnet-class | Text with citations | Read tools |
| Detect anomaly explanation | Anthropic Haiku-class | Text | Read tools |
| Agent loop ("if X then Y") | Anthropic Sonnet-class | Plan + tool calls | Read + (opt-in) Write |
| Chat (free-form) | Anthropic Sonnet-class | Text or JSON | Read + (opt-in) Write |

5.3 RAG architecture

Document (S3) ─▶ chunker ─▶ embeddings ─▶ Atlas Vector index ┐
Entity records ─▶ summariser ─▶ embeddings ─▶ same index      ├─▶ retriever ─▶ LLM ─▶ answer + citations
Help content   ─▶ chunker ─▶ embeddings ─▶ same index         ┘
  • Chunker: layout-aware (paragraphs, sections); 800-token windows with 100-token overlap.
  • Embedding model: chosen per provider (env-configurable).
  • Vector store: MongoDB Atlas Vector Search (ADR-012); falls back to Qdrant if Atlas Vector Search is unavailable.
  • Citations: every chunk that contributed is surfaced with link to source.
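
The windowing arithmetic behind the chunker (800-token windows, 100-token overlap, so a stride of 700) can be sketched as follows. This is a simplified illustration: it assumes the text is already tokenised into an array, and it leaves out the layout-aware splitting on paragraphs and sections. The function name `chunkTokens` is hypothetical.

```typescript
// Hypothetical sketch: fixed-size sliding windows with overlap over a token array.
function chunkTokens<T>(tokens: T[], windowSize = 800, overlap = 100): T[][] {
  if (windowSize <= 0 || overlap < 0 || overlap >= windowSize) {
    throw new RangeError('require windowSize > 0 and 0 <= overlap < windowSize');
  }
  const stride = windowSize - overlap; // 700 with the defaults
  const chunks: T[][] = [];
  for (let start = 0; start < tokens.length; start += stride) {
    chunks.push(tokens.slice(start, start + windowSize));
    // Stop once a window reaches the end, so no overlap-only tail chunk is emitted.
    if (start + windowSize >= tokens.length) break;
  }
  return chunks;
}

// 2000 tokens produce windows starting at 0, 700, and 1400.
const tokens = Array.from({ length: 2000 }, (_, i) => i);
console.log(chunkTokens(tokens).map((c) => c.length)); // [ 800, 800, 600 ]
```

Each chunk shares its first 100 tokens with the tail of the previous one, which keeps sentences that straddle a boundary retrievable from either side.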

5.4 Agent loop

  • Plan-then-execute pattern with a per-turn budget (max 5 tool calls, max 30 s wall-clock).
  • Tool registry built from OpenAPI: every safe read endpoint exposed as a tool; write endpoints exposed only if tenant + role grants ai.write.<scope>.
  • Critic step before execution: a separate LLM call validates the plan against a policy schema.
  • Safety circuit breakers: any write that touches OT control (BMS write-back) is blocked until a human-in-the-loop step approves it.
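
The budget and gating rules above can be sketched as two small helpers. This is a sketch under stated assumptions: `AgentTool`, `decideToolUse`, and `createTurnBudget` are hypothetical names, the scope values are illustrative, and the grant prefix is a parameter because this section writes the permission as ai.write.<scope> while section 6.2 lists ai.agent.run.write.<scope>.

```typescript
// Hypothetical sketch of write gating and per-turn budgeting for agent tools.
interface AgentTool {
  name: string;
  kind: 'read' | 'write';
  scope: string;             // illustrative scope name, e.g. 'workorders'
  touchesOtControl: boolean; // true for BMS write-back style tools
}

type ToolDecision = 'allow' | 'deny' | 'needs-human-approval';

function decideToolUse(
  tool: AgentTool,
  grants: ReadonlySet<string>,
  writePrefix = 'ai.write.', // assumption: grants look like ai.write.<scope>
): ToolDecision {
  if (tool.kind === 'read') return 'allow';
  if (!grants.has(writePrefix + tool.scope)) return 'deny';
  // Even a granted write to OT control waits for a human-in-the-loop step.
  return tool.touchesOtControl ? 'needs-human-approval' : 'allow';
}

function createTurnBudget(startedAtMs: number, maxCalls = 5, maxWallclockMs = 30_000) {
  let calls = 0;
  return {
    // Records one tool call if budget remains; returns false once exhausted.
    tryConsume(nowMs: number): boolean {
      if (calls >= maxCalls || nowMs - startedAtMs >= maxWallclockMs) return false;
      calls += 1;
      return true;
    },
  };
}

const grants = new Set(['ai.write.workorders', 'ai.write.bms']);
const setpoint: AgentTool = { name: 'setChillerSetpoint', kind: 'write', scope: 'bms', touchesOtControl: true };
console.log(decideToolUse(setpoint, grants)); // needs-human-approval
```

Passing the clock in as `nowMs` keeps the budget deterministic under test; a real executor would presumably also cancel in-flight calls when the wall-clock limit is reached.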

5.5 Quality scoring

  • Categorisation: labelled gold-set, precision / recall / F1 per category, weekly run.
  • RAG: human-labelled answer set, hit-rate + citation correctness.
  • Anomaly: tuned false-positive rate target, alert-fatigue dashboard.
  • All metrics persisted to aiQualityRuns collection; dashboard widget reads them.
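
For the categorisation metric, per-category precision, recall, and F1 follow directly from true-positive, false-positive, and false-negative counts over the gold set. The sketch below is illustrative; the type and function names are assumptions, and the shape of a stored aiQualityRuns record is not shown.

```typescript
// Hypothetical sketch: per-category precision / recall / F1 from gold-set predictions.
interface LabelledPrediction { expected: string; predicted: string; }
interface CategoryScore { precision: number; recall: number; f1: number; }
interface Counts { tp: number; fp: number; fn: number; }

function scoreByCategory(rows: LabelledPrediction[]): Map<string, CategoryScore> {
  const counts = new Map<string, Counts>();
  const entry = (category: string): Counts => {
    let e = counts.get(category);
    if (!e) {
      e = { tp: 0, fp: 0, fn: 0 };
      counts.set(category, e);
    }
    return e;
  };
  for (const { expected, predicted } of rows) {
    if (expected === predicted) {
      entry(expected).tp += 1;
    } else {
      entry(predicted).fp += 1; // predicted this category, but it was wrong
      entry(expected).fn += 1;  // the true category was missed
    }
  }
  const scores = new Map<string, CategoryScore>();
  counts.forEach(({ tp, fp, fn }, category) => {
    const precision = tp + fp === 0 ? 0 : tp / (tp + fp);
    const recall = tp + fn === 0 ? 0 : tp / (tp + fn);
    const f1 = precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
    scores.set(category, { precision, recall, f1 });
  });
  return scores;
}

const goldRun: LabelledPrediction[] = [
  { expected: 'hvac', predicted: 'hvac' },
  { expected: 'hvac', predicted: 'hvac' },
  { expected: 'hvac', predicted: 'electrical' },
  { expected: 'electrical', predicted: 'electrical' },
];
console.log(scoreByCategory(goldRun).get('electrical')?.precision); // 0.5
```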

5.6 Safety & governance

  • Prompt injection guard: input sanitiser + system-prompt allowlist + tool-call validation.
  • Output validation: when responseFormat is a JSON schema, reject the output and retry on mismatch.
  • Cost budgets: per-tenant monthly budget; the soft limit warns at 80%, the hard limit blocks at 100%.
  • Rate limits: per-user + per-tenant.
  • Data egress: tenants can mark data as "no-egress" — those queries route only to local Ollama models.
  • Audit: every prompt, response, tool call, citation persisted; redaction for PII.
  • Privacy: APPI / GDPR-respecting prompts; data residency enforced at provider selection.
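
The "reject and retry on mismatch" rule for output validation can be sketched as a bounded retry loop around a validator. Everything named here is hypothetical: `generateValidated`, `parseCategorisation`, the limit of three attempts, and the assumption that confidence lies in [0, 1]. A real implementation would more likely validate against the JSON schema passed in responseFormat than hand-check fields.

```typescript
// Hypothetical sketch: retry a generation until its output passes validation.
type Validator<T> = (raw: string) => T | undefined;

async function generateValidated<T>(
  generate: (attempt: number) => Promise<string>,
  validate: Validator<T>,
  maxAttempts = 3, // assumed limit; the spec does not state one
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const parsed = validate(await generate(attempt));
    if (parsed !== undefined) return parsed;
  }
  throw new Error(`output failed validation after ${maxAttempts} attempts`);
}

interface Categorisation { category: string; subcategory: string; confidence: number; }

// Hand-written check for the {category, subcategory, confidence} output shape.
function parseCategorisation(raw: string): Categorisation | undefined {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return undefined; // not JSON at all
  }
  if (typeof value !== 'object' || value === null) return undefined;
  const { category, subcategory, confidence } = value as Record<string, unknown>;
  if (typeof category !== 'string' || typeof subcategory !== 'string') return undefined;
  // Assumption: confidence is a probability in [0, 1].
  if (typeof confidence !== 'number' || confidence < 0 || confidence > 1) return undefined;
  return { category, subcategory, confidence };
}
```

A caller would pass a function that invokes the chosen provider as `generate`; bounding the attempts matters because every retry also counts against the tenant's cost budget.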

6. Detailed Specifications

6.1 API surface (Phase 6 additions)

POST   /api/v1/ai/chat                    (streaming; SSE)
POST   /api/v1/ai/categorise              (single or batch)
POST   /api/v1/ai/query                   (NL → results)
POST   /api/v1/ai/explain-anomaly
POST   /api/v1/ai/summarise

# RAG
POST   /api/v1/ai/rag/index/documents     (queue indexing)
GET    /api/v1/ai/rag/index/status
POST   /api/v1/ai/rag/search

# Agent
POST   /api/v1/ai/agent/run               (with plan + tool budget)
GET    /api/v1/ai/agent/runs/:id

# Quality
GET    /api/v1/ai/quality/metrics
POST   /api/v1/ai/feedback                (user thumbs / corrections)

# Settings
GET    /api/v1/ai/settings
PATCH  /api/v1/ai/settings                (per-tenant provider, model, budgets)

6.2 Permissions added

ai.chat.read ai.chat.send
ai.categorise.run ai.categorise.train
ai.query.run
ai.rag.index ai.rag.search
ai.agent.run.read ai.agent.run.write.<scope>
ai.settings.read ai.settings.update
ai.quality.read

6.3 Data model additions

  • aiSettings (per tenant)
  • aiThreads, aiMessages
  • aiArtifacts (chunk + embedding + source ref)
  • aiAgentRuns (plan, tool calls, results, status)
  • aiQualityRuns
  • aiFeedback

6.4 Cost & telemetry

  • Token counts logged per request; cost computed by model price table.
  • Tenant dashboard widget: AI cost this month, by task and provider.
  • Alert when budget threshold hit.
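
Cost accounting from token counts and the budget thresholds can be sketched as below. The price figures, the per-million-token unit, and the names `requestCost` and `budgetStatus` are placeholders for illustration; the 80% soft limit and 100% hard limit come from section 5.6.

```typescript
// Hypothetical sketch: price-table cost computation and budget threshold check.
interface ModelPrice {
  inputPerMTok: number;  // currency units per million input tokens (placeholder unit)
  outputPerMTok: number; // currency units per million output tokens
}
type PriceTable = Record<string, ModelPrice>;

function requestCost(
  model: string,
  inputTokens: number,
  outputTokens: number,
  prices: PriceTable,
): number {
  const price = prices[model];
  if (!price) throw new Error(`no price entry for model ${model}`);
  return (inputTokens * price.inputPerMTok + outputTokens * price.outputPerMTok) / 1_000_000;
}

type BudgetStatus = 'ok' | 'warn' | 'blocked';

function budgetStatus(spent: number, monthlyBudget: number, softRatio = 0.8): BudgetStatus {
  if (spent >= monthlyBudget) return 'blocked';          // hard limit at 100%
  if (spent >= monthlyBudget * softRatio) return 'warn'; // soft limit at 80%
  return 'ok';
}

// Placeholder prices, not real provider pricing.
const prices: PriceTable = { 'example-model': { inputPerMTok: 2, outputPerMTok: 10 } };
console.log(requestCost('example-model', 500_000, 100_000, prices)); // 2
console.log(budgetStatus(85, 100)); // warn
```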

7. Implementation Tasks

Epic 6.A — LLM router

  • 6.A.1 Provider interface + driver implementations (OpenAI, Anthropic, Ollama, Azure OpenAI stub).
  • 6.A.2 Model registry + selection logic.
  • 6.A.3 Streaming support (SSE pass-through).
  • 6.A.4 Cost & token accounting.

Epic 6.B — RAG

  • 6.B.1 Chunker (layout-aware).
  • 6.B.2 Embedding pipeline (queue, retry, dedupe).
  • 6.B.3 Atlas Vector Search index setup + fallback.
  • 6.B.4 Retriever + reranker.
  • 6.B.5 Indexing endpoints + background job.

Epic 6.C — Tasks

  • 6.C.1 Categoriser (rule + LLM hybrid; feedback loop persists corrections; nightly re-tuning of rules).
  • 6.C.2 NL → Mongo aggregation (function-calling pattern; schema-validated; safe sandbox executor).
  • 6.C.3 Summariser.
  • 6.C.4 Anomaly explainer.
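
For 6.C.2, one plausible first layer of a "safe sandbox executor" is a stage allowlist applied to the generated aggregation pipeline before it runs, plus a forced tenant filter. This is a sketch under assumptions: the allowlist contents, the function names, and the `tenantId` field name are illustrative. It checks only top-level stage operators and does not inspect expressions nested inside a stage, so it is not a complete sandbox on its own.

```typescript
// Hypothetical sketch: allow only read-only aggregation stages, then scope to a tenant.
type PipelineStage = Record<string, unknown>;

// Write stages such as $out and $merge are deliberately absent. $lookup is also
// left out here because it can read from other collections.
const READ_ONLY_STAGES = new Set([
  '$match', '$project', '$group', '$sort', '$limit', '$skip', '$unwind', '$count', '$addFields',
]);

// Returns a human-readable reason if the pipeline is rejected, or null if it passes.
function findPipelineViolation(pipeline: unknown): string | null {
  if (!Array.isArray(pipeline)) return 'pipeline must be an array';
  for (const stage of pipeline) {
    if (typeof stage !== 'object' || stage === null || Array.isArray(stage)) {
      return 'each stage must be an object';
    }
    const keys = Object.keys(stage);
    if (keys.length !== 1) return 'each stage must have exactly one operator';
    if (!READ_ONLY_STAGES.has(keys[0])) return `stage ${keys[0]} is not allowed`;
  }
  return null;
}

function scopeToTenant(stages: PipelineStage[], tenantId: string): PipelineStage[] {
  // Assumption: documents carry a tenantId field; the real field name may differ.
  return [{ $match: { tenantId } }, ...stages];
}

const generated: unknown = [{ $match: { building: 'B2' } }, { $limit: 10 }];
console.log(findPipelineViolation(generated));        // null
console.log(findPipelineViolation([{ $out: 'tmp' }])); // stage $out is not allowed
```

Prepending the tenant filter in code, instead of trusting the model to include it, keeps tenant scoping out of the model's hands.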

Epic 6.D — Agent loop

  • 6.D.1 Tool registry from OpenAPI (read tools auto, write tools by grant).
  • 6.D.2 Planner LLM call.
  • 6.D.3 Critic validation step.
  • 6.D.4 Executor with safety circuit breakers.
  • 6.D.5 Run history + UI.

Epic 6.E — Safety & governance

  • 6.E.1 Prompt-injection guard.
  • 6.E.2 Output schema validation + retry.
  • 6.E.3 Cost budget + alert.
  • 6.E.4 No-egress mode (local models only).
  • 6.E.5 PII redaction in audit.

Epic 6.F — Chat UI

  • 6.F.1 Threaded chat with streaming responses.
  • 6.F.2 Citations, source links.
  • 6.F.3 Tool-transparency UI ("the assistant looked up: …").
  • 6.F.4 Feedback (thumbs / corrections).
  • 6.F.5 Follow-up suggestions.

Epic 6.G — Quality metrics

  • 6.G.1 Gold-set storage.
  • 6.G.2 Nightly evaluation runner.
  • 6.G.3 Quality dashboard widget (Phase 5 widget consuming Phase 6 metrics).

8. Acceptance Criteria

  • AC6.1 — A new tenant can pick its AI provider in Settings and immediately see chat work with the chosen provider.
  • AC6.2 — Bulk categorisation of the KTC 2024 work-order fixture achieves ≥85% category-level precision on the gold-labelled subset.
  • AC6.3 — NL query "show me chillers in Building B with energy consumption above 80kWh today" returns the right assets + a recommended chart widget config.
  • AC6.4 — RAG over an uploaded SOP PDF returns answers with at least one valid citation per answer.
  • AC6.5 — Anomaly detection flags injected synthetic anomalies in a telemetry stream within 5 minutes.
  • AC6.6 — Agent loop performs a multi-step read task (summarise last week's reactive work orders + propose a categorisation rule) within 30 seconds and within the tool-call budget.
  • AC6.7 — A write-tool call to a BMS control point requires explicit human approval — verified by negative test.
  • AC6.8 — Per-tenant cost budget is enforced: reaching the soft limit triggers a warning, and reaching the hard limit blocks further requests.

9. Test Requirements

  • Unit: ≥80% coverage on router, tools, safety.
  • Integration: each task type tested against a stub provider returning deterministic responses.
  • Eval: nightly quality runs on gold sets (categorisation, RAG, anomaly).
  • Adversarial: prompt-injection test suite; jailbreak attempts.
  • Cost: synthetic load to validate cost accounting accuracy.
  • Performance: chat p95 first-token latency < 2 s; p95 full-response latency < 8 s.

10. Documentation Requirements

  • docs/ai/overview.md.
  • docs/ai/providers.md (config, env vars).
  • docs/ai/rag.md (indexing pipeline, search).
  • docs/ai/agent.md (tools, safety, opt-in).
  • docs/ai/quality.md (metrics, gold sets).
  • docs/ai/governance.md (privacy, cost, no-egress).
  • ADR-020: LLM router architecture.
  • ADR-021: Embedding model choice.
  • ADR-022: Agent safety policy.

11. Sign-off Criteria (Gate G6)

  • All Acceptance Criteria met.
  • Security review of AI surfaces passed (prompt injection, output validation, tool safety).
  • Privacy review (APPI alignment) passed.
  • AI Lead, Security Lead, Product Owner sign _gates/Gate_G6_signoff.md.
  • Tagged phase-6-v1.0.

12. Risks & Mitigations

| Risk | L | I | Mitigation |
| --- | --- | --- | --- |
| Provider outage | 3 | 3 | Multi-provider router with failover; local fallback for critical tasks. |
| Prompt injection causes data leak | 2 | 5 | Strong sanitiser + output validation + audit; pen-test included. |
| Hallucinated NL→Mongo queries return wrong data | 3 | 4 | Function-calling with schema; sandbox executor; require-confirm for mutating queries. |
| Cost runaway | 3 | 3 | Budgets + alerts + hard caps + per-task model selection. |
| Agent acts beyond intent | 2 | 5 | Critic step + write-permission gating + human-in-the-loop for OT. |
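
The failover mitigation for provider outages can be sketched as an ordered walk over providers that returns the first successful reply. This is a simplified illustration: `ChatLike` is a cut-down stand-in for the LlmProvider interface, `chatWithFailover` is a hypothetical name, and timeouts, backoff, and health tracking are omitted.

```typescript
// Hypothetical sketch: try providers in priority order until one answers.
interface ChatLike {
  name: string;
  chat(prompt: string): Promise<string>;
}

async function chatWithFailover(
  providers: ChatLike[],
  prompt: string,
): Promise<{ provider: string; text: string }> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      return { provider: provider.name, text: await provider.chat(prompt) };
    } catch (err) {
      // Record the failure and move on to the next provider in the list.
      errors.push(`${provider.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all providers failed: ${errors.join('; ')}`);
}
```

For data marked "no-egress", the provider list handed to such a function would need to contain only local models, so that failover never routes a query to an external provider.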