Phase 7 — Workflow Creator Studio

Status: Shipped ✅ · Owner: Workflow Lead · Duration: 3 weeks · Gate: G7

1. Overview

Phase 7 delivers the visual Workflow Creator Studio, a drag-and-drop designer for process flows, approvals, triggers, conditions, AI steps, and external integrations. It reuses the state-machine engine introduced in Phase 4 for Work Orders, generalises it, and exposes it through a React-Flow-based canvas. The design is inspired by ThingsBoard rule chains and n8n-style designers, but is tightly integrated with SFMS entities and the AI Second Brain.

2. Objectives

  • O7.1 — Visual designer (React Flow) with palette of node types, validation, save, publish, version, rollback.
  • O7.2 — Generalised state-machine engine: workflows defined as nodes + edges; runtime executes per instance with persisted state.
  • O7.3 — Node library covering triggers (time, event, webhook, telemetry threshold), actions (HTTP, email, notification, create work order, update asset), conditions (expression, branch), approvals (single, sequential, parallel, escalation), AI steps (categorise, summarise, agent), and integration (Atlas SDK calls).
  • O7.4 — Approval flows with mobile-friendly inbox and one-click approve / reject.
  • O7.5 — Audit trail per instance: every node transition, input, output, decision, actor.
  • O7.6 — Templates library (onboarding, work-order escalation, anomaly response, periodic reports).
  • O7.7 — Workflow status visible from any module (e.g., Work Order detail shows the live workflow that owns it).
  • O7.8 — Test / simulate mode (run a workflow against mock data without side effects).

3. Scope

3.1 In-scope

  • Designer canvas: drag nodes, draw edges, validate, lint.
  • Palette grouped by category.
  • Per-node inspector form (Zod schema → React Hook Form).
  • Versioning: drafts, publish, rollback, A/B (a small percentage of traffic).
  • Triggers: cron, event (entity created/updated/deleted, telemetry threshold, alert), webhook (HMAC), manual.
  • Actions: HTTP request, email, notification, create/update entity, run report, run AI task.
  • Conditions: expression (JSONata), switch / branch.
  • Approvals: single user, sequential, parallel (all / any), escalation on timeout.
  • AI steps: categorise, summarise, NL-query, RAG, custom prompt.
  • Loop / iterate over collections (with cap).
  • Parallel + join.
  • Error handlers (try/catch nodes).
  • Persistent state machine; resume on restart.
  • Run history view + replay.
  • Test mode (mock data, side-effect-free).
  • Approval mobile inbox.
  • Workflow widget for dashboards (status, queue length, throughput).

3.2 Out-of-scope

  • Sub-workflows (post-launch).
  • Visual debugger step-through (post-launch; replay covers most cases).
  • Tenant-marketplace workflow templates (post-launch).

4. Dependencies

  • Phase 2 (auth, RBAC, audit).
  • Phase 3 (data + events).
  • Phase 4 (state machine engine, entities, notifications).
  • Phase 6 (AI tasks as nodes).

5. Architecture & Design

5.1 Workflow model

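import { ObjectId } from 'mongodb';   // assumed driver; any BSON ObjectId type works

type Workflow = {
  _id: ObjectId; tenantId: string;
  name: string; description?: string;
  status: 'draft' | 'published' | 'archived';
  version: number;
  nodes: WorkflowNode[];
  edges: WorkflowEdge[];
  triggers: WorkflowTrigger[];     // trigger shape is not specified in this section
  variables?: Record<string, unknown>;
  permissions: { run: string[]; edit: string[] };
  createdAt: Date; updatedAt: Date;       // audit timestamps; Date type assumed
  createdBy: string; updatedBy: string;   // actor ids; string type assumed
};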

type WorkflowNode = {
  id: string;
  type: string;                    // 'trigger.cron', 'action.email', 'condition.if', ...
  position: { x: number; y: number };
  data: Record<string, unknown>;   // validated per type schema
};

type WorkflowEdge = { id: string; source: string; target: string; condition?: string };
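
As an illustration of the model above, the following sketch builds a small graph and derives its entry nodes and successors. The type definitions are trimmed copies (the _id and audit fields are left out). The helper names successors and entryNodes, the node data payloads, and the "true" / "false" edge condition labels are assumptions made for the example, not part of the specification.

```typescript
// Trimmed copies of the node and edge types from the model above.
type WorkflowNode = {
  id: string;
  type: string;
  position: { x: number; y: number };
  data: Record<string, unknown>;
};
type WorkflowEdge = { id: string; source: string; target: string; condition?: string };

// Hypothetical graph: cron trigger -> if/else condition -> one of two emails.
const nodes: WorkflowNode[] = [
  { id: "n1", type: "trigger.cron", position: { x: 0, y: 0 }, data: { cron: "0 8 1 * *" } },
  { id: "n2", type: "condition.if", position: { x: 200, y: 0 }, data: { expression: "count > 0" } },
  { id: "n3", type: "action.email", position: { x: 400, y: -80 }, data: { template: "report" } },
  { id: "n4", type: "action.email", position: { x: 400, y: 80 }, data: { template: "nothing-to-report" } },
];
const edges: WorkflowEdge[] = [
  { id: "e1", source: "n1", target: "n2" },
  { id: "e2", source: "n2", target: "n3", condition: "true" },  // assumed label format
  { id: "e3", source: "n2", target: "n4", condition: "false" }, // assumed label format
];

// Ids of the nodes directly downstream of `nodeId`.
function successors(nodeId: string, allEdges: WorkflowEdge[]): string[] {
  return allEdges.filter((e) => e.source === nodeId).map((e) => e.target);
}

// Nodes with no incoming edge; in a well-formed workflow these are the triggers.
function entryNodes(allNodes: WorkflowNode[], allEdges: WorkflowEdge[]): string[] {
  const targets = new Set(allEdges.map((e) => e.target));
  return allNodes.filter((n) => !targets.has(n.id)).map((n) => n.id);
}
```

Keeping the graph as flat node and edge arrays matches what a React Flow canvas works with, so a designer can plausibly save and load it without a translation layer.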

5.2 Runtime

  • Each workflow run is stored as a workflowRuns document with a nodeStates map; each node is in one of pending | running | succeeded | failed | skipped.
  • Engine: durable, resumable; uses BullMQ for async node execution and timers; checkpoints on each transition.
  • Concurrency control per workflow (max parallel runs configurable).
  • Per-tenant resource limits to prevent abuse.
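
The checkpoint-on-each-transition behaviour described above could look roughly like the sketch below. It is an in-memory illustration only: the table of allowed transitions is an assumption (the specification lists the states but not the legal moves between them), the Map stands in for the persisted workflowRuns collection, and BullMQ is not involved here.

```typescript
type NodeStatus = "pending" | "running" | "succeeded" | "failed" | "skipped";

type WorkflowRun = {
  runId: string;
  workflowId: string;
  nodeStates: Record<string, NodeStatus>;
};

// Assumed legal transitions; terminal states allow no further moves.
const ALLOWED: Record<NodeStatus, NodeStatus[]> = {
  pending: ["running", "skipped"],
  running: ["succeeded", "failed"],
  succeeded: [],
  failed: [],
  skipped: [],
};

type Checkpoint = (run: WorkflowRun) => void;

// Apply one node transition and persist the new run state before returning.
function transition(run: WorkflowRun, nodeId: string, next: NodeStatus, checkpoint: Checkpoint): WorkflowRun {
  const current = run.nodeStates[nodeId];
  if (current === undefined) throw new Error(`Unknown node ${nodeId}`);
  if (ALLOWED[current].indexOf(next) === -1) {
    throw new Error(`Illegal transition ${current} -> ${next} for node ${nodeId}`);
  }
  const updated: WorkflowRun = { ...run, nodeStates: { ...run.nodeStates, [nodeId]: next } };
  checkpoint(updated);
  return updated;
}

// In-memory stand-in for the workflowRuns collection.
const store = new Map<string, WorkflowRun>();
const save: Checkpoint = (r) => {
  store.set(r.runId, JSON.parse(JSON.stringify(r)) as WorkflowRun);
};

let run: WorkflowRun = { runId: "r1", workflowId: "w1", nodeStates: { n1: "pending", n2: "pending" } };
run = transition(run, "n1", "running", save);
run = transition(run, "n1", "succeeded", save);
```

Because every transition is written before the engine moves on, a restarted worker can reload the stored run and continue from the nodes that are still pending.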

5.3 Designer canvas

  • React Flow base.
  • Minimap, zoom, pan.
  • Auto-layout (Dagre or ELK) on demand.
  • Edge labels for branching.
  • Lint: cycles, unreachable nodes, missing required props, type mismatches.
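
Two of the lint rules above, cycles and unreachable nodes, can be checked with a single depth-first traversal from the entry nodes. The sketch below is one possible approach, not the shipped linter; the names lintGraph and LintIssue are hypothetical, and the checks for missing props and type mismatches are left out.

```typescript
type LintNode = { id: string };
type LintEdge = { source: string; target: string };
type LintIssue = { rule: "cycle" | "unreachable"; nodeId: string };

function lintGraph(nodes: LintNode[], edges: LintEdge[], entryIds: string[]): LintIssue[] {
  // Adjacency list: node id -> ids of its direct successors.
  const next = new Map<string, string[]>();
  for (const n of nodes) next.set(n.id, []);
  for (const e of edges) next.get(e.source)?.push(e.target);

  const issues: LintIssue[] = [];
  // 0 = unvisited, 1 = on the current DFS path, 2 = fully explored.
  const colour = new Map<string, number>();

  const visit = (id: string): void => {
    colour.set(id, 1);
    for (const t of next.get(id) ?? []) {
      const c = colour.get(t) ?? 0;
      if (c === 1) issues.push({ rule: "cycle", nodeId: t }); // back edge closes a cycle
      else if (c === 0) visit(t);
    }
    colour.set(id, 2);
  };

  for (const id of entryIds) {
    if ((colour.get(id) ?? 0) === 0) visit(id);
  }
  // Anything the traversal never touched cannot be reached from a trigger.
  for (const n of nodes) {
    if ((colour.get(n.id) ?? 0) === 0) issues.push({ rule: "unreachable", nodeId: n.id });
  }
  return issues;
}

// a -> b -> c -> b forms a cycle; d has no incoming path.
const lintResult = lintGraph(
  [{ id: "a" }, { id: "b" }, { id: "c" }, { id: "d" }],
  [{ source: "a", target: "b" }, { source: "b", target: "c" }, { source: "c", target: "b" }],
  ["a"],
);
```

Note that a cycle among nodes that are themselves unreachable is reported only as unreachable, since the traversal never enters it.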

5.4 Approvals

  • An approval node creates an approval record.
  • Recipients receive notification (channel per their prefs).
  • Mobile-friendly inbox at /inbox plus mobile push.
  • Decisions: approve, reject, request changes (sends back), reassign.
  • Escalation timer fires the configured fallback.
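
To make the approval modes concrete, here is a minimal sketch of how an approval record might be resolved. The resolution rules are assumptions: any rejection fails the single, sequential, and parallel-all modes, while parallel-any passes on the first approval and fails only when every approver has rejected. The "request changes" and "reassign" decisions are not modelled, and all names are hypothetical.

```typescript
type Decision = "approve" | "reject";
type ApprovalMode = "single" | "sequential" | "parallelAll" | "parallelAny";
type Outcome = "approved" | "rejected" | "pending";
type Decisions = Partial<Record<string, Decision>>; // approver id -> decision, absent if undecided

function resolveApproval(mode: ApprovalMode, approvers: string[], decisions: Decisions): Outcome {
  const approvals = approvers.filter((a) => decisions[a] === "approve").length;
  const rejections = approvers.filter((a) => decisions[a] === "reject").length;
  if (mode === "parallelAny") {
    if (approvals > 0) return "approved";
    return rejections === approvers.length ? "rejected" : "pending";
  }
  // single, sequential, parallelAll: everyone must approve, one rejection ends it.
  if (rejections > 0) return "rejected";
  return approvals === approvers.length ? "approved" : "pending";
}

// In sequential mode only the first undecided approver is notified.
function nextSequentialApprover(approvers: string[], decisions: Decisions): string | undefined {
  return approvers.find((a) => decisions[a] === undefined);
}

// True once the configured timeout has elapsed without a final outcome.
function escalationDue(requestedAtMs: number, timeoutMs: number, nowMs: number): boolean {
  return nowMs - requestedAtMs >= timeoutMs;
}
```

Sequential and parallel-all share the same outcome rule in this sketch; they differ only in who is asked when, which is what nextSequentialApprover captures.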

5.5 Test mode

  • Test mode uses the same engine, run with the dryRun=true flag.
  • Side-effect actions log "would have done X" instead of executing.
  • Mock data injected at trigger; result shown step by step in the canvas with timing.
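
One way to get the "would have done X" behaviour is to route every side-effecting action through a single wrapper that checks the dryRun flag. The sketch below assumes a hypothetical ExecutionContext and runAction helper, and uses synchronous effects for brevity; real action executors would most likely be asynchronous.

```typescript
type ExecutionContext = { dryRun: boolean; log: string[] };

// Execute `effect` unless the run is a dry run; either way, record what happened.
// Returns true only when the side effect was actually performed.
function runAction(ctx: ExecutionContext, description: string, effect: () => void): boolean {
  if (ctx.dryRun) {
    ctx.log.push(`would have done: ${description}`);
    return false;
  }
  effect();
  ctx.log.push(`done: ${description}`);
  return true;
}

// Usage: the same action under test mode and under a real run.
const sent: string[] = [];
const testCtx: ExecutionContext = { dryRun: true, log: [] };
const liveCtx: ExecutionContext = { dryRun: false, log: [] };

runAction(testCtx, "send email to fm-lead", () => { sent.push("fm-lead"); });
// testCtx.log is ["would have done: send email to fm-lead"]; sent is still empty.

runAction(liveCtx, "send email to fm-lead", () => { sent.push("fm-lead"); });
// liveCtx.log is ["done: send email to fm-lead"]; sent is ["fm-lead"].
```

Centralising the check in one wrapper means an individual node executor cannot forget to honour the flag.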

6. Detailed Specifications

6.1 API surface (Phase 7 additions)

# Designer
GET    /api/v1/workflows
POST   /api/v1/workflows
GET    /api/v1/workflows/:id
PATCH  /api/v1/workflows/:id
POST   /api/v1/workflows/:id/publish
POST   /api/v1/workflows/:id/archive
GET    /api/v1/workflows/:id/versions
POST   /api/v1/workflows/:id/restore       (from version)

# Runs
POST   /api/v1/workflows/:id/run           (manual trigger; with payload)
POST   /api/v1/workflows/:id/test          (dry-run)
GET    /api/v1/workflow-runs               (filter by workflow, status, date)
GET    /api/v1/workflow-runs/:id
GET    /api/v1/workflow-runs/:id/log
POST   /api/v1/workflow-runs/:id/cancel
POST   /api/v1/workflow-runs/:id/retry

# Approvals
GET    /api/v1/approvals/inbox
POST   /api/v1/approvals/:id/decision
POST   /api/v1/approvals/:id/reassign

# Webhooks (per trigger)
POST   /api/v1/webhooks/workflow/:id       (HMAC)

# Templates
GET    /api/v1/workflow-templates
POST   /api/v1/workflows/from-template/:id
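
The webhook endpoint above is marked as HMAC-protected, but the section does not state the algorithm, encoding, or header name. The sketch below assumes HMAC-SHA256 over the raw request body with a hex-encoded signature, and uses Node's built-in crypto module; signBody and verifySignature are hypothetical helper names.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Compute the hex HMAC-SHA256 of the raw body (assumed signing scheme).
function signBody(secret: string, rawBody: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Compare the received signature against the expected one in constant time.
function verifySignature(secret: string, rawBody: string, receivedHex: string): boolean {
  const expected = Buffer.from(signBody(secret, rawBody), "hex");
  const received = Buffer.from(receivedHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Usage: the sender signs the exact bytes it posts; the receiver verifies them.
const webhookSecret = "per-trigger-secret"; // placeholder value
const webhookBody = JSON.stringify({ event: "door.opened", assetId: "A-17" });
const webhookSignature = signBody(webhookSecret, webhookBody);
const webhookAccepted = verifySignature(webhookSecret, webhookBody, webhookSignature);
```

Verification has to run against the raw body as received, before any JSON parsing and re-serialisation, because a byte-level difference changes the digest.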

6.2 Permissions added

workflow.read workflow.create workflow.update workflow.publish workflow.archive workflow.delete
workflow.run workflow.run.manual workflow.run.cancel
approval.read approval.decide approval.reassign
webhook.workflow.invoke

6.3 Node catalogue (launch set)

| Group | Node | Notes |
| --- | --- | --- |
| Triggers | Cron | Standard cron + timezone. |
| Triggers | Event | Subscribe to entity events. |
| Triggers | Telemetry Threshold | When DataPoint X crosses Y for N minutes. |
| Triggers | Webhook | HMAC-signed inbound. |
| Triggers | Manual | Run-button trigger. |
| Conditions | If / Else | JSONata expression. |
| Conditions | Switch | Multi-branch. |
| Actions | HTTP Request | Generic outbound. |
| Actions | Email | Templated. |
| Actions | Notification | In-app / push. |
| Actions | Create Work Order | SFMS. |
| Actions | Update Entity | SFMS asset/location/etc. |
| Actions | Generate Report | Phase 4 report runner. |
| Approvals | Approval | Single / sequential / parallel / escalate. |
| AI | Categorise | Phase 6. |
| AI | Summarise | Phase 6. |
| AI | RAG Answer | Phase 6. |
| AI | Agent | Phase 6 (with policy). |
| Flow | Loop | Iterate collection (capped). |
| Flow | Parallel | Fan-out + join. |
| Flow | Try / Catch | Error handler. |
| Flow | Delay | Wait N seconds / until time. |
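
The Telemetry Threshold trigger in the catalogue fires when a data point stays past a threshold for N minutes. A minimal sketch of that check follows, assuming "crosses" means "rises above" and that samples carry millisecond timestamps; the function name sustainedAbove and the Sample shape are hypothetical.

```typescript
type Sample = { timestampMs: number; value: number };

// True when the most recent unbroken run of samples above `threshold`
// started at least `windowMinutes` before `nowMs`.
function sustainedAbove(samples: Sample[], threshold: number, windowMinutes: number, nowMs: number): boolean {
  const newestFirst = [...samples].sort((a, b) => b.timestampMs - a.timestampMs);
  let breachStartMs: number | undefined;
  for (const s of newestFirst) {
    if (s.value <= threshold) break; // the breach run ends at the first sample back under the limit
    breachStartMs = s.timestampMs;
  }
  return breachStartMs !== undefined && nowMs - breachStartMs >= windowMinutes * 60 * 1000;
}

// Usage: one reading per minute; the value goes above 30 at minute 1 and stays there.
const minute = 60 * 1000;
const readings: Sample[] = [20, 31, 32, 33, 34, 35].map((value, i) => ({ timestampMs: i * minute, value }));
const firesAfter3 = sustainedAbove(readings, 30, 3, 5 * minute); // breach has lasted 4 minutes
const firesAfter5 = sustainedAbove(readings, 30, 5, 5 * minute); // not yet 5 minutes
```

Requiring a sustained breach rather than a single reading keeps a momentary spike from creating a work order.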

6.4 Templates (launch)

  • Work Order escalation (no-action → reassign → manager → SLA breach alert)
  • Anomaly response (telemetry breach → create P2 work order → notify FM lead)
  • Vendor onboarding (form intake → approval chain → user invite + role grant)
  • Monthly KPI report (cron → run report → email subscribers)
  • Document review (upload → approval chain → publish)

7. Implementation Tasks

Epic 7.A — Designer

  • 7.A.1 Canvas with React Flow.
  • 7.A.2 Node palette + drag onto canvas.
  • 7.A.3 Per-node inspector form.
  • 7.A.4 Linter + visual indicators.
  • 7.A.5 Versioning UI + rollback.

Epic 7.B — Engine

  • 7.B.1 State-machine runtime built on Phase 4's engine.
  • 7.B.2 BullMQ workers for async nodes.
  • 7.B.3 Durable run state.
  • 7.B.4 Concurrency + resource limits.
  • 7.B.5 Replay + retry endpoints.

Epic 7.C — Node library

  • 7.C.1 Trigger nodes (cron, event, telemetry threshold, webhook, manual).
  • 7.C.2 Condition nodes.
  • 7.C.3 Action nodes.
  • 7.C.4 Approval nodes.
  • 7.C.5 AI nodes (Phase 6 integration).
  • 7.C.6 Flow nodes (loop, parallel, try/catch, delay).

Epic 7.D — Approval inbox

  • 7.D.1 Inbox page + filters.
  • 7.D.2 Decision actions; reassign.
  • 7.D.3 Mobile-friendly view.
  • 7.D.4 Push notification integration.

Epic 7.E — Test / simulate

  • 7.E.1 Dry-run mode.
  • 7.E.2 Step-by-step view with timings + variables.
  • 7.E.3 Mock data injector.

Epic 7.F — Templates

  • 7.F.1 Template library + admin CRUD.
  • 7.F.2 Launch templates (§6.4).

8. Acceptance Criteria

  • AC7.1 — A user with workflow.create can drag nodes onto a canvas, configure them, validate, save, and publish.
  • AC7.2 — Publishing a workflow with a cron trigger results in scheduled runs visible in the run history.
  • AC7.3 — A telemetry-threshold trigger fires within 60 seconds of the condition becoming true (using the KTC fixture stream).
  • AC7.4 — An approval node sends notifications; approver can decide from mobile inbox; decision is audited.
  • AC7.5 — Test mode runs an end-to-end workflow against mock data with no side effects; result is shown step by step.
  • AC7.6 — Templates can be instantiated and run; "Work Order escalation" template works against KTC fixture.
  • AC7.7 — Workflow status visible in a dashboard widget (Phase 5) for any tenant-shared dashboard.
  • AC7.8 — Lint catches cycles, missing required props, and type mismatches before publish.

9. Test Requirements

  • Unit: ≥80% on engine + node executors.
  • Integration: every node type with stubbed externals; durable resume after kill; concurrency limits.
  • e2e: design → publish → run → approval → completion.
  • Load: 50 concurrent workflow runs sustained; queue depth bounded.
  • Safety: write-action nodes denied without permission; OT control nodes require human-in-loop.
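
The safety requirement above can be exercised against a small gate function such as the sketch below. Everything in the policy table is illustrative: the node type names, the permission strings, and the idea of a per-node-type policy record are assumptions, not the shipped design. Unknown node types are denied by default.

```typescript
type NodePolicy = { permission?: string; requiresHumanApproval?: boolean };
type GateResult = { allowed: true } | { allowed: false; reason: string };

// Hypothetical policy table keyed by node type.
const POLICIES: Record<string, NodePolicy> = {
  "condition.if": {},                                   // read-only, no gate
  "action.updateEntity": { permission: "asset.update" }, // write action
  "action.otControl": { permission: "ot.control", requiresHumanApproval: true },
};

// Decide whether a node may execute for a caller holding `granted` permissions.
function gate(nodeType: string, granted: Set<string>, humanApproved: boolean): GateResult {
  const policy = POLICIES[nodeType];
  if (policy === undefined) return { allowed: false, reason: "unknown node type" };
  if (policy.permission !== undefined && !granted.has(policy.permission)) {
    return { allowed: false, reason: `missing permission ${policy.permission}` };
  }
  if (policy.requiresHumanApproval === true && !humanApproved) {
    return { allowed: false, reason: "human approval required" };
  }
  return { allowed: true };
}
```

A safety test would then assert that each write-action node type is refused for a caller without the matching permission, and that OT control is refused until a human decision has been recorded.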

10. Documentation Requirements

  • docs/workflows/getting_started.md.
  • docs/workflows/node_catalogue.md.
  • docs/workflows/triggers.md.
  • docs/workflows/approvals.md.
  • docs/workflows/expressions.md (JSONata cheat sheet).
  • docs/workflows/templates.md.
  • ADR-023: Workflow engine (in-house vs Temporal final decision, ratifying ADR-008).
  • Updated OpenAPI.

11. Sign-off Criteria (Gate G7)

  • All Acceptance Criteria met.
  • Demo: build a "monthly KTC KPI report" workflow end-to-end live.
  • Workflow Lead, Engineering Lead, Product Owner sign _gates/Gate_G7_signoff.md.
  • Tagged phase-7-v1.0.

12. Risks & Mitigations

| Risk | Likelihood (L) | Impact (I) | Mitigation |
| --- | --- | --- | --- |
| React Flow performance at large workflows | 2 | 3 | Virtualisation; cap node count per workflow at 200; recommend sub-workflows post-launch. |
| Engine durability under restart | 2 | 4 | Checkpoints + idempotent step IDs; chaos test included. |
| Approvals lost or delayed | 3 | 3 | Multi-channel delivery; mobile push; escalation timers. |
| Workflow misconfiguration causes infinite loop | 3 | 4 | Hard run-time limit + step cap + lint warning on uncapped loops. |
| Write-action abuse | 2 | 5 | Permission gates + dry-run encouraged + cost / quota per workflow. |
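
The infinite-loop mitigation (hard run-time limit plus step cap) can be sketched as a small budget object that the engine consults before each node execution. The name createRunBudget and the specific limits used in the usage lines are hypothetical; the specification does not give the actual values.

```typescript
// Track steps and elapsed time for one run; throw once either limit is exceeded.
function createRunBudget(maxSteps: number, maxDurationMs: number, startedAtMs: number) {
  let steps = 0;
  return {
    // Call once before executing each node.
    consume(nowMs: number): void {
      steps += 1;
      if (steps > maxSteps) {
        throw new Error(`Step cap of ${maxSteps} exceeded`);
      }
      if (nowMs - startedAtMs > maxDurationMs) {
        throw new Error(`Run-time limit of ${maxDurationMs} ms exceeded`);
      }
    },
    stepsUsed(): number {
      return steps;
    },
  };
}

// Usage: a run allowed 3 steps within 1 second (illustrative limits).
const budget = createRunBudget(3, 1000, 0);
budget.consume(10);
budget.consume(20);
budget.consume(30);
// A fourth call to budget.consume(...) would throw "Step cap of 3 exceeded".
```

Passing the clock in as a parameter keeps the check deterministic, which helps when replaying a run or testing the limits.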