
Phase 1 — Requirements & Architecture

Status: Shipped ✅ · Owner: Engineering Lead + Tech Architect · Duration: 2 weeks · Gate: G1

1. Overview

Phase 1 translates the product vision into a buildable specification. The deliverables are a Functional Requirements Specification (FRS), a Non-Functional Requirements Specification (NFRS), a System Architecture Document, an OpenAPI 3.x baseline spec, a MongoDB data-model design, and a set of Architecture Decision Records (ADRs) that lock in the choices that shape every subsequent phase.

No production code is written. Engineers may build spikes (throw-away prototypes) to validate risky decisions, but spikes do not ship.

2. Objectives

  • O1.1 — FRS covers every product capability (Data Platform, SFMS, Dashboards, AI, Workflow, Mobile) with story-level granularity.
  • O1.2 — NFRS quantifies performance, availability, security, accessibility, i18n, observability.
  • O1.3 — System architecture is documented with clear module boundaries and inter-module contracts.
  • O1.4 — OpenAPI 3.x spec exists for the foundational endpoints (auth, tenants, users, health) and the data model surface (asset, location, system, datapoint).
  • O1.5 — MongoDB collections, indexes, and relationships are documented.
  • O1.6 — At least 8 ADRs accepted (multi-tenancy, auth, real-time, ODM, AI provider, workflow engine, observability, deployment).
  • O1.7 — Risk register updated; top 10 risks have mitigations.

3. Scope

3.1 In-scope

  • Workshops with Product Owner + Engineering to capture and refine requirements.
  • Document drafts and reviews.
  • Spikes for: (a) Next.js App Router + MongoDB perf at scale, (b) React Flow workflow designer fit, (c) MongoDB Time Series feasibility for telemetry, (d) Atlas Vector Search for RAG.
  • Threat model (STRIDE).
  • DPIA outline (Phase 9 finalises).

3.2 Out-of-scope

  • Production code.
  • Final security pentest (Phase 9).
  • Detailed UI mockups beyond key-screen wireframes (Phase 5 owns the design system implementation).

4. Dependencies

  • Phase 0 complete (repo, environments, design system seed).

5. Architecture & Design

5.1 High-level system architecture

┌──────────────────────────────────────────────────────────────────────┐
│                          Client (PWA, browser)                       │
│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐    │
│  │ Dashboard Studio │  │ Workflow Studio  │  │  Mobile Webview  │    │
│  └──────────────────┘  └──────────────────┘  └──────────────────┘    │
└─────────────────────────┬────────────────────────────────────────────┘
                          │ HTTPS · WebSocket · SSE
┌─────────────────────────┴────────────────────────────────────────────┐
│                      Next.js 15 (App Router)                         │
│  ┌─────────────┐  ┌──────────────┐  ┌──────────────┐  ┌───────────┐  │
│  │  UI Layer   │  │ API Routes   │  │ Server Acts. │  │  Realtime │  │
│  └─────────────┘  └──────────────┘  └──────────────┘  └───────────┘  │
│                                                                      │
│  ┌─────────────────────────────────────────────────────────────────┐ │
│  │            Domain Services (server/services/*)                  │ │
│  │   Tenant · Auth · Asset · WorkOrder · Telemetry · AI · Workflow │ │
│  └─────────────────────────────────────────────────────────────────┘ │
│                                                                      │
│  ┌─────────────────────────────────────────────────────────────────┐ │
│  │              Repositories (server/repos/*)  ·  Zod              │ │
│  └─────────────────────────────────────────────────────────────────┘ │
└─────────────┬────────────────┬──────────────┬──────────────┬─────────┘
              │                │              │              │
        ┌─────┴─────┐  ┌───────┴───────┐ ┌────┴────┐  ┌──────┴──────┐
        │  MongoDB  │  │ Redis (BullMQ)│ │  S3     │  │ AI Provider │
        │  Atlas    │  │  Queue / Cache│ │ Objects │  │ (OpenAI /   │
        │ + Vector  │  │               │ │         │  │  Anthropic /│
        │ + TS Coll │  │               │ │         │  │  Ollama)    │
        └───────────┘  └───────────────┘ └─────────┘  └─────────────┘

External:
  · BMS (BACnet/IP)  ─▶ Edge gateway ─▶ MQTTS broker ─▶ Ingest endpoint
  · Third-party SaaS ─▶ Webhook / REST ─▶ Ingest endpoint

5.2 Module catalogue

| Module | Owner phase | Purpose |
|---|---|---|
| Tenant & Identity | P2 | Multi-tenant boundary; users, roles, sessions, audit. |
| Asset Management | P3, P4 | Asset / Location / System / DataPoint canonical model. |
| Connector | P3 | BACnet, MQTT, REST, file, CSV ingest. |
| Telemetry | P3 | Time-series storage and query. |
| Work Order | P4 | Planned + reactive work orders, scheduling, history. |
| Document | P4 | File storage, versioning, attachment to entities. |
| Notification | P4 | Email, push, in-app, webhook. |
| Dashboard | P5 | Widget framework, layouts, themes, sharing, view modes. |
| AI Second Brain | P6 | LLM router, RAG, agents, NL query, auto-categorise. |
| Workflow | P7 | Visual designer, state machine, approvals, triggers. |
| Mobile / PWA | P8 | Responsive, offline, push. |
| Developer (OpenAPI) | P2 (skeleton), all phases | API spec, Swagger UI / Scalar, SDK. |

5.3 Multi-tenancy model

Decision (ADR-003): Pooled MongoDB cluster, tenant-scoped collections with mandatory tenantId discriminator on every document. Enforcement layers:

  1. Repository layer — all queries require a tenantId; helper throws if missing.
  2. API layer — tenantId is extracted from the authenticated session (or API-key tenant) and injected into the request context.
  3. MongoDB layer — indexes start with tenantId for partition-friendly performance; optional Atlas-side per-tenant database for very large tenants.
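The repository-layer rule above can be sketched in a few lines. This is a minimal illustration, not the project's actual helper: the names RequestContext, requireTenantId and tenantScoped are hypothetical, and the filter is a plain object rather than a real MongoDB driver type.

```typescript
// Hypothetical request context; only the tenantId field comes from the text above.
type RequestContext = { tenantId?: string; userId?: string };
type Filter = Record<string, unknown>;

// Throws when no tenant is present, mirroring the "helper throws if missing" rule.
function requireTenantId(ctx: RequestContext): string {
  if (!ctx.tenantId || ctx.tenantId.trim() === "") {
    throw new Error("tenantId is required for every repository query");
  }
  return ctx.tenantId;
}

// Builds a query filter that always carries the caller's tenantId.
// tenantId is spread last, so a tenantId supplied inside `filter` cannot override it.
function tenantScoped(ctx: RequestContext, filter: Filter = {}): Filter {
  return { ...filter, tenantId: requireTenantId(ctx) };
}

// Example: the caller tries to smuggle in another tenant's id; the context wins.
const scoped = tenantScoped(
  { tenantId: "t-001" },
  { type: "chiller", tenantId: "t-999" },
);
console.log(scoped);
```

Placing the check in one helper means a repository method cannot build a filter without passing through it, which is what makes the rule testable at the repository layer.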

5.4 Authentication & Authorisation

  • Auth.js (NextAuth v5) with the following providers: Credentials (email+password+MFA), OIDC (Microsoft, Google, generic), SAML (via WorkOS or equivalent for enterprise SSO).
  • Sessions are JWT, signed, short-lived (15 min) with refresh tokens (rotating, 7 days).
  • MFA mandatory for admin roles; optional for users (tenant can require).
  • API keys issued per-tenant, scoped to permissions, rate-limited, rotatable.
  • RBAC built on roles + permissions tables; permissions are dot-namespaced strings (asset.read, workflow.publish).
  • ABAC overlay: per-record rules (asset.owner == user.id, tenant == ctx.tenant) evaluated by a small policy engine (CASL or hand-rolled).
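As a rough sketch of how the RBAC and ABAC layers could combine, the hand-rolled check below first resolves dot-namespaced permissions from roles (deny by default), then applies the two per-record rules quoted above. The type names, the role names and the function canReadAsset are illustrative assumptions; a CASL-based engine would look different.

```typescript
// Hypothetical shapes; field names beyond those in the text are assumptions.
type Role = { name: string; permissions: string[] };
type User = { id: string; tenantId: string; roles: string[] };
type AssetRecord = { tenantId: string; ownerId: string };

// RBAC: collect the permissions granted by the roles the user holds.
function permissionsFor(user: User, roles: Role[]): Set<string> {
  const granted = new Set<string>();
  for (const role of roles) {
    if (user.roles.includes(role.name)) {
      for (const p of role.permissions) granted.add(p);
    }
  }
  return granted;
}

// RBAC + ABAC: the permission must be granted AND the record rules must hold
// (tenant == ctx.tenant, asset.owner == user.id).
function canReadAsset(user: User, roles: Role[], asset: AssetRecord): boolean {
  if (!permissionsFor(user, roles).has("asset.read")) return false; // deny by default
  return asset.tenantId === user.tenantId && asset.ownerId === user.id;
}

// Illustrative data only.
const roleTable: Role[] = [
  { name: "technician", permissions: ["asset.read"] },
  { name: "designer", permissions: ["workflow.publish"] },
];
const alice: User = { id: "u-1", tenantId: "t-001", roles: ["technician"] };
console.log(canReadAsset(alice, roleTable, { tenantId: "t-001", ownerId: "u-1" }));
```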

5.5 Real-time

  • Server-Sent Events (SSE) as default for one-way streams (telemetry, live KPIs).
  • WebSocket (via Next.js custom server or external worker) for two-way (workflow designer collab, dashboard co-edit).
  • All channels are tenant- and permission-scoped.
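To make the SSE path concrete, here is a small sketch of framing one event in the text/event-stream format and of gating a subscription by tenant and permission. The channel naming scheme, the Subscriber type and the permission string "telemetry.read" are assumptions made for illustration.

```typescript
// Formats one Server-Sent Events frame: an "event:" line, a "data:" line, then a blank line.
// JSON.stringify escapes newlines, so the payload always fits on a single data line.
function formatSseEvent(event: string, payload: Record<string, unknown>): string {
  return `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Hypothetical channel naming: every channel name embeds the tenant it belongs to.
function channelFor(tenantId: string, topic: string): string {
  return `tenant:${tenantId}:${topic}`;
}

// Hypothetical subscriber shape.
type Subscriber = { tenantId: string; permissions: string[] };

// A subscriber may join a channel only inside its own tenant and with the required permission.
function canSubscribe(sub: Subscriber, channelTenantId: string, permission: string): boolean {
  return sub.tenantId === channelTenantId && sub.permissions.includes(permission);
}

const frame = formatSseEvent("telemetry", { dataPointId: "dp-1", value: 21.5 });
console.log(channelFor("t-001", "telemetry"));
console.log(frame);
```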

5.6 OpenAPI baseline

  • Source of truth: packages/schemas (Zod schemas) compiled to OpenAPI via zod-to-openapi or @asteasolutions/zod-to-openapi.
  • Spec served at /api/openapi.json.
  • Swagger UI at /api/docs (default), Scalar at /api/docs/scalar (richer UX).
  • Versioning: API path-prefix /api/v1/*. Major-version bump for breaking changes; deprecation policy 6 months.
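For orientation, a skeleton spec of the kind served at /api/openapi.json might look like the object below. It is written by hand rather than generated from Zod, and the title, the chosen 3.0.3 version string and the health response shape are assumptions; only the /api/v1 prefix and the health endpoint come from this document.

```typescript
// Path prefix from the versioning rule above.
const API_PREFIX = "/api/v1";

// Hand-written OpenAPI skeleton with a single health endpoint.
const openApiSkeleton = {
  openapi: "3.0.3", // assumption: any 3.x version would satisfy the baseline
  info: { title: "Platform API (skeleton)", version: "1.0.0" }, // hypothetical title
  paths: {
    [`${API_PREFIX}/health`]: {
      get: {
        summary: "Liveness check",
        responses: {
          "200": {
            description: "Service is up",
            content: {
              "application/json": {
                schema: {
                  type: "object",
                  properties: { status: { type: "string" } },
                  required: ["status"],
                },
              },
            },
          },
        },
      },
    },
  },
};

// A route handler could return this serialised document as the response body.
const openApiJson = JSON.stringify(openApiSkeleton, null, 2);
console.log(openApiJson);
```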

5.7 Data model — first cut (refined in Phase 2/3)

Collections (every document carries _id, tenantId, createdAt, updatedAt, createdBy, updatedBy, version):

| Collection | Key fields |
|---|---|
| tenants | name, plan, settings, branding |
| users | email, passwordHash, roles, mfa, lastLogin, prefs |
| roles | name, permissions[], system |
| apiKeys | name, hash, scopes[], expiresAt, lastUsedAt |
| sessions | userId, token, expiresAt, ua, ip |
| auditLog | actor, action, target, before, after, meta, at |
| sites | name, address, geo |
| buildings | siteId, name, floors[] |
| locations | buildingId, floor, code, name, type, geometry |
| systems | name, description, parentSystemId |
| assets | code, name, type, classification, locationId, systemId, brand, model, serial, installedAt, attrs{} |
| dataPoints | assetId, tag, function, dataType, connection{}, uom |
| connections | kind (BACnet/Modbus/MQTT/OPC/API), params{}, gatewayId |
| gateways | name, siteId, model, health{}, lastSeenAt |
| telemetry (Time Series) | dataPointId, t, value, quality |
| workOrders | code, title, assetId, locationId, category, subcategory, status, priority, assignedTo, scheduledAt, closedAt, costs{} |
| workOrderHistory | workOrderId, event, actor, at, payload |
| documents | name, path (S3 key), mime, size, entityRefs[] |
| dashboards | name, ownerId, shared, layout, widgets[], theme, mode |
| widgets | kind, dataBinding, props, query |
| workflows | name, version, status, nodes[], edges[], triggers[] |
| workflowRuns | workflowId, trigger, state, nodeStates{}, startedAt, endedAt |
| aiThreads | userId, topic, messages[], context{} |
| aiArtifacts | kind, sourceRef, embedding[], metadata{} |
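The common fields carried by every document can be expressed as a shared base type. The sketch below is illustrative only: the concrete field types (string ids, Date timestamps, numeric version starting at 1) and the helper stampNew are assumptions, and the real schemas are expected to live in Zod.

```typescript
// Common fields listed above; the concrete types are assumptions.
type BaseDocument = {
  _id: string;
  tenantId: string;
  createdAt: Date;
  updatedAt: Date;
  createdBy: string;
  updatedBy: string;
  version: number;
};

// dataPoints collection, using the key fields from the table.
type DataPoint = BaseDocument & {
  assetId: string;
  tag: string;
  function: string;
  dataType: string;
  connection: Record<string, unknown>;
  uom: string;
};

// Hypothetical helper: stamps the common fields onto a new document.
function stampNew<T extends object>(
  fields: T,
  meta: { id: string; tenantId: string; actorId: string; now?: Date },
): T & BaseDocument {
  const now = meta.now ?? new Date();
  return {
    ...fields,
    _id: meta.id,
    tenantId: meta.tenantId,
    createdAt: now,
    updatedAt: now,
    createdBy: meta.actorId,
    updatedBy: meta.actorId,
    version: 1, // assumption: versions start at 1
  };
}

const supplyAirTemp: DataPoint = stampNew(
  { assetId: "a-1", tag: "SAT", function: "supply-air-temp", dataType: "number", connection: {}, uom: "degC" },
  { id: "dp-1", tenantId: "t-001", actorId: "u-1" },
);
console.log(supplyAirTemp.tag, supplyAirTemp.version);
```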

5.8 ADRs to produce in Phase 1

| # | Subject |
|---|---|
| ADR-003 | Multi-tenancy model |
| ADR-004 | Auth provider (NextAuth) |
| ADR-005 | Real-time transport (SSE default, WS where needed) |
| ADR-006 | ODM choice (Mongoose vs native driver behind repository) |
| ADR-007 | AI provider abstraction layer |
| ADR-008 | Workflow engine (in-house state machine vs Temporal/n8n) |
| ADR-009 | Observability stack (OpenTelemetry collector + Grafana / commercial) |
| ADR-010 | Deployment platform (Vercel / Render / Kubernetes / self-host) |
| ADR-011 | Time-series storage (MongoDB Time Series Collections vs separate TSDB) |
| ADR-012 | Vector search (MongoDB Atlas Vector Search vs Qdrant/Weaviate) |

6. Detailed Specifications

6.1 FRS structure (template per capability)

For each capability the FRS captures:

  • Capability ID (e.g., FR-DASH-001)
  • As-a / I-want / So-that
  • Pre-conditions
  • Main flow
  • Alternative flows / edge cases
  • Permissions required
  • Data touched
  • Telemetry / audit events emitted
  • Acceptance criteria (Gherkin-style)

6.2 NFRS targets (initial)

| Category | Target |
|---|---|
| Availability | 99.9% monthly for prod; 99.5% staging |
| API p95 latency (CRUD) | < 300 ms |
| Dashboard initial load p95 | < 3 s |
| Telemetry ingest throughput | 5,000 events/sec per tenant baseline; scale-out path proven |
| Telemetry freshness p95 | < 5 minutes sensor → dashboard |
| RPO / RTO | RPO 15 min, RTO 4 h for critical data |
| Security | ISO 27001/27017/27018 aligned; OWASP ASVS L2 minimum |
| Privacy | GDPR / APPI / PDPA compliant; DSAR within 30 days |
| Accessibility | WCAG 2.2 AA |
| i18n | English + Japanese at launch; framework supports more |
| Browser support | Latest 2 versions of Chrome, Edge, Safari, Firefox |
| Mobile | iOS 16+ Safari, Android 12+ Chrome |
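Two of these targets are easier to reason about once converted into budgets. The sketch below turns an availability percentage into allowed downtime and the ingest baseline into a daily event count; the 30-day month is an assumption, and the function names are illustrative.

```typescript
// Allowed downtime in minutes for a given availability target over `days` days.
// Rounded to one decimal place to hide floating-point noise.
function downtimeBudgetMinutes(availabilityPct: number, days: number = 30): number {
  const totalMinutes = days * 24 * 60;
  const budget = (totalMinutes * (100 - availabilityPct)) / 100;
  return Math.round(budget * 10) / 10;
}

// Events per day implied by a sustained per-second ingest rate.
function eventsPerDay(eventsPerSecond: number): number {
  return eventsPerSecond * 60 * 60 * 24;
}

console.log(downtimeBudgetMinutes(99.9)); // prod target
console.log(downtimeBudgetMinutes(99.5)); // staging target
console.log(eventsPerDay(5000));          // per-tenant ingest baseline
```

Under the 30-day assumption, 99.9% allows 43.2 minutes of downtime per month and 99.5% allows 216 minutes; a sustained 5,000 events/sec corresponds to 432,000,000 events per tenant per day.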

6.3 Threat model (STRIDE highlights)

| Threat | Surface | Mitigation |
|---|---|---|
| Tenant data leak | API, queries | Mandatory tenantId injection; tested at repository layer |
| Session hijack | JWT, cookies | Short-lived JWT + rotating refresh + httpOnly+SameSite cookies |
| Privilege escalation | RBAC | Permission deny-by-default; ABAC overlay tests |
| Webhook spoof | Inbound webhooks | HMAC signature verification + replay-window |
| AI prompt injection | LLM | System-prompt allowlist, output schema validation, no tool-use without permission |
| OT write-back misuse | Workflow → action | Safety review gate on any control action; circuit breaker |
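The webhook mitigation (HMAC signature plus replay window) could be sketched as follows with Node's built-in crypto module. The signed string layout (timestamp, a dot, then the body), the SHA-256 choice and the five-minute window are assumptions for illustration; a real sender's scheme would dictate these.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: requests older or newer than five minutes are rejected.
const REPLAY_WINDOW_MS = 5 * 60 * 1000;

// Computes the hex HMAC-SHA256 over "<timestamp>.<body>" (hypothetical layout).
function signWebhook(secret: string, timestampMs: number, body: string): string {
  return createHmac("sha256", secret).update(`${timestampMs}.${body}`).digest("hex");
}

// Returns true only if the timestamp is inside the replay window and the signature matches.
function verifyWebhook(
  secret: string,
  timestampMs: number,
  body: string,
  signatureHex: string,
  nowMs: number = Date.now(),
): boolean {
  if (Math.abs(nowMs - timestampMs) > REPLAY_WINDOW_MS) return false;
  const expected = Buffer.from(signWebhook(secret, timestampMs, body), "hex");
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (expected.length !== received.length) return false;
  return timingSafeEqual(expected, received);
}

const sentAt = 1_700_000_000_000;
const signature = signWebhook("shared-secret", sentAt, '{"event":"ping"}');
console.log(verifyWebhook("shared-secret", sentAt, '{"event":"ping"}', signature, sentAt + 1000));
```

The constant-time comparison avoids leaking how many leading bytes of a forged signature were correct.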

7. Implementation Tasks

Epic 1.A — Requirements

  • 1.A.1 FRS workshops (Product + Eng + Design); 6 workshops covering Data Platform, SFMS, Dashboards, AI, Workflow, Mobile.
  • 1.A.2 FRS document v1 published.
  • 1.A.3 NFRS document v1 published.
  • 1.A.4 Wireframes for 6 anchor screens (dashboard studio, workflow studio, asset detail, work-order list, mobile inspection, AI chat).

Epic 1.B — Architecture

  • 1.B.1 System Architecture Document with diagrams (C4 model: Context, Container, Component).
  • 1.B.2 Data Model Document with collection schemas, indexes, relationships.
  • 1.B.3 Security Architecture Document with threat model.
  • 1.B.4 ADRs 003–012 drafted and accepted.

Epic 1.C — OpenAPI baseline

  • 1.C.1 packages/schemas package with Zod schemas for User, Tenant, Role, Asset, Location, System, DataPoint.
  • 1.C.2 OpenAPI generator wired (zod-to-openapi).
  • 1.C.3 /api/openapi.json skeleton served.
  • 1.C.4 Swagger UI + Scalar mounted with the skeleton spec.

Epic 1.D — Spikes

  • 1.D.1 Time-series collection benchmark (1M datapoints / 7 days / typical query).
  • 1.D.2 React Flow with 100 nodes, 200 edges — performance verified.
  • 1.D.3 Atlas Vector Search baseline (1k documents, 1k queries/min) latency check.
  • 1.D.4 Auth.js v5 multi-tenant pattern proof.
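Spike reports that quote latency will likely need a percentile figure to compare against the p95 targets in the NFRS. One simple option is the nearest-rank method, sketched below; the function name and the sample values are illustrative, and a load tool such as k6 reports percentiles on its own.

```typescript
// Nearest-rank percentile: sort ascending, take the value at rank ceil(p/100 * n).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to keep the rank arithmetic exact for integer p.
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Illustrative latencies in milliseconds: 100, 200, ..., 2000.
const latenciesMs = Array.from({ length: 20 }, (_, i) => (i + 1) * 100);
console.log(percentile(latenciesMs, 95));
```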

Epic 1.E — Test strategy

  • 1.E.1 Test Strategy document (unit / integration / e2e / contract / load / security / accessibility / chaos).
  • 1.E.2 Coverage targets per layer.
  • 1.E.3 Tooling matrix (Vitest, Playwright, k6, axe, OWASP ZAP).

8. Acceptance Criteria

  • AC1.1 — FRS, NFRS, System Architecture, Data Model, Security Architecture, OpenAPI baseline, Test Strategy reviewed and signed.
  • AC1.2 — ADRs 003–012 accepted (status: Accepted, reasoning recorded).
  • AC1.3 — Wireframes for the six anchor screens are approved by the Product Owner.
  • AC1.4 — Spike reports filed; each carries a clear "go / no-go / pivot" recommendation.
  • AC1.5 — Risk register updated; top 10 risks have named owners and mitigations.

9. Test Requirements

  • Document reviews (no code).
  • Spike code may carry tests but is not required to be tested or merged.

10. Documentation Requirements

  • FRS (docs/specs/FRS_v1.md)
  • NFRS (docs/specs/NFRS_v1.md)
  • System Architecture (docs/architecture/SA_v1.md + diagrams)
  • Data Model (docs/architecture/data_model_v1.md)
  • Security Architecture + Threat Model (docs/architecture/security_v1.md)
  • ADRs 003–012 (docs/adr/)
  • Test Strategy (docs/quality/test_strategy_v1.md)
  • Wireframes (docs/design/wireframes/)
  • Spike reports (docs/spikes/)

11. Sign-off Criteria (Gate G1)

  • All Acceptance Criteria met.
  • Engineering Lead, Product Owner, Tech Architect, Security Lead sign _gates/Gate_G1_signoff.md.
  • Outputs are version-controlled and tagged phase-1-v1.0.

12. Risks & Mitigations

| Risk | Likelihood (L) | Impact (I) | Mitigation |
|---|---|---|---|
| Requirements churn after sign-off | 3 | 3 | Capture late asks as Change Requests; freeze v1 baseline. |
| Workflow engine choice (in-house vs Temporal) unclear | 3 | 3 | Spike 1.D.2 informs ADR-008; if Temporal, scope-impact recorded. |
| Multi-tenant performance unknown at scale | 2 | 4 | Spike 1.D.1 + load model assumptions in NFRS. |
| OpenAPI generation tool flaky | 2 | 2 | Have fallback: manual OpenAPI authoring + Zod kept in sync via tests. |