Phase 1 — Requirements & Architecture
Status: Shipped ✅ · Owner: Engineering Lead + Tech Architect · Duration: 2 weeks · Gate: G1
1. Overview
Phase 1 translates the product vision into a buildable specification. The deliverables are a Functional Requirements Specification (FRS), a Non-Functional Requirements Specification (NFRS), a System Architecture Document, an OpenAPI 3.x baseline spec, a MongoDB data-model design, and a set of Architecture Decision Records (ADRs) that lock in the choices that shape every subsequent phase.
No production code is written. Engineers may build spikes (throw-away prototypes) to validate risky decisions, but spikes do not ship.
2. Objectives
- O1.1 — FRS covers every product capability (Data Platform, SFMS, Dashboards, AI, Workflow, Mobile) with story-level granularity.
- O1.2 — NFRS quantifies performance, availability, security, accessibility, i18n, observability.
- O1.3 — System architecture is documented with clear module boundaries and inter-module contracts.
- O1.4 — OpenAPI 3.x spec exists for the foundational endpoints (auth, tenants, users, health) and the data model surface (asset, location, system, datapoint).
- O1.5 — MongoDB collections, indexes, and relationships are documented.
- O1.6 — At least 8 ADRs accepted (multi-tenancy, auth, real-time, ODM, AI provider, workflow engine, observability, deployment).
- O1.7 — Risk register updated; top 10 risks have mitigations.
3. Scope
3.1 In-scope
- Workshops with Product Owner + Engineering to capture and refine requirements.
- Document drafts and reviews.
- Spikes for: (a) Next.js App Router + MongoDB perf at scale, (b) React Flow workflow designer fit, (c) MongoDB Time Series feasibility for telemetry, (d) Atlas Vector Search for RAG.
- Threat model (STRIDE).
- DPIA outline (Phase 9 finalises).
3.2 Out-of-scope
- Production code.
- Final security pentest (Phase 9).
- Detailed UI mockups beyond key-screen wireframes (Phase 5 owns the design system implementation).
4. Dependencies
- Phase 0 complete (repo, environments, design system seed).
5. Architecture & Design
5.1 High-level system architecture
┌──────────────────────────────────────────────────────────────────────┐
│                        Client (PWA, browser)                         │
│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐    │
│  │ Dashboard Studio │  │ Workflow Studio  │  │ Mobile Webview   │    │
│  └──────────────────┘  └──────────────────┘  └──────────────────┘    │
└─────────────────────────┬────────────────────────────────────────────┘
                          │ HTTPS · WebSocket · SSE
┌─────────────────────────┴────────────────────────────────────────────┐
│                       Next.js 15 (App Router)                        │
│  ┌─────────────┐  ┌──────────────┐  ┌──────────────┐  ┌───────────┐  │
│  │ UI Layer    │  │ API Routes   │  │ Server Acts. │  │ Realtime  │  │
│  └─────────────┘  └──────────────┘  └──────────────┘  └───────────┘  │
│                                                                      │
│  ┌─────────────────────────────────────────────────────────────────┐ │
│  │ Domain Services (server/services/*)                             │ │
│  │ Tenant · Auth · Asset · WorkOrder · Telemetry · AI · Workflow   │ │
│  └─────────────────────────────────────────────────────────────────┘ │
│                                                                      │
│  ┌─────────────────────────────────────────────────────────────────┐ │
│  │ Repositories (server/repos/*) · Zod                             │ │
│  └─────────────────────────────────────────────────────────────────┘ │
└─────────────┬────────────────┬──────────────┬──────────────┬─────────┘
              │                │              │              │
        ┌─────┴─────┐  ┌───────┴───────┐ ┌────┴────┐  ┌──────┴──────┐
        │ MongoDB   │  │ Redis (BullMQ)│ │ S3      │  │ AI Provider │
        │ Atlas     │  │ Queue / Cache │ │ Objects │  │ (OpenAI /   │
        │ + Vector  │  │               │ │         │  │  Anthropic /│
        │ + TS Coll │  │               │ │         │  │  Ollama)    │
        └───────────┘  └───────────────┘ └─────────┘  └─────────────┘
External:
· BMS (BACnet/IP) ─▶ Edge gateway ─▶ MQTTS broker ─▶ Ingest endpoint
· Third-party SaaS ─▶ Webhook / REST ─▶ Ingest endpoint
5.2 Module catalogue
| Module | Owner phase | Purpose |
|---|---|---|
| Tenant & Identity | P2 | Multi-tenant boundary; users, roles, sessions, audit. |
| Asset Management | P3, P4 | Asset / Location / System / DataPoint canonical model. |
| Connector | P3 | BACnet, MQTT, REST, file, CSV ingest. |
| Telemetry | P3 | Time-series storage and query. |
| Work Order | P4 | Planned + reactive work orders, scheduling, history. |
| Document | P4 | File storage, versioning, attachment to entities. |
| Notification | P4 | Email, push, in-app, webhook. |
| Dashboard | P5 | Widget framework, layouts, themes, sharing, view modes. |
| AI Second Brain | P6 | LLM router, RAG, agents, NL query, auto-categorise. |
| Workflow | P7 | Visual designer, state machine, approvals, triggers. |
| Mobile / PWA | P8 | Responsive, offline, push. |
| Developer (OpenAPI) | P2 (skeleton), all phases | API spec, Swagger UI / Scalar, SDK. |
5.3 Multi-tenancy model
Decision (ADR-003): Pooled MongoDB cluster, tenant-scoped collections with mandatory tenantId discriminator on every document. Enforcement layers:
- Repository layer: all queries require a `tenantId`; helper throws if missing.
- API layer: `tenantId` is extracted from the authenticated session (or API-key tenant) and injected into the request context.
- MongoDB layer: indexes start with `tenantId` for partition-friendly performance; optional Atlas-side per-tenant database for very large tenants.
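The repository-layer and index rules above can be illustrated with a small guard. This is a minimal sketch, not the project's actual repository code: `TenantContext`, `MissingTenantError`, `scopedFilter`, and `tenantIndex` are hypothetical names, and filters are modelled as plain objects so the example runs without a MongoDB driver.

```typescript
// Hypothetical names throughout; a sketch of the enforcement idea only.
type Filter = Record<string, unknown>;

interface TenantContext {
  tenantId?: string; // assumed to be set by the API layer from the session or API key
}

class MissingTenantError extends Error {
  constructor() {
    super("tenantId is required for every repository query");
    this.name = "MissingTenantError";
  }
}

// Returns a new filter whose tenantId always comes from the context,
// overriding any tenantId the caller may have placed in the filter.
function scopedFilter(ctx: TenantContext, filter: Filter = {}): Filter {
  if (!ctx.tenantId) {
    throw new MissingTenantError();
  }
  return { ...filter, tenantId: ctx.tenantId };
}

// Builds a compound index spec with tenantId as the leading key.
function tenantIndex(...fields: string[]): Record<string, 1> {
  const spec: Record<string, 1> = { tenantId: 1 };
  for (const field of fields) spec[field] = 1;
  return spec;
}
```

Because the context value is spread last, a caller cannot widen a query by passing its own `tenantId` in the filter; the helper either scopes the query or throws.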
5.4 Authentication & Authorisation
- Auth.js (NextAuth v5) with the following providers: Credentials (email+password+MFA), OIDC (Microsoft, Google, generic), SAML (via WorkOS or equivalent for enterprise SSO).
- Sessions are JWT, signed, short-lived (15 min) with refresh tokens (rotating, 7 days).
- MFA mandatory for admin roles; optional for users (tenant can require).
- API keys issued per-tenant, scoped to permissions, rate-limited, rotatable.
- RBAC built on roles + permissions tables; permissions are dot-namespaced strings (`asset.read`, `workflow.publish`).
- ABAC overlay: per-record rules (`asset.owner == user.id`, `tenant == ctx.tenant`) evaluated by a small policy engine (CASL or hand-rolled).
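To make the RBAC plus ABAC layering concrete, the following is a minimal hand-rolled sketch. It is not CASL and not the project's policy engine; the `User` and `AssetRecord` shapes and the `can` helper are assumptions made for illustration.

```typescript
// Hypothetical shapes; the real user and asset models may differ.
interface User {
  id: string;
  tenantId: string;
  permissions: string[]; // dot-namespaced, e.g. "asset.read"
}

interface AssetRecord {
  tenantId: string;
  ownerId: string;
}

// RBAC: deny by default; grant only on an exact permission string match.
function hasPermission(user: User, permission: string): boolean {
  return user.permissions.some((p) => p === permission);
}

// ABAC overlay: per-record rules evaluated only after the RBAC check passes.
type Rule<T> = (user: User, record: T) => boolean;

const sameTenant: Rule<AssetRecord> = (u, r) => r.tenantId === u.tenantId;
const isOwner: Rule<AssetRecord> = (u, r) => r.ownerId === u.id;

function can<T>(user: User, permission: string, record: T, rules: Rule<T>[]): boolean {
  if (!hasPermission(user, permission)) return false;
  return rules.every((rule) => rule(user, record));
}
```

A call such as `can(user, "asset.read", asset, [sameTenant, isOwner])` passes only when the permission is held and every record rule agrees, which mirrors the deny-by-default stance listed in the threat model.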
5.5 Real-time
- Server-Sent Events (SSE) as default for one-way streams (telemetry, live KPIs).
- WebSocket (via Next.js custom server or external worker) for two-way (workflow designer collab, dashboard co-edit).
- All channels are tenant- and permission-scoped.
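As an illustration of the SSE default and the channel scoping rule, here is a small sketch. It only shows the `text/event-stream` frame layout and one possible naming scheme; `formatSseEvent`, `channelName`, `canSubscribe`, and the `tenant:<id>:<topic>` convention are assumptions, not part of the specification.

```typescript
// Hypothetical helpers; the channel naming scheme is an assumption.
interface SseEvent {
  event: string;
  data: unknown; // serialised as JSON on a single data line
  id?: string;
}

// Serialises one event in the text/event-stream wire format:
// optional "id:" line, "event:" line, "data:" line, then a blank line.
function formatSseEvent({ event, data, id }: SseEvent): string {
  const lines: string[] = [];
  if (id !== undefined) lines.push(`id: ${id}`);
  lines.push(`event: ${event}`);
  lines.push(`data: ${JSON.stringify(data)}`);
  return lines.join("\n") + "\n\n";
}

function channelName(tenantId: string, topic: string): string {
  return `tenant:${tenantId}:${topic}`;
}

// A subscriber may join a channel only inside its own tenant
// and only when it holds the permission the channel requires.
function canSubscribe(
  user: { tenantId: string; permissions: string[] },
  channel: string,
  requiredPermission: string,
): boolean {
  const prefix = `tenant:${user.tenantId}:`;
  const inTenant = channel.indexOf(prefix) === 0;
  return inTenant && user.permissions.some((p) => p === requiredPermission);
}
```

The trailing colon in the prefix keeps tenant `t1` from matching channels that belong to `t10`.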
5.6 OpenAPI baseline
- Source of truth: `packages/schemas` (Zod schemas) compiled to OpenAPI via `zod-to-openapi` or `@asteasolutions/zod-to-openapi`.
- Spec served at `/api/openapi.json`.
- Swagger UI at `/api/docs` (default), Scalar at `/api/docs/scalar` (richer UX).
- Versioning: API path-prefix `/api/v1/*`. Major-version bump for breaking changes; deprecation policy 6 months.
5.7 Data model — first cut (refined in Phase 2/3)
Collections (every document carries _id, tenantId, createdAt, updatedAt, createdBy, updatedBy, version):
| Collection | Key fields |
|---|---|
| `tenants` | name, plan, settings, branding |
| `users` | email, passwordHash, roles, mfa, lastLogin, prefs |
| `roles` | name, permissions[], system |
| `apiKeys` | name, hash, scopes[], expiresAt, lastUsedAt |
| `sessions` | userId, token, expiresAt, ua, ip |
| `auditLog` | actor, action, target, before, after, meta, at |
| `sites` | name, address, geo |
| `buildings` | siteId, name, floors[] |
| `locations` | buildingId, floor, code, name, type, geometry |
| `systems` | name, description, parentSystemId |
| `assets` | code, name, type, classification, locationId, systemId, brand, model, serial, installedAt, attrs{} |
| `dataPoints` | assetId, tag, function, dataType, connection{}, uom |
| `connections` | kind (BACnet/Modbus/MQTT/OPC/API), params{}, gatewayId |
| `gateways` | name, siteId, model, health{}, lastSeenAt |
| `telemetry` (Time Series) | dataPointId, t, value, quality |
| `workOrders` | code, title, assetId, locationId, category, subcategory, status, priority, assignedTo, scheduledAt, closedAt, costs{} |
| `workOrderHistory` | workOrderId, event, actor, at, payload |
| `documents` | name, path (S3 key), mime, size, entityRefs[] |
| `dashboards` | name, ownerId, shared, layout, widgets[], theme, mode |
| `widgets` | kind, dataBinding, props, query |
| `workflows` | name, version, status, nodes[], edges[], triggers[] |
| `workflowRuns` | workflowId, trigger, state, nodeStates{}, startedAt, endedAt |
| `aiThreads` | userId, topic, messages[], context{} |
| `aiArtifacts` | kind, sourceRef, embedding[], metadata{} |
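The shared metadata fields listed for every collection can be captured once and reused. The sketch below is illustrative only: it assumes `_id` is a string (a real model would likely use the driver's ObjectId), assumes `version` is a counter incremented on each write, and uses hypothetical helper names `stampCreate` and `stampUpdate`.

```typescript
// Fields every document carries, per the collection list above.
interface BaseDocument {
  _id: string; // assumption: string here; likely an ObjectId in practice
  tenantId: string;
  createdAt: Date;
  updatedAt: Date;
  createdBy: string;
  updatedBy: string;
  version: number; // assumption: incremented on every update
}

// Adds the shared metadata to a new document's domain fields.
function stampCreate<T extends object>(
  fields: T,
  ctx: { tenantId: string; userId: string },
  id: string,
  now: Date = new Date(),
): T & BaseDocument {
  return {
    ...fields,
    _id: id,
    tenantId: ctx.tenantId,
    createdAt: now,
    updatedAt: now,
    createdBy: ctx.userId,
    updatedBy: ctx.userId,
    version: 1,
  };
}

// Returns an updated copy; creation metadata and tenantId cannot be changed here.
function stampUpdate<T extends BaseDocument>(
  doc: T,
  changes: Partial<Omit<T, keyof BaseDocument>>,
  userId: string,
  now: Date = new Date(),
): T {
  return { ...doc, ...changes, updatedAt: now, updatedBy: userId, version: doc.version + 1 };
}
```

Typing `changes` so that it excludes the `BaseDocument` keys keeps callers from overwriting `tenantId` or the audit fields through an ordinary update.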
5.8 ADRs to produce in Phase 1
| # | Subject |
|---|---|
| ADR-003 | Multi-tenancy model |
| ADR-004 | Auth provider (Auth.js / NextAuth v5) |
| ADR-005 | Real-time transport (SSE default, WS where needed) |
| ADR-006 | ODM choice (Mongoose vs native driver behind repository) |
| ADR-007 | AI provider abstraction layer |
| ADR-008 | Workflow engine (in-house state machine vs Temporal/n8n) |
| ADR-009 | Observability stack (OpenTelemetry collector + Grafana / commercial) |
| ADR-010 | Deployment platform (Vercel / Render / Kubernetes / self-host) |
| ADR-011 | Time-series storage (MongoDB Time Series Collections vs separate TSDB) |
| ADR-012 | Vector search (MongoDB Atlas Vector Search vs Qdrant/Weaviate) |
6. Detailed Specifications
6.1 FRS structure (template per capability)
For each capability the FRS captures:
- Capability ID (e.g., `FR-DASH-001`)
- As-a / I-want / So-that
- Pre-conditions
- Main flow
- Alternative flows / edge cases
- Permissions required
- Data touched
- Telemetry / audit events emitted
- Acceptance criteria (Gherkin-style)
6.2 NFRS targets (initial)
| Category | Target |
|---|---|
| Availability | 99.9% monthly for prod; 99.5% staging |
| API p95 latency (CRUD) | < 300 ms |
| Dashboard initial load p95 | < 3 s |
| Telemetry ingest throughput | 5,000 events/sec per tenant baseline; scale-out path proven |
| Telemetry freshness p95 | < 5 minutes sensor → dashboard |
| RPO / RTO | RPO 15 min, RTO 4 h for critical data |
| Security | ISO 27001/27017/27018 aligned; OWASP ASVS L2 minimum |
| Privacy | GDPR / APPI / PDPA compliant; DSAR within 30 days |
| Accessibility | WCAG 2.2 AA |
| i18n | English + Japanese at launch; framework supports more |
| Browser support | Latest 2 versions of Chrome, Edge, Safari, Firefox |
| Mobile | iOS 16+ Safari, Android 12+ Chrome |
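Two of the targets above translate directly into budgets that are easier to reason about. Assuming a 30-day month (the table says only "monthly"), 99.9% availability allows about 43.2 minutes of downtime and 99.5% allows 216 minutes; 5,000 events/sec sustained is 432 million events per tenant per day. The helper names below are hypothetical.

```typescript
// Allowed downtime in minutes for an availability target over a period.
// Assumption: a "month" is 30 days unless the caller says otherwise.
function downtimeBudgetMinutes(availabilityPercent: number, periodDays: number = 30): number {
  if (availabilityPercent < 0 || availabilityPercent > 100) {
    throw new RangeError("availabilityPercent must be between 0 and 100");
  }
  const totalMinutes = periodDays * 24 * 60;
  return totalMinutes * (1 - availabilityPercent / 100);
}

// Events per day at a sustained per-second ingest rate.
function eventsPerDay(eventsPerSecond: number): number {
  return eventsPerSecond * 60 * 60 * 24;
}
```

The results are floating-point values, so comparisons against a budget should allow a small tolerance rather than test for exact equality.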
6.3 Threat model (STRIDE highlights)
| Threat | Surface | Mitigation |
|---|---|---|
| Tenant data leak | API, queries | Mandatory tenantId injection; tested at repository layer |
| Session hijack | JWT, cookies | Short-lived JWT + rotating refresh + httpOnly+SameSite cookies |
| Privilege escalation | RBAC | Permission deny-by-default; ABAC overlay tests |
| Webhook spoof | Inbound webhooks | HMAC signature verification + replay-window |
| AI prompt injection | LLM | System-prompt allowlist, output schema validation, no tool-use without permission |
| OT write-back misuse | Workflow → action | Safety review gate on any control action; circuit breaker |
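The webhook mitigation (HMAC signature verification plus a replay window) could look roughly like the sketch below. The signing scheme (hex HMAC-SHA256 over `timestamp.body`), the 300-second tolerance, and the function names are assumptions for illustration; real providers define their own header names and formats. It uses Node's built-in `node:crypto` module.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed scheme: signature = hex(HMAC-SHA256(secret, `${timestamp}.${body}`)).
function signWebhook(secret: string, timestamp: number, body: string): string {
  return createHmac("sha256", secret).update(`${timestamp}.${body}`).digest("hex");
}

// Verifies the signature and rejects requests outside the replay window.
// Timestamps are in seconds; toleranceSeconds is an assumed default.
function verifyWebhook(
  secret: string,
  timestamp: number,
  body: string,
  signature: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
  toleranceSeconds: number = 300,
): boolean {
  if (Math.abs(nowSeconds - timestamp) > toleranceSeconds) return false;
  const expected = Buffer.from(signWebhook(secret, timestamp, body), "hex");
  const received = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(expected, received);
}
```

A timestamp window alone only bounds how stale a replayed request can be; rejecting duplicates inside the window would additionally need a store of recently seen signatures or event IDs.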
7. Implementation Tasks
Epic 1.A — Requirements
- 1.A.1 FRS workshops (Product + Eng + Design); 6 workshops covering Data Platform, SFMS, Dashboards, AI, Workflow, Mobile.
- 1.A.2 FRS document v1 published.
- 1.A.3 NFRS document v1 published.
- 1.A.4 Wireframes for 6 anchor screens (dashboard studio, workflow studio, asset detail, work-order list, mobile inspection, AI chat).
Epic 1.B — Architecture
- 1.B.1 System Architecture Document with diagrams (C4 model: Context, Container, Component).
- 1.B.2 Data Model Document with collection schemas, indexes, relationships.
- 1.B.3 Security Architecture Document with threat model.
- 1.B.4 ADRs 003–012 drafted and accepted.
Epic 1.C — OpenAPI baseline
- 1.C.1 `packages/schemas` package with Zod schemas for `User`, `Tenant`, `Role`, `Asset`, `Location`, `System`, `DataPoint`.
- 1.C.2 OpenAPI generator wired (zod-to-openapi).
- 1.C.3 `/api/openapi.json` skeleton served.
- 1.C.4 Swagger UI + Scalar mounted with the skeleton spec.
Epic 1.D — Spikes
- 1.D.1 Time-series collection benchmark (1M datapoints / 7 days / typical query).
- 1.D.2 React Flow with 100 nodes, 200 edges — performance verified.
- 1.D.3 Atlas Vector Search baseline (1k documents, 1k queries/min) latency check.
- 1.D.4 Auth.js v5 multi-tenant pattern proof.
Epic 1.E — Test strategy
- 1.E.1 Test Strategy document (unit / integration / e2e / contract / load / security / accessibility / chaos).
- 1.E.2 Coverage targets per layer.
- 1.E.3 Tooling matrix (Vitest, Playwright, k6, axe, OWASP ZAP).
8. Acceptance Criteria
- AC1.1 — FRS, NFRS, System Architecture, Data Model, Security Architecture, OpenAPI baseline, Test Strategy reviewed and signed.
- AC1.2 — ADRs 003–012 accepted (status: Accepted, reasoning recorded).
- AC1.3 — Wireframes for the six anchor screens are approved by the Product Owner.
- AC1.4 — Spike reports filed; each carries a clear "go / no-go / pivot" recommendation.
- AC1.5 — Risk register updated; top 10 risks have named owners and mitigations.
9. Test Requirements
- Document reviews (no code).
- Spike code may carry tests but is not required to be tested or merged.
10. Documentation Requirements
- FRS (`docs/specs/FRS_v1.md`)
- NFRS (`docs/specs/NFRS_v1.md`)
- System Architecture (`docs/architecture/SA_v1.md` + diagrams)
- Data Model (`docs/architecture/data_model_v1.md`)
- Security Architecture + Threat Model (`docs/architecture/security_v1.md`)
- ADRs 003–012 (`docs/adr/`)
- Test Strategy (`docs/quality/test_strategy_v1.md`)
- Wireframes (`docs/design/wireframes/`)
- Spike reports (`docs/spikes/`)
11. Sign-off Criteria (Gate G1)
- All Acceptance Criteria met.
- Engineering Lead, Product Owner, Tech Architect, Security Lead sign `_gates/Gate_G1_signoff.md`.
- Outputs are version-controlled and tagged `phase-1-v1.0`.
12. Risks & Mitigations
| Risk | L | I | Mitigation |
|---|---|---|---|
| Requirements churn after sign-off | 3 | 3 | Capture late asks as Change Requests; freeze v1 baseline. |
| Workflow engine choice (in-house vs Temporal) unclear | 3 | 3 | Spike 1.D.2 informs ADR-008; if Temporal, scope-impact recorded. |
| Multi-tenant performance unknown at scale | 2 | 4 | Spike 1.D.1 + load model assumptions in NFRS. |
| OpenAPI generation tool proves flaky | 2 | 2 | Fallback: manual OpenAPI authoring, with Zod schemas kept in sync via tests. |