Phase 2 — Core Platform

Status: Shipped ✅ · Owner: Backend Lead · Duration: 3 weeks · Gate: G2

1. Overview

Phase 2 implements the shared platform every other module sits on: authentication, RBAC, multi-tenant data layer, MongoDB models, REST + Server Action conventions, OpenAPI publishing, Swagger UI / Scalar, audit logging, observability, and the base UI shell with navigation, theming, and user/tenant management screens.

After Phase 2 the platform has no product features yet — but a developer can log in, navigate a clean shell, hit the API with a valid JWT or API key, and see audited writes in MongoDB.

2. Objectives

  • O2.1 — Multi-tenant authentication live (email+password+MFA, OIDC, SAML), session + API-key both work.
  • O2.2 — RBAC enforced at API and UI level; permissions seedable per tenant.
  • O2.3 — MongoDB connection, Mongoose models, repository pattern, transaction helpers all in place.
  • O2.4 — REST API v1 skeleton (tenants, users, roles, api-keys, audit, health) live with full OpenAPI documentation.
  • O2.5 — Swagger UI and Scalar served at /api/docs and /api/docs/scalar, auth-protected, with auto-generated client SDK published as @gp/atlas-sdk.
  • O2.6 — Audit log captures every mutating action; viewable in admin UI.
  • O2.7 — Observability live: structured logs, traces, metrics, health checks.
  • O2.8 — Base UI shell with nav, command palette, user menu, tenant switcher, theme toggle.

3. Scope

3.1 In-scope

  • Auth.js v5 with Credentials + OIDC + SAML providers; MFA TOTP; passkey support optional.
  • Tenant CRUD (provisioning); user CRUD (invite, deactivate); role/permission CRUD; API-key CRUD.
  • Repository layer with tenant guard and zod-validated input.
  • OpenAPI generation pipeline (Zod → OpenAPI 3.1).
  • Swagger UI + Scalar mounts (protected by session or API key).
  • SDK generation pipeline (openapi-typescript + openapi-fetch).
  • Audit log middleware (MongoDB change streams are optional and may follow later).
  • Rate limiting (per IP, per API key, per user).
  • Pino structured logging; OpenTelemetry SDK + auto-instrumentation; Prometheus /api/metrics.
  • Base UI shell (sidebar, top bar, command-palette ⌘K, breadcrumbs, theme switcher, tenant switcher).
  • Admin screens: Users, Roles, API Keys, Audit Log, Settings.

3.2 Out-of-scope

  • Asset / WorkOrder / Telemetry / Dashboard / AI / Workflow / Mobile (later phases).
  • Data ingestion connectors (Phase 3).
  • Per-record ABAC for product entities (Phase 4 adds where needed).

4. Dependencies

  • Phase 0 (scaffold) and Phase 1 (architecture + ADRs + Zod schemas for User/Tenant/Role/API-Key).

5. Architecture & Design

5.1 Auth flow

┌─────────┐   POST /auth/credentials   ┌──────────────┐
│ Browser │ ─────────────────────────▶ │ Auth.js v5   │
└────┬────┘                            └──────┬───────┘
     │  set-cookie: session (JWT, httpOnly)   │
     │◀────────────────────────────────────────┘
     │
     │  GET /api/v1/me  (Authorization: Bearer <jwt>  OR  session cookie)
     ▼
┌──────────────────────────────────────────┐
│  withAuth middleware                     │
│  - verifies JWT or session               │
│  - resolves tenantId from session.tenant │
│  - attaches ctx = { user, tenant, roles }│
└──────────────────────────────────────────┘
                  │
                  ▼
         Route handler with ctx

API-key path is identical except the middleware resolves ctx from the keys collection.
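
The credential selection in the diagram can be sketched as a single function. This is only an illustration under stated assumptions: resolveAuthCtx, AuthLookups and IncomingCredentials are hypothetical names, the x-api-key header is assumed (the transport for API keys is not specified here), and the lookups are synchronous to keep the sketch short, whereas real JWT verification and database lookups would be asynchronous.

```typescript
// Hypothetical shapes; the real ones would come from Auth.js and the keys collection.
interface AuthCtx { user: string; tenant: string; roles: string[] }

interface AuthLookups {
  // Each lookup returns a context, or null when the credential is invalid.
  verifyJwt(token: string): AuthCtx | null;
  loadSession(cookie: string): AuthCtx | null;
  lookupApiKey(key: string): AuthCtx | null;
}

interface IncomingCredentials {
  authorization?: string; // "Bearer <jwt>"
  sessionCookie?: string;
  apiKey?: string;        // assumed to arrive in an x-api-key header
}

// Picks the first credential present, resolves it, and rejects when nothing valid is found.
function resolveAuthCtx(creds: IncomingCredentials, lookups: AuthLookups): AuthCtx {
  let ctx: AuthCtx | null = null;
  const bearerPrefix = "Bearer ";
  if (creds.authorization && creds.authorization.indexOf(bearerPrefix) === 0) {
    ctx = lookups.verifyJwt(creds.authorization.slice(bearerPrefix.length));
  } else if (creds.sessionCookie) {
    ctx = lookups.loadSession(creds.sessionCookie);
  } else if (creds.apiKey) {
    ctx = lookups.lookupApiKey(creds.apiKey);
  }
  if (!ctx || !ctx.tenant) {
    throw new Error("401 Unauthorized: no valid credential or tenant");
  }
  return ctx;
}
```

Whichever path succeeds, the route handler receives the same ctx shape, so downstream code does not need to know how the caller authenticated.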

5.2 Repository pattern

// Every repository method takes a RepoCtx and refuses to query without it.
export interface RepoCtx { tenantId: string; userId?: string; roles: string[] }

export interface AssetRepository {
  findById(ctx: RepoCtx, id: string): Promise<Asset | null>;
  list(ctx: RepoCtx, query: AssetQuery): Promise<Paginated<Asset>>;
  create(ctx: RepoCtx, input: AssetInput): Promise<Asset>;
  update(ctx: RepoCtx, id: string, patch: AssetPatch): Promise<Asset>;
  remove(ctx: RepoCtx, id: string): Promise<void>;
}

A guard helper auto-injects tenantId into every Mongo filter and throws if missing.
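
A minimal sketch of such a guard, assuming a plain-object filter. The names withTenant and MongoFilter are hypothetical, and the actual helper may be wired into the base repository class differently.

```typescript
interface RepoCtx { tenantId: string; userId?: string; roles: string[] }
type MongoFilter = Record<string, unknown>;

// Returns a copy of the filter that is always scoped to the caller's tenant.
function withTenant(ctx: RepoCtx | undefined, filter: MongoFilter): MongoFilter {
  if (!ctx || !ctx.tenantId) {
    throw new Error("Tenant guard: refusing to query without a tenantId");
  }
  // ctx.tenantId is written last so a caller-supplied tenantId cannot override it.
  return { ...filter, tenantId: ctx.tenantId };
}
```

Because the context's tenantId overwrites any tenantId in the incoming filter, a request that tries to read another tenant's data ends up querying its own tenant instead.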

5.3 OpenAPI publishing

  • Single source of truth: Zod schemas in packages/schemas.
  • pnpm openapi:build script compiles to openapi.json.
  • /api/openapi.json serves the compiled spec.
  • /api/docs serves Swagger UI (themed, auth-protected).
  • /api/docs/scalar serves Scalar UI (richer, recommended for partners).
  • /api/v1/* paths only; deprecated routes return Deprecation + Sunset headers.
  • CI step verifies that every route handler is documented (lint rule).
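
One way the documentation check could work is to compare the registered routes against the paths in the compiled spec. The sketch below is illustrative only: RouteDef, OpenApiPaths, toOpenApiPath and findUndocumentedRoutes are hypothetical names, and it assumes routes are listed with ":id"-style parameters as in §6.1.

```typescript
interface RouteDef { method: string; path: string } // e.g. { method: "GET", path: "/api/v1/users/:id" }

// Minimal slice of an OpenAPI document: path -> lower-case method -> operation object.
type OpenApiPaths = Record<string, Record<string, unknown>>;

// Converts ":id"-style segments to the "{id}" form used in OpenAPI paths.
function toOpenApiPath(path: string): string {
  return path.replace(/:([A-Za-z_][A-Za-z0-9_]*)/g, "{$1}");
}

// Returns the routes that have no matching operation in the spec.
function findUndocumentedRoutes(routes: RouteDef[], paths: OpenApiPaths): RouteDef[] {
  return routes.filter((route) => {
    const operations = paths[toOpenApiPath(route.path)];
    return !operations || operations[route.method.toLowerCase()] === undefined;
  });
}
```

A CI step could fail the build whenever the returned list is non-empty.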

5.4 Audit log

  • Middleware wraps every mutating route; writes to auditLog collection.
  • Fields: tenantId, actor (user or apiKey), action (verb + resource), target (id), before, after, diff, ip, ua, at, correlationId.
  • Read access requires audit.read permission.
  • Optional MongoDB Change Streams subscriber for real-time admin view.
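
To illustrate how the before, after and diff fields relate, here is a small sketch of a shallow field comparison with secret redaction. It is a stand-in for a real diff library, not the middleware itself; shallowDiff, FieldChange and the "[REDACTED]" marker are assumptions made for this example.

```typescript
type Doc = Record<string, unknown>;
interface FieldChange { from: unknown; to: unknown }

const REDACTED = "[REDACTED]";

// Shallow field-by-field diff. Secret fields are reported as changed,
// but their values never reach the audit entry.
function shallowDiff(before: Doc, after: Doc, secretFields: string[]): Record<string, FieldChange> {
  const changes: Record<string, FieldChange> = {};
  const keys = Object.keys(before).concat(Object.keys(after));
  for (const key of keys) {
    // JSON comparison is a simplification; it is sensitive to key order in nested objects.
    if (JSON.stringify(before[key]) === JSON.stringify(after[key])) continue;
    const isSecret = secretFields.indexOf(key) !== -1;
    changes[key] = isSecret
      ? { from: REDACTED, to: REDACTED }
      : { from: before[key], to: after[key] };
  }
  return changes;
}
```

The same redaction list would also need to be applied to the stored before and after snapshots, otherwise secrets would still be persisted alongside the redacted diff.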

5.5 Observability

  • Logs: Pino JSON to stdout; collector ships to vendor or Grafana Loki.
  • Traces: OpenTelemetry auto-instrumentation for HTTP + MongoDB + Auth.js; manual spans around domain services.
  • Metrics: Prometheus /api/metrics exposing request counts, durations, error rates, DB pool, queue depth.
  • Health: /api/health returns {status, version, deps:{mongo, redis, s3, ai}}.
  • SLOs documented per service in docs/observability/slo.md.
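
The health payload shape above could be assembled as follows. The value sets are assumptions, since only the field names are given here: each dependency is taken to be "up" or "down", and the overall status is "ok" only when every dependency is up. DepState, HealthReport and buildHealthReport are hypothetical names.

```typescript
type DepState = "up" | "down"; // assumed value set
interface HealthDeps { mongo: DepState; redis: DepState; s3: DepState; ai: DepState }
interface HealthReport { status: "ok" | "degraded"; version: string; deps: HealthDeps }

// Rolls individual dependency checks up into the /api/health response body.
function buildHealthReport(version: string, deps: HealthDeps): HealthReport {
  const states: DepState[] = [deps.mongo, deps.redis, deps.s3, deps.ai];
  const allUp = states.every((state) => state === "up");
  return { status: allUp ? "ok" : "degraded", version: version, deps: deps };
}
```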

5.6 UI shell

  • Layout: collapsible left sidebar, top bar with tenant switcher + user menu + theme toggle + global search.
  • Navigation: declarative nav.config.ts driven by user permissions.
  • Command palette (⌘K): jump to any page, run actions, search records.
  • Theme: light, dark, plus per-tenant brand theme (logo, primary color, accent color) loaded at runtime from tenants.branding.
  • i18n: next-intl; English + Japanese at launch.
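
A permission-driven nav.config.ts might look roughly like the sketch below. The NavItem shape, the href values and the visibleNav helper are assumptions for illustration; the permission keys come from the §6.2 catalogue.

```typescript
interface NavItem { label: string; href: string; requires?: string } // requires = permission key

// Hypothetical routes for the admin screens.
const navConfig: NavItem[] = [
  { label: "Users", href: "/admin/users", requires: "user.read" },
  { label: "Roles", href: "/admin/roles", requires: "role.read" },
  { label: "API Keys", href: "/admin/api-keys", requires: "apikey.read" },
  { label: "Audit Log", href: "/admin/audit", requires: "audit.read" },
  { label: "Settings", href: "/admin/settings", requires: "settings.read" },
];

// Keeps items that need no permission or whose permission the user holds.
function visibleNav(items: NavItem[], granted: string[]): NavItem[] {
  return items.filter(
    (item) => item.requires === undefined || granted.indexOf(item.requires) !== -1
  );
}
```

Hiding an item in the sidebar is a convenience only; the API still has to enforce the same permission on the server side.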

6. Detailed Specifications

6.1 API surface (v1 skeleton)

GET    /api/v1/health
GET    /api/v1/openapi.json
GET    /api/v1/me
GET    /api/v1/tenants/me                       (current tenant)
PATCH  /api/v1/tenants/me                       (update settings, branding)

GET    /api/v1/users                            (list)
POST   /api/v1/users                            (invite)
GET    /api/v1/users/:id
PATCH  /api/v1/users/:id
DELETE /api/v1/users/:id                        (soft delete / deactivate)

GET    /api/v1/roles
POST   /api/v1/roles
GET    /api/v1/roles/:id
PATCH  /api/v1/roles/:id
DELETE /api/v1/roles/:id

GET    /api/v1/api-keys
POST   /api/v1/api-keys                         (returns plaintext once)
DELETE /api/v1/api-keys/:id

GET    /api/v1/audit                            (paginated, filterable)
GET    /api/v1/audit/:id

GET    /api/v1/permissions                      (catalogue)

Every endpoint:

  • Has Zod schemas for request, query, response.
  • Is documented in OpenAPI.
  • Is rate-limited (defaults: 60 req/min/IP, 600 req/min/api-key, configurable).
  • Emits an audit entry if mutating.
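
The rate-limit defaults can be read as token-bucket parameters. The sketch below is an in-memory illustration with an injected clock, not the production middleware: TokenBucketLimiter and Bucket are hypothetical names, and a multi-instance deployment would need shared state that this example does not cover.

```typescript
interface Bucket { tokens: number; updatedAt: number }

class TokenBucketLimiter {
  private buckets: Record<string, Bucket> = {};
  private capacity: number;
  private refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
  }

  // Returns true if the request may proceed; false means the caller should respond with 429.
  allow(key: string, nowMs: number): boolean {
    const bucket = this.buckets[key] || { tokens: this.capacity, updatedAt: nowMs };
    // Refill in proportion to elapsed time, never above capacity.
    const elapsedSeconds = Math.max(0, nowMs - bucket.updatedAt) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSeconds * this.refillPerSecond);
    bucket.updatedAt = nowMs;
    this.buckets[key] = bucket;
    if (bucket.tokens < 1) return false;
    bucket.tokens -= 1;
    return true;
  }
}
```

Under this reading, 60 req/min/IP corresponds to new TokenBucketLimiter(60, 1) keyed by IP address, and 600 req/min/api-key to new TokenBucketLimiter(600, 10) keyed by API key id.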

6.2 Permissions catalogue (v1 seed)

tenant.read   tenant.update
user.read     user.invite     user.update     user.deactivate
role.read     role.create     role.update     role.delete
apikey.read   apikey.create   apikey.delete
audit.read
settings.read settings.update

System roles seeded per tenant: owner, admin, member, viewer. Custom roles allowed.
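
A permission check over role mappings could be sketched as below. The grants shown for owner and viewer are purely illustrative, since the actual mapping for each system role is not listed here; the "*" wildcard and the names hasPermission and requirePermission are likewise assumptions.

```typescript
type RolePermissions = Record<string, string[]>;

// Illustrative mapping only; the real grants per system role are defined elsewhere.
const exampleRoles: RolePermissions = {
  owner: ["*"], // assumed wildcard meaning "every permission"
  viewer: ["tenant.read", "user.read", "role.read"],
};

// True if any of the caller's roles grants the permission.
function hasPermission(roles: string[], permission: string, mapping: RolePermissions): boolean {
  for (const role of roles) {
    const granted = mapping[role] || [];
    if (granted.indexOf("*") !== -1 || granted.indexOf(permission) !== -1) return true;
  }
  return false;
}

// Route-level guard: throws when the permission is missing.
function requirePermission(roles: string[], permission: string, mapping: RolePermissions): void {
  if (!hasPermission(roles, permission, mapping)) {
    throw new Error("403 Forbidden: missing permission " + permission);
  }
}
```

Unknown roles grant nothing, so a custom role that has not been mapped yet is denied by default.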

6.3 SDK

  • Package: @gp/atlas-sdk.
  • Generated from OpenAPI via openapi-typescript (types) + openapi-fetch (client).
  • Versioned independently; published to GitHub Packages.
  • Includes typed React Query hooks (useAssets, useWorkOrders, etc. — populated as endpoints land).

7. Implementation Tasks

Epic 2.A — Database & repositories

  • 2.A.1 Implement Mongo client singleton with connection pool + retry.
  • 2.A.2 Implement Mongoose models for User, Tenant, Role, ApiKey, Session, AuditLog, Setting.
  • 2.A.3 Implement RepoCtx and base repository class with tenant guard.
  • 2.A.4 Migration / seed script for first tenant + owner.
  • 2.A.5 Backup script + restore runbook (manual for now; automated in Phase 9).

Epic 2.B — Authentication

  • 2.B.1 Install Auth.js v5; configure providers (Credentials, OIDC, SAML).
  • 2.B.2 MFA TOTP enroll + verify; backup codes.
  • 2.B.3 Passkey (WebAuthn) optional pathway.
  • 2.B.4 Session JWT with refresh rotation.
  • 2.B.5 Invitation flow (email link, expiry, accept).
  • 2.B.6 Password reset flow.
  • 2.B.7 Session list + revoke UI in user settings.

Epic 2.C — Authorisation

  • 2.C.1 Permission catalogue + role-permission mapping.
  • 2.C.2 RBAC middleware (require permission(s) on route).
  • 2.C.3 Permission hook for client (usePermission('user.invite')).
  • 2.C.4 ABAC overlay scaffold (policy engine).

Epic 2.D — API & OpenAPI

  • 2.D.1 Implement all v1 skeleton endpoints (§6.1).
  • 2.D.2 Wire zod-to-openapi; /api/openapi.json live.
  • 2.D.3 Mount Swagger UI at /api/docs, Scalar at /api/docs/scalar.
  • 2.D.4 Generate @gp/atlas-sdk from OpenAPI; publish nightly to GitHub Packages.
  • 2.D.5 Lint rule that fails build if a route lacks OpenAPI annotation.
  • 2.D.6 Rate limiter middleware (token bucket; configurable per route).

Epic 2.E — Audit + Observability

  • 2.E.1 Audit middleware: capture diff (json-diff), persist; redact secrets.
  • 2.E.2 Audit UI: paginated list, filter by actor/action/target/date.
  • 2.E.3 OpenTelemetry SDK + collector wiring.
  • 2.E.4 Pino logger + correlation IDs (propagated via x-correlation-id).
  • 2.E.5 Prometheus metrics endpoint.
  • 2.E.6 Health endpoint with dep checks.
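
For task 2.E.4, correlation ID handling might be sketched as follows: reuse the incoming x-correlation-id when present, otherwise generate one, and attach it to outbound calls. HeaderBag, ensureCorrelationId and outboundHeaders are hypothetical names, header names are assumed to be lower-cased already, and the ID generator is injected so the choice of ID format stays open.

```typescript
type HeaderBag = Record<string, string | undefined>;
const CORRELATION_HEADER = "x-correlation-id";

// Reuses the caller's correlation ID, or creates one if the header is absent or blank.
function ensureCorrelationId(headers: HeaderBag, generateId: () => string): string {
  const incoming = headers[CORRELATION_HEADER];
  return incoming && incoming.trim() !== "" ? incoming : generateId();
}

// Copies the headers and adds the correlation ID for downstream requests.
function outboundHeaders(base: HeaderBag, correlationId: string): HeaderBag {
  return { ...base, [CORRELATION_HEADER]: correlationId };
}
```

The same ID would then be bound to the request's logger and stored in the audit entry's correlationId field, so logs, traces and audit records can be joined.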

Epic 2.F — UI shell

  • 2.F.1 App layout (sidebar, top bar, content area).
  • 2.F.2 Tenant switcher (if user belongs to >1).
  • 2.F.3 User menu (profile, settings, sign-out).
  • 2.F.4 Theme toggle + brand theme loader from tenant settings.
  • 2.F.5 Command palette (⌘K) with navigation + actions.
  • 2.F.6 Notifications drawer (in-app toasts + persistent feed).
  • 2.F.7 Empty states, loading states, error boundaries.
  • 2.F.8 i18n scaffold (next-intl); English + Japanese resource files.

Epic 2.G — Admin screens

  • 2.G.1 Users list + invite + edit.
  • 2.G.2 Roles list + create + edit (with permission checkboxes).
  • 2.G.3 API Keys list + create + revoke (key shown once).
  • 2.G.4 Audit Log list with filters.
  • 2.G.5 Tenant Settings (name, branding, locale, timezone, security policies).

8. Acceptance Criteria

  • AC2.1 — A new tenant can be seeded; the seed creates an owner user who can enroll in MFA.
  • AC2.2 — Owner can invite an admin user; admin receives email link; on accept they enroll MFA.
  • AC2.3 — /api/openapi.json returns a valid OpenAPI 3.1 doc covering every implemented route.
  • AC2.4 — /api/docs (Swagger UI) and /api/docs/scalar both render the spec and allow "Try it out" with a session-issued or API-key token.
  • AC2.5 — Generated SDK is consumable: import { listUsers } from "@gp/atlas-sdk" works in a sample script.
  • AC2.6 — Audit log records every mutating call across the skeleton API; an admin can browse and filter it.
  • AC2.7 — Rate limits enforced (synthetic test exceeds limit and receives 429).
  • AC2.8 — Cross-tenant query attempt is rejected at repository layer (test included).
  • AC2.9 — UI shell renders correctly in light, dark, and brand theme; works at mobile widths.
  • AC2.10 — Command palette can navigate to any admin screen and trigger common actions.

9. Test Requirements

  • Unit: ≥80% coverage on services and repositories; mock the Mongo client.
  • Integration: Vitest against ephemeral Mongo (testcontainers) — auth flows, RBAC, audit, rate-limit, OpenAPI lint.
  • e2e (Playwright): login (with MFA), invite user, switch tenant, view audit, generate API key, call API with key.
  • Contract: API responses validated against OpenAPI in tests.
  • Performance: smoke load test — 200 concurrent users hitting /me, /users endpoints; p95 < 300 ms.
  • Security: OWASP ZAP baseline scan in CI; secret-scan in CI.
  • Accessibility: axe-core check on every UI route in e2e; no critical violations.

10. Documentation Requirements

  • docs/runbooks/auth.md (incident response, MFA reset, session revoke).
  • docs/runbooks/api_keys.md (issuance, rotation, revocation).
  • docs/api/quickstart.md (developer onboarding to API + SDK).
  • docs/architecture/audit_log.md.
  • docs/observability/slo.md (initial SLOs).
  • ADR-013: Rate-limiting strategy.
  • ADR-014: SDK distribution (GitHub Packages vs npm).

11. Sign-off Criteria (Gate G2)

  • All Acceptance Criteria met.
  • Security review of auth + RBAC + audit passed.
  • Engineering Lead, Backend Lead, Security Lead, Product Owner sign _gates/Gate_G2_signoff.md.
  • Tagged phase-2-v1.0.

12. Risks & Mitigations

Risk                                  | Likelihood (L) | Impact (I) | Mitigation
Auth.js v5 patterns immature for SAML | 3              | 3          | Use WorkOS/equiv as the SAML provider; Auth.js as session shell.
OpenAPI generation drift vs reality   | 3              | 3          | CI lint rule + contract tests; deny merge if drift.
Audit log write throughput bottleneck | 2              | 3          | Async write queue (BullMQ) for non-critical paths.
Multi-tenant misconfig leak           | 1              | 5          | Dedicated test suite simulating cross-tenant access; deny by default.