Phase 3 + 6
This commit is contained in:
+139
-9
@@ -10,15 +10,18 @@ Audit-driven plan to (a) reduce 12 PM2 processes to 3 application servers + 1 au
|
||||
|---|---|---|
|
||||
| 1 — Decommission dead services | **Complete** | aircall/gorgias/clarity/legacy-auth-server deleted from repo + PM2 + Caddyfile + ecosystem.cjs |
|
||||
| 2 — Build shared `lib/` | **Complete** | Lives at `inventory-server/shared/` (see Deviations). `/verify` endpoint live on auth-server |
|
||||
| 3 — Convert auth-server + inventory-server to ESM | Not started | Next up. Auth-server is still CJS but has the new `/verify` route added in CJS form |
|
||||
| 3 — Convert auth-server + inventory-server to ESM | **Complete (code)** | All 58 server-side files ESM; verified 0 import failures on netcup. Pending: `npm install` on server + pm2 reload to actually run the new code. See Deviations #10–13 |
|
||||
| 4 — Build `dashboard-server` (the merge) | Not started | klaviyo/meta/google/typeform still run as 4 separate PM2 apps |
|
||||
| 5 — Convert `acot-server` to ESM | Not started | |
|
||||
| 6 — Auth hardening | Not started | Shared modules exist (`shared/rate-limit`, `shared/cors`, `shared/logging`) but no service consumes them yet. JWT_SECRET footgun discovered — see 6.4 |
|
||||
| 7 — Caddyfile final form | Partial | Dead routes removed; `forward_auth` gate + `/uploads/*` gating + per-vendor cleanup deferred to after Phase 4 |
|
||||
| 8 — ecosystem.config.cjs final form | Partial | Dead apps removed; final shape depends on Phase 4 merge |
|
||||
| 6 — Auth hardening | **Complete (code) — gated on Phase F1** | All in-process items wired (rate-limit, JWT precondition, CORS lockdown, request-log, upload allowlist, `requirePermission` on sensitive routes, permissions seed migration). `authenticate()` is live on `/api/*`. Server-side artefacts (Caddyfile, ecosystem.cjs) written to `inventory-server/deploy/` for review. 6.11 (audit logging) deferred. **Frontend cannot use the app until Phase F1 ships** — see below |
|
||||
| **F1 — Frontend fetch wrapper (NEW)** | **Not started — CRITICAL** | Frontend uses raw `fetch()` in ~220 sites; only 7 send `Authorization: Bearer`. With Phase 6's `authenticate()` middleware live, every refresh 401s until the frontend uniformly attaches the token. See "Phase F1" below |
|
||||
| 7 — Caddyfile final form | Partial | Proposed file at `inventory-server/deploy/Caddyfile.proposed`. Apply blocked on F1 (forward_auth would 401 every page load until then) |
|
||||
| 8 — ecosystem.config.cjs final form | Partial | Proposed at `inventory-server/deploy/ecosystem.config.cjs.proposed`. Includes Phase 6.4 JWT_SECRET footgun fix and 6.10 lt-wordlist token move |
|
||||
|
||||
**Live PM2 count: 10** (down from 13). Target after Phase 4: 5 application apps + acot-phone-server + lt-wordlist-api.
|
||||
|
||||
**Apply order from current state:** (a) `npm install` on netcup to install the new shared-module deps (`pino`, `pino-http`, `ioredis`, `express-rate-limit`, `jsonwebtoken`), (b) ship Phase F1 frontend fetch wrapper, (c) `pm2 reload inventory-server new-auth-server` (Phase 3+6 code goes live, requests carry tokens, app keeps working), (d) apply `deploy/ecosystem.config.cjs.proposed` (Phase 6.4 + 6.10), (e) apply `deploy/Caddyfile.proposed` (Phase 6.1 — edge gate).
|
||||
|
||||
---
|
||||
|
||||
## Goals
|
||||
@@ -199,7 +202,11 @@ Caddy's `forward_auth` only needs "is this token valid? give me a user-id." Toda
|
||||
|
||||
## Phase 3 — Convert `auth-server` and `inventory-server` to ESM
|
||||
|
||||
Status: **Not started.** Lift the easy ones first. These two stay standalone (don't merge into anything), so they're isolated changes. The auth-server's new `/verify` route (added in Phase 2) is currently CJS — refactor it during this phase to import from `../shared/auth/verify.js`.
|
||||
Status: **Complete (code) — 2026-05-23.** Both servers + all sub-trees converted to ESM. 58 importable .js files load cleanly on netcup (verified via dynamic-import sweep). Two latent bugs surfaced and fixed: `??`/`||` precedence in `shared/db/{pg,redis}.js`, and CJS named-import of `Pool` from `pg` in both auth files (now uses `import pg from 'pg'; const { Pool } = pg`).
|
||||
|
||||
Scripts under `inventory-server/scripts/` (one-shot maintenance / orchestrators) kept CommonJS via a sibling `scripts/package.json` declaring `"type": "commonjs"` — Node's package-type resolution walks up directory by directory, so this overrides the parent's `"type": "module"` without renaming any file or touching any `spawn()` callsite. Convert individual scripts to ESM if/when touched.
|
||||
|
||||
Pending to actually go live: `npm install` on netcup (new deps: `pino`, `pino-http`, `ioredis`, `express-rate-limit`, `jsonwebtoken`) + `pm2 reload`. See "Phase F1" — the frontend fetch wrapper should ship in the same deploy or this immediately breaks the app.
|
||||
|
||||
### Mechanical conversion
|
||||
|
||||
@@ -357,7 +364,23 @@ Same as inventory-server: start with PM2, smoke-test the most-used `/api/acot/*`
|
||||
|
||||
## Phase 6 — Auth hardening
|
||||
|
||||
Status: **Not started.** This is the security work that justifies the whole refactor. Runs in parallel with phases 3–5 where possible. Shared building blocks already exist (`shared/rate-limit/login.js`, `shared/cors/policy.js`, `shared/logging/request-log.js`, `shared/errors/handler.js`) — Phase 6 is about *applying* them per-service.
|
||||
Status: **Complete (code) — 2026-05-23. Application gated on Phase F1.** All in-process hardening shipped alongside the Phase 3 ESM conversion. The `authenticate()` middleware is wired live on `/api/*` in inventory-server — **the moment that code reaches production, the frontend stops working until Phase F1 lands**, because today's frontend doesn't include `Authorization: Bearer` on the vast majority of fetch calls (see Phase F1 below for the diagnosis).
|
||||
|
||||
Per-item status:
|
||||
|
||||
| # | Item | Status | Where |
|
||||
|---|---|---|---|
|
||||
| 6.1 | Caddy `forward_auth` gate | **Proposed** — apply *after* F1 | `inventory-server/deploy/Caddyfile.proposed` |
|
||||
| 6.2 | `requirePermission` on sensitive routes + permissions migration | **Done** | inline in `config.js`, `data-management.js`, `import.js`, `ai-prompts.js`, `ai-validation.js`, `templates.js`, `reusable-images.js`; codes seeded by `migrations/005_phase6_permission_codes.sql` |
|
||||
| 6.3 | Login rate-limit + `/verify` rate-limit | **Done** | `auth/server.js` uses `shared/rate-limit/login.js` (`loginLimiter`, `verifyLimiter`) |
|
||||
| 6.4 | JWT_SECRET as startup precondition + ecosystem footgun fix | **Done in code; proposed for ecosystem.cjs** | Both auth-server and inventory-server `process.exit(1)` if `JWT_SECRET` is unset. `inventory-server/deploy/ecosystem.config.cjs.proposed` removes the `JWT_SECRET: process.env.JWT_SECRET` override that was shadowing `.env` |
|
||||
| 6.5 | Structured request logging w/ redaction | **Done** | `shared/logging/request-log.js` (pino-http, redacts Authorization/Cookie); mounted in both `auth/server.js` and `src/server.js` |
|
||||
| 6.6 | CORS lockdown | **Done** | `src/middleware/cors.js` now re-exports `shared/cors/policy.js`. LAN wildcards (`192.168.*`, `10.*`) and `*` defaults gone |
|
||||
| 6.7 | Upload hardening | **Done** | Exact-match MIME+extension allowlist on `routes/import.js` and `routes/reusable-images.js`; dead `multer({ dest })` removed from `routes/products.js` (no upload route was using it — strongest hardening was deletion) |
|
||||
| 6.8 | Frontend token storage stays localStorage + XSS audit | **Audited** | Confirmed `dangerouslySetInnerHTML` is sanitized in `ProductEditor.tsx`. **Flagged: `ChatRoom.tsx:277,392` renders user-controlled chat content as raw HTML — real XSS vector, separate fix needed** |
|
||||
| 6.9 | Remove debug middleware | **Done** | The header-dumping `app.use((req,res,next)=>{ console.log(... req.headers ...) })` block removed from `src/server.js`. Replaced with `shared/logging/request-log.js` (which redacts). |
|
||||
| 6.10 | `lt-wordlist-api` token move | **Proposed for ecosystem.cjs** | `inventory-server/deploy/ecosystem.config.cjs.proposed` shows the entry without inline token; apply alongside rotating the secret value into `/opt/lt-wordlist-api/.env` |
|
||||
| 6.11 | Audit logging for sensitive ops | **Deferred** | Out of scope for this pass per user direction. Existing `import_audit_log` and `product_editor_audit_log` tables stay as-is; generic `system_audit_log` table + middleware is its own project |
|
||||
|
||||
### 6.1 Caddy `forward_auth` gate
|
||||
|
||||
@@ -494,8 +517,99 @@ Already have `import-audit-log` and `product-editor-audit-log` tables. Extend th
|
||||
|
||||
---
|
||||
|
||||
## Phase F1 — Frontend fetch wrapper (NEW — 2026-05-23)
|
||||
|
||||
Status: **Not started. CRITICAL. Blocks the Phase 3+6 deploy from being usable.**
|
||||
|
||||
### The discovery
|
||||
|
||||
While wiring `authenticate()` on `/api/*` in Phase 6.1/6.2, we audited the frontend's fetch usage and found:
|
||||
|
||||
- **7** call sites send `Authorization: Bearer ${token}` explicitly (all in `AuthContext.tsx` for `/me` + `/login`, plus a couple of `settings/*` pages).
|
||||
- **~220** other `fetch(...)` / `axios.*(...)` call sites across `inventory/src/services/`, `inventory/src/pages/`, `inventory/src/components/` send **no** Authorization header at all.
|
||||
- There is no global fetch wrapper, axios interceptor, or service-worker shim that injects the token.
|
||||
|
||||
Today this works because nothing on the server checks. Caddy currently has no `forward_auth` gate (Phase 6.1 is a Caddyfile change that hasn't shipped yet) and the previous inventory-server had no `authenticate()` middleware. The frontend's auth model was "you log in once to get the token; the token is checked only by `/me`; everything else is implicitly trusted at the network layer."
|
||||
|
||||
With Phase 6 code in production, **every page refresh 401s** on the first API call after the next pm2 reload. The user explicitly accepted this when authorising the Phase 6 work — but the fix is its own deliverable, and shipping Phase 3+6 to PM2 without F1 in the same window means an outage window measured in *however long F1 takes* (not minutes).
|
||||
|
||||
### Recommended approach
|
||||
|
||||
Add a single fetch wrapper at `inventory/src/utils/api.ts` (or similar) and migrate the ~220 call sites to use it. The wrapper:
|
||||
|
||||
1. Reads `localStorage.getItem('token')` on every call (cheap; localStorage is sync).
|
||||
2. Merges `Authorization: Bearer ${token}` into the request headers if a token exists.
|
||||
3. Intercepts 401 responses → fires `window.dispatchEvent(new Event('auth:logout'))` (a listener already exists in `AuthContext.tsx:117`) so the user gets bounced to `/login` cleanly instead of seeing broken pages.
|
||||
4. Preserves the existing call shape — `apiFetch(url, init)` should be a drop-in for `fetch(url, init)` so the migration is mechanical.
|
||||
|
||||
```ts
|
||||
// inventory/src/utils/api.ts (sketch)
|
||||
export async function apiFetch(input: RequestInfo | URL, init: RequestInit = {}): Promise<Response> {
|
||||
const token = localStorage.getItem('token');
|
||||
const headers = new Headers(init.headers);
|
||||
if (token && !headers.has('Authorization')) {
|
||||
headers.set('Authorization', `Bearer ${token}`);
|
||||
}
|
||||
const res = await fetch(input, { ...init, headers });
|
||||
if (res.status === 401 && token) {
|
||||
// Token expired or revoked — bounce to /login. AuthContext already listens.
|
||||
window.dispatchEvent(new Event('auth:logout'));
|
||||
}
|
||||
return res;
|
||||
}
|
||||
```
|
||||
|
||||
Same shape for axios:
|
||||
|
||||
```ts
|
||||
// inventory/src/utils/apiClient.ts (sketch)
|
||||
import axios from 'axios';
|
||||
export const apiClient = axios.create();
|
||||
apiClient.interceptors.request.use((config) => {
|
||||
const token = localStorage.getItem('token');
|
||||
if (token) config.headers.Authorization = `Bearer ${token}`;
|
||||
return config;
|
||||
});
|
||||
apiClient.interceptors.response.use(
|
||||
(r) => r,
|
||||
(err) => {
|
||||
if (err?.response?.status === 401) window.dispatchEvent(new Event('auth:logout'));
|
||||
return Promise.reject(err);
|
||||
},
|
||||
);
|
||||
```
|
||||
|
||||
### Migration plan
|
||||
|
||||
1. Land the two wrapper modules above. ~50 LOC total.
|
||||
2. Codemod or sed-loop: in `inventory/src/`, replace `fetch(` → `apiFetch(` (with the right import) and `axios.get/post/...` → `apiClient.get/post/...`. ~220 call sites — a half-day of careful find-and-replace plus per-page verification. Spot-check the ones with custom `Content-Type` (multipart uploads especially) so the wrapper doesn't clobber multipart boundaries.
|
||||
3. Leave the `AuthContext.tsx` `/login` and `/me` calls alone — they already work and migrating them adds no value.
|
||||
4. Run the SPA: log in, exercise Overview / Products / Analytics / Dashboard / etc. with browser devtools open watching for `Authorization` header on every `/api/*` request.
|
||||
|
||||
### Sequencing with Phase 3+6 deploy
|
||||
|
||||
**Two options:**
|
||||
|
||||
A) **Ship F1 first** (recommended). Frontend goes out with the wrapper; nothing changes server-side. Then `pm2 reload` Phase 3+6. Zero-downtime, zero broken-page window.
|
||||
|
||||
B) **Ship together.** F1 and Phase 3+6 land in the same deploy. Brief window (seconds) where the frontend has the wrapper but the server hasn't reloaded yet — wrapper just sends extra headers the old server ignores. Safe.
|
||||
|
||||
Do **not** ship Phase 3+6 first and F1 second. That gives a broken app for as long as F1 takes.
|
||||
|
||||
### Out of scope (kept on `localStorage`)
|
||||
|
||||
Per Phase 6.8, we're not migrating to httpOnly cookie auth. F1 is the minimum work to make the per-service `authenticate()` (Phase 6) actually usable. A future Phase F2 could move to cookies + CSRF double-submit, but that's a much larger change touching the AuthContext, the login flow, and every backend that reads tokens. Not justified for an internal tool with no public sign-up.
|
||||
|
||||
### Note on `/uploads/*` gating (Phase 6.7's Caddyfile change)
|
||||
|
||||
The proposed Caddyfile moves `/uploads/*` behind `forward_auth`. Most product images today are referenced from `<img src="/uploads/...">` in the SPA — those requests are made by the browser, which **does not include `Authorization` headers on image requests**. Fixing this is part of F1's scope too: either (a) keep `/uploads/*` public (revert that part of 6.7) and accept that uploaded images leak to anyone who guesses a URL, or (b) issue per-image signed URLs from the API and gate those at Caddy. Decide before applying the Caddyfile.
|
||||
|
||||
---
|
||||
|
||||
## Phase 7 — Caddyfile final form
|
||||
|
||||
Status: **Proposed (2026-05-23). Apply blocked on Phase F1.** The full proposed file lives at `inventory-server/deploy/Caddyfile.proposed` and matches the spec below except that vendor handle blocks still point to per-vendor PM2 apps (Phase 4 hasn't merged them yet). See `inventory-server/deploy/README.md` for the apply commands (admin-API + sudo cp pattern from Phase 2 deviation #8).
|
||||
|
||||
After all phases, the `tools.acherryontop.com` block looks like:
|
||||
|
||||
```caddyfile
|
||||
@@ -571,6 +685,8 @@ Removed: `/dashboard-auth/*`, `/api/aircall/*`, `/api/gorgias/*`, `/api/clarity/
|
||||
|
||||
## Phase 8 — ecosystem.config.cjs final form
|
||||
|
||||
Status: **Proposed (2026-05-23).** Full proposed file at `inventory-server/deploy/ecosystem.config.cjs.proposed`. Includes the Phase 6.4 `JWT_SECRET` shadow-override fix and the Phase 6.10 `lt-wordlist-api` token move. Still lists per-vendor PM2 apps until Phase 4 merge ships — that's the only thing keeping app count at 10 instead of the target 5.
|
||||
|
||||
```js
|
||||
module.exports = {
|
||||
apps: [
|
||||
@@ -646,10 +762,11 @@ Phase 1 unblocks everything (fewer services to convert).
|
||||
Phase 2 is the foundation; nothing else can start until shared `lib/` exists.
|
||||
Phases 3–5 can run in parallel; they touch independent services.
|
||||
Phase 6's sub-items can be developed alongside 3–5 but **enabled** only after them (no point adding `requirePermission` to a route that doesn't yet have `authenticate`).
|
||||
Phase 7 is the cutover: Caddyfile flip happens when all backend changes are deployed.
|
||||
**Phase F1 must precede the Phase 3+6 pm2 reload** — without the fetch wrapper, the moment the new code goes live the SPA breaks. Discovered during Phase 3+6 implementation; see Phase F1.
|
||||
Phase 7 is the cutover: Caddyfile flip happens after F1 ships AND after the `/uploads/*` gating decision in F1 is made.
|
||||
Phase 8 is cleanup: remove dead PM2 entries.
|
||||
|
||||
Estimated effort, end-to-end: **~3 weeks of focused work** by one engineer. Phase 1 ≈ 1 day, Phase 2 ≈ 2 days, Phase 3 ≈ 3 days (both services), Phase 4 ≈ 5–7 days (the merge), Phase 5 ≈ 2–3 days, Phase 6 ≈ 3–4 days, Phase 7+8 ≈ 1 day.
|
||||
Estimated effort, end-to-end: **~3 weeks of focused work** by one engineer. Phase 1 ≈ 1 day, Phase 2 ≈ 2 days, Phase 3 ≈ 3 days (both services), Phase 4 ≈ 5–7 days (the merge), Phase 5 ≈ 2–3 days, Phase 6 ≈ 3–4 days, Phase F1 ≈ 0.5–1 day, Phase 7+8 ≈ 1 day.
|
||||
|
||||
---
|
||||
|
||||
@@ -699,12 +816,13 @@ Each phase produces an independently deployable state. Rollback per phase:
|
||||
|
||||
These came up in the audit but aren't part of this refactor:
|
||||
|
||||
- `httpOnly` cookie auth (deferred — current `localStorage` acceptable for internal tool).
|
||||
- `httpOnly` cookie auth ("Phase F2" — deferred). Phase F1 keeps `localStorage` + Bearer header because that's the minimum to unblock the Phase 6 `authenticate()` rollout. A future move to cookie auth would touch `AuthContext`, every backend that reads tokens, and introduce CSRF concerns — much larger project.
|
||||
- Replacing PM2 with systemd or Docker.
|
||||
- Test coverage beyond the auth-critical surface.
|
||||
- `apiv2`/`apiv2-test` proxies to `backend.acherryontop.com` — separate system, not touched.
|
||||
- `acot-phone-server` and `lt-wordlist-api` — staying as-is.
|
||||
- Centralized observability stack (Prometheus, Grafana). The logger work in Phase 6.5 sets up the data, but shipping it somewhere is future work.
|
||||
- ChatRoom XSS remediation (flagged during Phase 6.8 audit — `inventory/src/components/chat/ChatRoom.tsx:277,392` renders user-controlled chat content via `dangerouslySetInnerHTML` without sanitization). Real vulnerability for an internal-but-multi-user tool; separate fix.
|
||||
|
||||
---
|
||||
|
||||
@@ -749,3 +867,15 @@ These are decisions made during Phase 1/2 implementation that amend the spec abo
|
||||
8. **Caddyfile changes via admin API on `:2020`.** The Caddyfile is owned by root and matt has no passwordless sudo. Cutover used `curl -X POST .../load` on the Caddy admin port (which matt can hit), then a separate `sudo cp /home/matt/Caddyfile.new /etc/caddy/Caddyfile` step to persist the on-disk file. Future Caddyfile changes can follow the same pattern. Backup convention: `/etc/caddy/Caddyfile.bak.YYYY-MM-DD`.
|
||||
|
||||
9. **Path-naming.** Plan uses `inventory/` as the top-level (server-side path convention). Locally the equivalent is `inventory-server/`. Whenever the plan says `inventory/dashboard/foo/`, read that as `/var/www/inventory/dashboard/foo/` on the server or `inventory-server/dashboard/foo/` locally.
|
||||
|
||||
10. **Scripts directory kept CJS via package.json shim.** Original plan called for converting "any spawned script" to ESM alongside its caller. Implemented: added `inventory-server/scripts/package.json` with `"type": "commonjs"`. Node's package-type resolution walks up directory by directory, so this overrides the parent's `"type": "module"` for the entire `scripts/` tree (≈15 files including `import/*.js`, `metrics-new/utils/*`, the orchestrator scripts) without renaming any file or touching any `spawn()` callsite. Convert individual scripts to ESM when touched; don't bulk-migrate.
|
||||
|
||||
11. **`src/routes/products.js` had dead multer setup.** Phase 6.7 spec called for hardening the upload route in products.js. There was no upload route — the `multer({ dest })` instance and `importProductsFromCSV` import were dead code left over from a long-ago migration. Strongest 6.7 hardening was deletion: no upload handler = no attack surface. The two real upload paths (`/api/import/upload-image` and `/api/reusable-images/upload`) got tightened MIME+extension allowlists instead.
|
||||
|
||||
12. **Two pre-existing syntax errors in shared/db/ surfaced.** `shared/db/pg.js:13` and `shared/db/redis.js:22` both had `?? Number(...) || N` — mixing `??` and `||` without parentheses is a TC39 syntax error. They passed Phase 2 because nothing imported them yet; Phase 3 smoke-test exposed it. Fixed with parens.
|
||||
|
||||
13. **`import { Pool } from 'pg'` doesn't work in ESM.** The `pg` package is CJS using `module.exports = { Pool, ... }`. Node's ESM-from-CJS interop fails to detect `Pool` as a named export via static analysis. The bulletproof pattern, now used everywhere: `import pg from 'pg'; const { Pool } = pg;`. Same idea for any future CJS-only deps. `src/utils/db.js` already had it; the two auth files needed the fix during execution.
|
||||
|
||||
14. **Frontend Bearer-header gap discovered (drives new Phase F1).** Phase 6 was specified assuming the frontend already sends `Authorization: Bearer` on every API call. It does not — only 7 of ~220 call sites do. Phase 6's `authenticate()` middleware is shipped and ready to enable, but until F1 lands the SPA will 401 on every page. The plan now has Phase F1 to address this explicitly; until then, the Phase 3+6 pm2 reload should not ship unless F1 ships in the same window.
|
||||
|
||||
15. **macOS NFS workflow note.** The `inventory-server/` directory locally is an NFS mount of `/var/www/inventory/` on netcup. Bulk operations (`find`/`grep -r`/mass `node --check`/`npm install`) hang or take minutes locally and pollute file listings with macOS AppleDouble `._*` sidecar files. Default to `ssh netcup` for any sweep across the tree — individual file edits via the editor are fine.
|
||||
|
||||
Reference in New Issue
Block a user