Phase 5 + all remaining
+44
-15
@@ -12,15 +12,19 @@ Audit-driven plan to (a) reduce 12 PM2 processes to 3 application servers + 1 au
| 2 — Build shared `lib/` | **Complete** | Lives at `inventory-server/shared/` (see Deviations). `/verify` endpoint live on auth-server |
| 3 — Convert auth-server + inventory-server to ESM | **Complete** | All 58 server-side files ESM; both services live under the ESM build for >24h. See Deviations #10–13 |
| 4 — Build `dashboard-server` (the merge) | **Complete (live) — 2026-05-24** | Merged service running on :3015 under PM2; Caddy routes for klaviyo/meta/dashboard-analytics/typeform all reverse-proxy to it. Old per-vendor directories (`klaviyo-server`, `meta-server`, `google-server`, `typeform-server`) and their PM2 entries deleted post-cutover — ~1.27 GB reclaimed (largely duplicated `node_modules`). Phase 6.2 gates wired (meta_write, klaviyo_admin). See Deviations #16–19 |
| 5 — Convert `acot-server` to ESM | Not started | |
| 6 — Auth hardening | **Complete** | All in-process items live: rate-limit, JWT precondition, CORS lockdown, request-log, upload allowlist, `requirePermission` on sensitive routes, permissions seed migration. `authenticate()` is live on `/api/*`. 6.11 (audit logging) deferred — see Out of scope |
| 5 — Convert `acot-server` to ESM | **Complete (live) — 2026-05-24** | All 11 files (server, db/connection, utils/{phoneAuth,timeUtils}, 7 routes) converted to ESM. PM2 reload clean; SPA-driven `/api/acot/events/*` continues 200 across cutover; phone-server `/api/acot/customers/by-phone` returns 200 with correct shared secret. Phase 6 patterns applied during conversion — see Deviation #24 |
| 6 — Auth hardening | **Complete** | All in-process items live: rate-limit, JWT precondition, CORS lockdown, request-log, upload allowlist, `requirePermission` on sensitive routes, permissions seed migration. `authenticate()` live on `/api/*` (inventory-server, dashboard-server) and `/api/acot/*` (acot-server, added in Phase 5). 6.10 lt-wordlist token loaded via `--env-file` + rotated 2026-05-24 (Deviation #25). 6.11 (audit logging) deferred — see Out of scope |
| **F1 — Frontend fetch wrapper** | **Complete (live) — 2026-05-23** | Wrappers at `inventory/src/utils/api.ts` (`apiFetch`) and `inventory/src/utils/apiClient.ts` (axios instance). 170 `fetch()` sites across 76 files migrated to `apiFetch`; 32 `axios.*` sites across 11 files migrated to `apiClient`. AuthContext `/login`+`/me`, App.tsx `/me`, and `services/apiv2.ts` (external PHP backend) intentionally left as raw `fetch`. Shipped alongside the Phase 3+6 pm2 reload |
| 7 — Caddyfile final form | **Complete — applied 2026-05-24** | Final Caddyfile live at `/etc/caddy/Caddyfile` (forward_auth gate + per-vendor reverse_proxy to :3015). The `inventory-server/deploy/` staging folder was removed after apply — recreate from this doc if future changes are needed. Backup convention: `/etc/caddy/Caddyfile.bak.YYYY-MM-DD` |
| 8 — ecosystem.config.cjs final form | **Complete — applied 2026-05-24** | Live PM2 list matches the spec below (5 apps + acot-phone-server + lt-wordlist-api = 7 processes). Includes Phase 6.4 JWT_SECRET shadow-override fix and 6.10 lt-wordlist token move. `inventory-server/deploy/` removed post-apply |
**Live PM2 process count: 7** (5 application apps — auth-server, inventory-server, chat-server, dashboard-server, acot-server — plus acot-phone-server + lt-wordlist-api). Down from 13 pre-refactor.
**All apply steps complete (2026-05-24).** The original sequencing (npm install → F1 ship → pm2 reload → env consolidation → vendor PM2 delete → ecosystem apply → Caddyfile apply) was executed in order. Remaining work is Phase 5 (acot-server ESM conversion) only.
**All planned phases complete (2026-05-24).** Phase 5 was the last code-level deliverable; acot-server now runs as an ESM service with shared-lib `authenticate()` defense-in-depth.
**All originally planned phases shipped.** Two real gaps surfaced during Phase 5 verification — both closed 2026-05-24:
- **Phase 6.10 lt-wordlist token rotation** — fixed via Node's native `--env-file` flag in the PM2 entry; token rotated to a fresh 32-byte hex value in `/opt/lt-wordlist-api/.env` (mode 0600). See Deviation #25.
- **Phase 6.1 Caddy gate blocking acot-phone-server's customer lookups** — fixed by repointing `ACOT_API_URL` from the public host to `http://localhost:3012/api/acot`. See Deviation #26.
---
@@ -344,21 +348,25 @@ When all four vendors share a Redis client, a Redis hiccup affects all four. Mak
## Phase 5 — Convert `acot-server` to ESM (stays standalone)
Status: **Not started.** Largest single conversion (~5K LOC), but no merge involved.
Status: **Complete (live) — 2026-05-24.** 11 files converted (server.js, db/connection.js, utils/{phoneAuth,timeUtils}.js, 7 route files — ~5.2K LOC). PM2 reload clean; SPA-driven `/api/acot/events/{projection,stats}` continues 200 across cutover; phone-server `/api/acot/customers/by-phone` returns 200 with correct `x-acot-api-key`. Per Deviation #13, `ssh2` is CJS-only → uses `import ssh2 from 'ssh2'; const { Client } = ssh2;`. Phase 6 patterns applied during conversion (Deviation #24).
### Special concern: ssh2 tunnel
`acot-server` opens an SSH tunnel via `ssh2` to access the production MySQL at `192.168.1.5:3309`. The tunnel must be:
`acot-server` opens an SSH tunnel via `ssh2` to access the production MySQL at `192.168.1.5:3309`. Lifecycle today (preserved verbatim across the ESM conversion, not refactored):
- Established before the HTTP listener starts (so no requests fail with "no DB connection").
- Re-established on disconnect (`ssh2` connection's `close` event → recreate).
- Cleanly torn down on `SIGTERM`/`SIGINT` so PM2 restarts don't leak file descriptors.
- **Lazy establishment.** No tunnel at startup; the first `getDbConnection()` call sets one up. HTTP listener comes up immediately without waiting for the tunnel. Acceptable — the first per-route request just pays the tunnel-creation latency once.
- **Per-connection ssh client.** Each pooled MySQL connection owns its own `ssh2.Client`. Closing a connection closes its own SSH client.
- **No reconnect on disconnect.** There is no `close` listener on the SSH client. If the SSH connection drops while the MySQL connection is pooled (not in use), the next caller that pops it will get a query failure. Circuit-breaker absorbs repeated failures (5 failures → 30s open). Mitigation acceptable for current call volume; revisit if SSH drops become observable in logs.
- **SIGTERM/SIGINT teardown.** `server.close()` → `closeAllConnections()` ends MySQL connections and SSH clients in sequence. Confirmed clean during the Phase 5 cutover (`SIGTERM signal received: closing HTTP server` → 10 × `Closed pooled connection` → `All connections closed and pool reset` in PM2 logs).
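
The "5 failures → 30s open" behaviour described in the lifecycle list can be sketched as a failure counter plus a cool-down timer. This is a minimal illustration under assumptions, not acot-server's actual breaker: the class name, the method names, and the injectable `now` clock are hypothetical, and it assumes that consecutive failures (reset on any success) are what trip the breaker.

```javascript
// Hypothetical sketch of a "5 failures -> 30s open" circuit breaker.
// Names and structure are illustrative; they are not taken from acot-server.
class CircuitBreaker {
  constructor({ failureThreshold = 5, openMs = 30_000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold; // consecutive failures before opening
    this.openMs = openMs;                     // how long the breaker stays open
    this.now = now;                           // injectable clock (eases testing)
    this.failures = 0;
    this.openedAt = null;                     // null means the breaker is closed
  }

  // True while the breaker is rejecting calls. Once the cool-down has
  // elapsed, the breaker closes again and the counter resets.
  isOpen() {
    if (this.openedAt === null) return false;
    if (this.now() - this.openedAt >= this.openMs) {
      this.openedAt = null;
      this.failures = 0;
      return false;
    }
    return true;
  }

  recordSuccess() {
    this.failures = 0;
  }

  recordFailure() {
    this.failures += 1;
    if (this.failures >= this.failureThreshold) {
      this.openedAt = this.now();
    }
  }

  // Wrap an async operation: fail fast while open, otherwise track the outcome.
  async run(operation) {
    if (this.isOpen()) throw new Error('circuit open');
    try {
      const result = await operation();
      this.recordSuccess();
      return result;
    } catch (err) {
      this.recordFailure();
      throw err;
    }
  }
}
```

Under this sketch, a query issued on a pooled connection whose SSH client has silently dropped would count as one failure. After five in a row, callers fail fast for 30 seconds instead of each waiting on a dead tunnel.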
Verify (or add) this lifecycle handling as part of the conversion. If it's already correct, conversion is mechanical; if not, this is a good moment to fix it.
### Auth model (two flavors, intentional)
### Test strategy
`server.js` mounts the customers router BEFORE the global `authenticate()` so the two auth schemes don't collide:
Same as inventory-server: start with PM2, smoke-test the most-used `/api/acot/*` endpoints, watch logs for unhandled rejection or tunnel-close events.
- `/api/acot/customers/*` → `requirePhoneApiKey` (timing-safe `x-acot-api-key` check). Used by `acot-phone-server`.
- everything else → JWT Bearer via `shared/auth/middleware.js authenticate()`. Used by the SPA.
This works at the in-process layer. The public path through Caddy is a separate issue — see Deviation #26.
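
For readers unfamiliar with the "timing-safe `x-acot-api-key` check", the following is a rough sketch of what such a middleware could look like. It is not the contents of `utils/phoneAuth.js`: the factory name `makeRequirePhoneApiKey`, the `safeEqual` helper, and the 401 response body are assumptions made for illustration. Both values are hashed first because Node's `crypto.timingSafeEqual` throws when its two inputs differ in length.

```javascript
import { createHash, timingSafeEqual } from 'node:crypto';

// Compare two strings without leaking where they differ.
// Hashing first gives both sides the same length, which timingSafeEqual requires.
function safeEqual(a, b) {
  const digestA = createHash('sha256').update(String(a)).digest();
  const digestB = createHash('sha256').update(String(b)).digest();
  return timingSafeEqual(digestA, digestB);
}

// Hypothetical Express-style middleware factory (names are illustrative).
// expectedKey would come from the environment, e.g. ACOT_PHONE_API_KEY.
function makeRequirePhoneApiKey(expectedKey) {
  if (!expectedKey) {
    // Fail at startup rather than accept every request against an empty key.
    throw new Error('ACOT_PHONE_API_KEY is not set');
  }
  return function requirePhoneApiKey(req, res, next) {
    const provided = req.headers['x-acot-api-key'];
    if (typeof provided !== 'string' || !safeEqual(provided, expectedKey)) {
      return res.status(401).json({ error: 'unauthorized' });
    }
    return next();
  };
}
```

Mounting a router guarded this way ahead of the global `authenticate()` call is what keeps the phone-server path from ever being asked for a JWT.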
---
@@ -828,19 +836,20 @@ These came up in the audit but aren't part of this refactor:
## Concrete deliverables
State as of 2026-05-24: everything below is **shipped** except Phase 5 (acot-server ESM conversion), which is the only remaining work item. Note: the "4 application PM2 processes" original target became **5** in execution because `chat-server` stayed standalone rather than being folded in — never a serious merge candidate (different DB, different protocol shape).
State as of 2026-05-24: all planned phases are **shipped**. Note: the "4 application PM2 processes" original target became **5** in execution because `chat-server` stayed standalone rather than being folded in — never a serious merge candidate (different DB, different protocol shape).
- ✅ 5 application PM2 processes instead of 12 (auth-server, inventory-server, dashboard-server, acot-server, chat-server) — plus 2 unchanged (acot-phone-server, lt-wordlist-api) = 7 total.
- ✅ All `/api/*`, `/chat-api/*`, and `/uploads/*` requests gated at Caddy (`forward_auth`) and re-verified at each upstream (`authenticate()`).
- ✅ All `/api/*`, `/chat-api/*`, and `/uploads/*` requests gated at Caddy (`forward_auth`).
- ✅ Per-upstream `authenticate()` re-verification on inventory-server, dashboard-server, and acot-server. (`chat-server` still relies on the Caddy gate alone — see asterisk below.)
- ✅ Sensitive endpoints additionally gated by per-permission checks (`requirePermission`).
- ⚠️ One ESM standard — done for auth/inventory/dashboard/chat. **acot-server still CJS (Phase 5 pending).**
- ⚠️ **One ESM standard — done for auth/inventory/dashboard/acot.** `chat-server` is still CJS (the prior version of this document erroneously claimed it had been converted; verified 2026-05-24 — its `server.js` still uses `require()` and its `package.json` has no `"type": "module"`). Out of scope for this refactor; tracked as future work.
- ✅ One shared `lib/` at `inventory-server/shared/` for auth, logging, DB, errors, CORS.
- ✅ Login rate-limited (`shared/rate-limit/login.js`).
- ✅ `JWT_SECRET` rotated + ecosystem shadow-override removed.
- ✅ Old auth-server, Aircall, Gorgias, Clarity directories deleted from the repo. Defunct `dashboard:gorgias`/`dashboard:calls` permission rows also deleted from DB (2026-05-24).
- ✅ Caddyfile slimmed to one auth-gated block.
- ✅ Permission codes inserted into `permissions` table for granular authorization.
- ✅ No half-finished pieces, no `// TODO: add auth later` comments, no deferred secrets cleanup.
- ✅ No half-finished pieces remain. Both gaps surfaced during Phase 5 verification — `lt-wordlist-api` insecure default token (Deviation #25) and Caddy blocking acot-phone-server's `x-acot-api-key` calls (Deviation #26) — were closed 2026-05-24.
---
@@ -897,3 +906,23 @@ These are decisions made during Phase 1/2 implementation that amend the spec abo
- `acot-phone-server` script is `/var/www/acot-phone/dist/server.js` (was `./inventory/acot-phone/server.js` in the live file — wrong; that path doesn't exist). `/var/www/acot-phone/` is matt:matt with its own `.env` and is a separate repo from inventory-server.
23. **Phase 6.10 ADD_WORD_TOKEN move stays in this ecosystem.** Per Deviation #22, `lt-wordlist-api` is in matt's ecosystem, so the §6.10 work to remove inline `ADD_WORD_TOKEN` and load it from `/opt/lt-wordlist-api/.env` instead is implemented directly in `deploy/ecosystem.config.cjs.proposed` (no inline `ADD_WORD_TOKEN`; script reads its own .env). When applying, rotate the token value in `/opt/lt-wordlist-api/.env` and update any callers.
24. **Phase 6 patterns applied to acot-server during Phase 5.** acot-server was originally planned to convert mechanically (require → import) and inherit Phase 6 hardening later. Done in a single pass instead: the new `server.js` mounts `shared/logging/request-log.js`, `shared/cors/policy.js`, `shared/errors/handler.js`, and `shared/auth/middleware.js`'s `authenticate()` on `/api/acot/*` (except the customers router — see Phase 5 auth-model section above). Adds `pg` dependency to `inventory-server/dashboard/acot-server/package.json` because the Postgres pool for `authenticate()`'s user/permission lookups is initialized in-process. Env layering follows dashboard-server's pattern: `/var/www/inventory/.env` loaded first (JWT_SECRET, DB_*), local `.env` loaded second (PROD_DB_*, PROD_SSH_*, ACOT_PHONE_API_KEY). No `acot_admin` permission gates wired — none of the routes mutate state in ways that warrant per-permission checks today; the seeded code in `migrations/005_phase6_permission_codes.sql` stays reserved.
25. **Phase 6.10 fixed 2026-05-24 via option (b).** Discovered during Phase 5 closeout that `/opt/lt-wordlist-api/.env` didn't exist, no `ADD_WORD_TOKEN` env var was set on the running process, and the script's `process.env.ADD_WORD_TOKEN || 'tokenhere'` fallback was the only gate — meaning `curl -X POST -H "x-add-word-token: tokenhere" http://localhost:3030/add-word` succeeded in production.
**Fix applied:**
- Generated a fresh 32-byte hex token via `openssl rand -hex 32`.
- Wrote it to `/opt/lt-wordlist-api/.env` (matt:matt, mode 0600).
- Edited `/var/www/ecosystem.config.cjs`'s `lt-wordlist-api` entry to add `node_args: ['--env-file=/opt/lt-wordlist-api/.env']`. Node ≥20.6 (netcup runs v22.22.2) reads the file at startup with no script changes and no `dotenv` import — the cleanest of the three options because the token never lives in the committed ecosystem file.
- `pm2 reload /var/www/ecosystem.config.cjs --only lt-wordlist-api --update-env` picked up the new wrapper config. PM2 restart count 1 → 2, clean startup.
**Verified:** old default `tokenhere` now returns `{"error":"unauthorized"}` HTTP 401; new env-file token returns `{"ok":true,...}` HTTP 200 on `/add-word` and `/delete-word`. To rotate again: edit `/opt/lt-wordlist-api/.env` + `pm2 restart lt-wordlist-api --update-env`.
**Caller coordination:** user confirmed all callers are external and will be updated as issues surface; no inventory of callers to pre-notify.
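
The root cause in Deviation #25 was the `process.env.ADD_WORD_TOKEN || 'tokenhere'` fallback, which fails open when the variable is unset. One way to make that class of bug impossible is to fail closed at startup. The helper below is a hypothetical sketch, not code from `lt-wordlist-api`: the name `requireSecret` and the 32-character minimum are assumptions chosen for illustration.

```javascript
// Hypothetical fail-closed secret loader; not part of lt-wordlist-api.
// Throws at startup instead of silently falling back to a default value.
function requireSecret(env, name, minLength = 32) {
  const value = env[name];
  if (typeof value !== 'string' || value.length < minLength) {
    throw new Error(
      `${name} is missing or shorter than ${minLength} characters; refusing to start`
    );
  }
  return value;
}

// Intended use at process start (left commented out so the sketch has no side effects):
// const ADD_WORD_TOKEN = requireSecret(process.env, 'ADD_WORD_TOKEN');
```

With a guard like this, a missing `.env` file would surface as a crash loop in PM2 rather than as an endpoint quietly accepting a well-known default. The 64-character output of `openssl rand -hex 32` clears the assumed minimum.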
26. **Caddy `forward_auth` gate breaks acot-phone-server's customer lookups — fixed 2026-05-24 via option (a).** Phase 6.1 put `/api/acot/*` behind `forward_auth localhost:3011/verify`, which strictly requires a JWT Bearer token. But `acot-phone-server` (at `/var/www/acot-phone/`) calls `/api/acot/customers/by-phone`, `/api/acot/customers/search`, `/api/acot/customers/:cid/orders` using only an `x-acot-api-key` header (`/var/www/acot-phone/src/services/acotApi.ts:51`). Result: every customer lookup from the phone app hit a 401 at the Caddy gate before reaching `requirePhoneApiKey`. Last successful customer call in acot-server's access log was 2026-05-21 — three days before the Caddy cutover.
**Fix applied — option (a):** changed `ACOT_API_URL` in `/var/www/acot-phone/.env` (and `acot-phone-server/.env.example` and the local repo copy) from `https://tools.acherryontop.com/api/acot` to `http://localhost:3012/api/acot`. Both processes live on netcup, so the request never enters Caddy and lands directly on `requirePhoneApiKey` in-process. Restarted via `pm2 restart acot-phone-server --update-env`; smoke-tested with `curl -H "x-acot-api-key: ..." "http://localhost:3012/api/acot/customers/by-phone?phone=..."` (URL quoted so the shell does not treat `?` as a glob character) → 200 with the real customer record.
(Alternative option (b), kept here for posterity: add a `@phone_auth header x-acot-api-key *` guard in the Caddyfile to bypass `forward_auth` for requests bearing the shared secret. Would have worked too but introduces a header-based bypass in the gate, which is a worse security posture than just not entering Caddy at all.)