1142 lines
94 KiB
Markdown
1142 lines
94 KiB
Markdown
# Server Consolidation & Security Hardening Plan
|
||
|
||
Audit-driven plan to (a) reduce 13 PM2 processes to 5 application servers + 2 auxiliary processes (acot-phone-server, lt-wordlist-api) = 7 total, (b) put every API endpoint behind real authentication, and (c) standardize on ESM across all primary application Node services. Approach is "do it properly the first time" — no half-finished pieces, no deferred cleanup.
|
||
|
||
> **Note on the original 12→4 target.** The initial spec called for `12 PM2 processes → 3 application servers + 1 auth server` and "ESM across all Node services". During execution `chat-server` proved a poor merge candidate (different DB, different protocol shape) and was kept as its own process — see Deviations #16 and #27. Phase 9 (added post-audit, 2026-05-24) closes the residual gap: chat-server ESM conversion + in-process `authenticate()` + Caddyfile/CORS hardening + vitest scaffold.
|
||
|
||
---
|
||
|
||
## Status (2026-05-24)
|
||
|
||
| Phase | Status | Notes |
|
||
|---|---|---|
|
||
| 1 — Decommission dead services | **Complete** | aircall/gorgias/clarity/legacy-auth-server deleted from repo + PM2 + Caddyfile + ecosystem.cjs |
|
||
| 2 — Build shared `lib/` | **Complete** | Lives at `inventory-server/shared/` (see Deviations). `/verify` endpoint live on auth-server |
|
||
| 3 — Convert auth-server + inventory-server to ESM | **Complete** | All inventory-server + auth-server server-side files (58) ESM; both services live under the ESM build for >24h. `chat-server` intentionally not in scope — deferred to Phase 9. See Deviations #10–13, #27 |
|
||
| 4 — Build `dashboard-server` (the merge) | **Complete (live) — 2026-05-24** | Merged service running on :3015 under PM2; Caddy routes for klaviyo/meta/dashboard-analytics/typeform all reverse-proxy to it. Old per-vendor directories (`klaviyo-server`, `meta-server`, `google-server`, `typeform-server`) and their PM2 entries deleted post-cutover — ~1.27 GB reclaimed (largely duplicated `node_modules`). Phase 6.2 gates wired (meta_write, klaviyo_admin). See Deviations #16–19 |
|
||
| 5 — Convert `acot-server` to ESM | **Complete (live) — 2026-05-24** | All 11 files (server, db/connection, utils/{phoneAuth,timeUtils}, 7 routes) converted to ESM. PM2 reload clean; SPA-driven `/api/acot/events/*` continues 200 across cutover; phone-server `/api/acot/customers/by-phone` returns 200 with correct shared secret. Phase 6 patterns applied during conversion — see Deviation #24 |
|
||
| 6 — Auth hardening | **Complete** | All in-process items live: rate-limit, JWT precondition, CORS lockdown, request-log, upload allowlist, `requirePermission` on sensitive routes, permissions seed migration. `authenticate()` live on `/api/*` (inventory-server, dashboard-server) and `/api/acot/*` (acot-server, added in Phase 5). 6.10 lt-wordlist token loaded via `--env-file` + rotated 2026-05-24 (Deviation #25). 6.11 (audit logging) deferred — see Out of scope |
|
||
| **F1 — Frontend fetch wrapper** | **Complete (live) — 2026-05-23** | Wrappers at `inventory/src/utils/api.ts` (`apiFetch`) and `inventory/src/utils/apiClient.ts` (axios instance). 170 `fetch()` sites across 76 files migrated to `apiFetch`; 32 `axios.*` sites across 11 files migrated to `apiClient`. AuthContext `/login`+`/me`, App.tsx `/me`, and `services/apiv2.ts` (external PHP backend) intentionally left as raw `fetch`. Shipped alongside the Phase 3+6 pm2 reload |
|
||
| 7 — Caddyfile final form | **Complete — applied 2026-05-24, amended same day** | Final Caddyfile live at `/etc/caddy/Caddyfile` (forward_auth gate + per-vendor reverse_proxy to :3015). The `inventory-server/deploy/` staging folder was removed after the initial apply, then re-created during Phase 9 §9.2 and again for the Deviation #28 uploads-gate revert — treat it as the working area for any future Caddyfile change. Backup convention: `/etc/caddy/Caddyfile.bak.YYYY-MM-DD` |
|
||
| 8 — ecosystem.config.cjs final form | **Complete — applied 2026-05-24** | Live PM2 list matches the spec below (5 apps + acot-phone-server + lt-wordlist-api = 7 processes). Includes Phase 6.4 JWT_SECRET shadow-override fix and 6.10 lt-wordlist token move. `inventory-server/deploy/` removed post-apply |
|
||
| **9 — Post-audit residual gaps** | **Complete (live) — 2026-05-24** | Closes findings from second-look audit: chat-server ESM + in-process `authenticate()` + localhost bind, Caddyfile edge CORS tightening + CORS preflight bypass, vitest scaffold + auth-boundary tests, three Mini\*.jsx fetch leaks, and plan-doc goal reconciliation. PM2 reload, Caddy apply, frontend rebuild, and tests all completed. **§9.2's uploads-gate fix was subsequently reverted (2026-05-24) — see Deviation #28; `/uploads/*` is public-by-design because the external PHP backend at www.acherryontop.com fetches images during product import.** See Phase 9 section |
|
||
|
||
**Live PM2 process count: 7** (5 application apps — auth-server, inventory-server, chat-server, dashboard-server, acot-server — plus acot-phone-server + lt-wordlist-api). Down from 13 pre-refactor.
|
||
|
||
**Phases 1–8 complete (2026-05-24).** Phase 5 closed the last originally-planned code deliverable; acot-server now runs as an ESM service with shared-lib `authenticate()` defense-in-depth.
|
||
|
||
**Phase 9 added and applied 2026-05-24** after a second-look audit surfaced six residual findings — see [Phase 9](#phase-9--post-audit-residual-gaps) below. Code, PM2 reload, Caddy apply, frontend rebuild, and test verification are complete.
|
||
|
||
Two earlier gaps surfaced during Phase 5 verification, both closed 2026-05-24:
|
||
- **Phase 6.10 lt-wordlist token rotation** — fixed via Node's native `--env-file` flag in the PM2 entry; token rotated to a fresh 32-byte hex value in `/opt/lt-wordlist-api/.env` (mode 0600). See Deviation #25.
|
||
- **Phase 6.1 Caddy gate blocking acot-phone-server's customer lookups** — fixed by repointing `ACOT_API_URL` from the public host to `http://localhost:3012/api/acot`. See Deviation #26.
|
||
|
||
---
|
||
|
||
## Goals
|
||
|
||
- Every public-facing endpoint requires a valid auth token (Caddy gate + per-server middleware + per-route permission checks for sensitive operations).
|
||
- Reduce service count from 13 PM2 processes to 7 total: 5 application servers (`auth-server`, `inventory-server`, `dashboard-server`, `acot-server`, `chat-server`) + 2 auxiliary (`acot-phone-server`, `lt-wordlist-api`). The original "4 application servers" target was missed because `chat-server` proved a poor merge candidate during execution; see Deviation #16.
|
||
- Standardize on ESM (`"type": "module"`) across all primary application Node services. `chat-server` was the last holdout — Phase 9 converts it.
|
||
- Decommission `aircall-server`, `gorgias-server`, `clarity-server`, and the legacy `auth-server` (port 3003).
|
||
- Eliminate dependency duplication: one Redis client, one Postgres pool helper, one logger, one auth middleware — shared across services.
|
||
|
||
## Non-goals
|
||
|
||
- Rewriting business logic. Route handlers move as-is unless they break under ESM or shared middleware.
|
||
- Switching auth providers (we keep JWT + bcrypt + Postgres).
|
||
- Replacing PM2 or Caddy.
|
||
- Migrating Klaviyo/Meta/Google/Typeform's external API contracts.
|
||
|
||
---
|
||
|
||
## Target architecture
|
||
|
||
```
|
||
┌──────────────────────────┐
|
||
│ tools.acherryontop.com │
|
||
│ (Caddy) │
|
||
│ forward_auth gate ─────┼──► auth-server:3011
|
||
└────────────┬─────────────┘ /verify endpoint
|
||
│
|
||
┌────────────────────────────────┼────────────────────────────────┐
|
||
▼ ▼ ▼
|
||
┌─────────────────────┐ ┌──────────────────────┐ ┌─────────────────────┐
|
||
│ inventory-server │ │ dashboard-server │ │ acot-server │
|
||
│ :3010 (ESM) │ │ :3015 (ESM) │ │ :3012 (ESM) │
|
||
│ │ │ │ │ │
|
||
│ /api/products │ │ /api/klaviyo/* │ │ /api/acot/* │
|
||
│ /api/orders │ │ /api/meta/* │ │ (MySQL via SSH) │
|
||
│ /api/analytics │ │ /api/google-*/* │ │ │
|
||
│ /api/dashboard │ │ /api/typeform/* │ │ │
|
||
│ ... (~25 routers) │ │ │ │ │
|
||
└─────────────────────┘ └──────────────────────┘ └─────────────────────┘
|
||
│ │ │
|
||
├── Postgres (inventory_db) ├── Postgres (klaviyo) └── MySQL (workpi, via ssh2 tunnel)
|
||
├── shared lib/ ◄────────────────┤
|
||
│ - auth middleware ├── Redis (shared client)
|
||
│ - permission helper └── shared lib/ ◄─────────────────┐
|
||
│ - logger │
|
||
│ - pg pool factory │
|
||
│ - error formatter │
|
||
└─────────────────────────────────────────────────────────────────┘
|
||
│
|
||
┌──────────────────┴───┐
|
||
│ auth-server │
|
||
│ :3011 (ESM) │
|
||
│ /login, /me, │
|
||
│ /verify, user mgmt │
|
||
└──────────────────────┘
|
||
```
|
||
|
||
PM2 process count: **13 → 7** (5 application servers + `acot-phone-server` + `lt-wordlist-api`). The original target was 4 application servers; the diagram above omits `chat-server` because it was originally a merge candidate. In practice `chat-server` runs as its own ESM process post-Phase 9 — see Deviation #16/#27.
|
||
|
||
---
|
||
|
||
## Phase 1 — Decommission dead/leaving services
|
||
|
||
Status: **Complete (2026-05-23)**. All four services removed from repo, PM2, Caddyfile, and ecosystem.config.cjs. Frontend widgets (`AircallDashboard.jsx`, `GorgiasOverview.jsx`) and their dashboard.ts/Navigation.jsx/vite.config.ts wiring also removed. Verification: smoke-tested `https://tools.acherryontop.com/api/{aircall,gorgias,clarity}/*` → 404. Backups left at `/home/matt/{ecosystem.config.cjs,Caddyfile}.bak.2026-05-23`.
|
||
|
||
### To remove
|
||
|
||
| Service | Reason | Steps |
|
||
|---|---|---|
|
||
| `aircall-server` (3002) | Migrating off Aircall | `pm2 delete aircall-server`; remove from `ecosystem.config.cjs`; remove `/api/aircall/*` from Caddyfile; drop `inventory/dashboard/aircall-server/` directory; remove MongoDB connection from any frontend code; cancel Mongo if it was only feeding Aircall |
|
||
| `gorgias-server` (3006) | Migrating off Gorgias | same pattern; check frontend for `/api/gorgias/*` callers and delete the dashboards/widgets that use them |
|
||
| `clarity-server` (3009) | Already dead (no `.js` files, not in ecosystem) | remove `/api/clarity/*` from Caddyfile; delete `inventory/dashboard/clarity-server/` directory |
|
||
| `auth-server` (3003, legacy) | Replaced by `new-auth-server` on 3011 | grep entire codebase for `dashboard-auth` and `localhost:3003`; redirect or remove callers; `pm2 delete auth-server`; remove from ecosystem; remove `/dashboard-auth/*` from Caddyfile; delete `inventory/dashboard/auth-server/` directory |
|
||
|
||
### Verification before deletion
|
||
|
||
```bash
|
||
# from inventory/ root — find any references before removing
|
||
grep -rn "aircall\|/api/aircall" inventory/src/ inventory-server/src/
|
||
grep -rn "gorgias\|/api/gorgias" inventory/src/ inventory-server/src/
|
||
grep -rn "/dashboard-auth\|localhost:3003" inventory/src/ inventory-server/src/
|
||
grep -rn "/api/clarity" inventory/src/ inventory-server/src/
|
||
```
|
||
|
||
Any remaining callers must be deleted or repointed before the server is removed. Do **not** leave a 502 response in production.
|
||
|
||
### Database/secret cleanup
|
||
|
||
- Drop the MongoDB instance feeding Aircall (after confirming no other consumers).
|
||
- Rotate any Gorgias/Aircall API keys still in `.env` files (defense in depth — they'll be useless soon anyway, but commit hygiene matters).
|
||
- Remove `MONGODB_URI`, `AIRCALL_*`, `GORGIAS_*` from any `.env` files.
|
||
|
||
---
|
||
|
||
## Phase 2 — Build the shared `lib/`
|
||
|
||
Status: **Complete (2026-05-23)**. All 11 modules written under `inventory-server/shared/` (NOT repo root — see Deviations). `/verify` endpoint added to auth-server in CJS form (will move to shared/auth/verify.js usage during Phase 3 ESM conversion). Smoke-tested with no-token / bad-token / expired-token / valid-token cases. No service consumes shared/ yet; that happens in Phases 3–5.
|
||
|
||
### Location
|
||
|
||
A single shared directory at the repo root: `shared/` (sibling of `inventory/` and `acot-phone/`). Each service imports from it via a relative path. We do **not** introduce npm workspaces yet — relative imports are fine for three consumers and avoid the npm-link / hoisting headaches.
|
||
|
||
### Modules to create
|
||
|
||
```
|
||
shared/
|
||
├── package.json # "type": "module"
|
||
├── auth/
|
||
│ ├── middleware.js # authenticate(), requirePermission(), requireAdmin()
|
||
│ └── verify.js # verifyToken() — pure function, no Express dependency
|
||
├── db/
|
||
│ ├── pg.js # createPool(envPrefix) — returns configured Pool
|
||
│ └── redis.js # createRedis() — single client, lazy-connect
|
||
├── logging/
|
||
│ ├── logger.js # pino-based, redacts Authorization/Cookie
|
||
│ └── request-log.js # Express middleware, structured access log
|
||
├── errors/
|
||
│ └── handler.js # consistent error envelope, no leak in prod
|
||
├── cors/
|
||
│ └── policy.js # single allowed-origins list, exported as cors() options
|
||
└── rate-limit/
|
||
└── login.js # express-rate-limit config for /login
|
||
```
|
||
|
||
### Auth middleware spec (`shared/auth/middleware.js`)
|
||
|
||
```js
|
||
// Pseudocode — final implementation matches the existing pattern in
|
||
// inventory/auth/routes.js authenticate() but factored out.
|
||
|
||
export function authenticate({ pool }) {
|
||
return async (req, res, next) => {
|
||
const header = req.headers.authorization;
|
||
if (!header?.startsWith('Bearer ')) {
|
||
return res.status(401).json({ error: 'Authentication required' });
|
||
}
|
||
try {
|
||
const decoded = jwt.verify(header.slice(7), process.env.JWT_SECRET);
|
||
// Short-circuit DB hit with an in-memory cache, 60s TTL keyed by token jti
|
||
const user = await loadUserCached(pool, decoded.userId);
|
||
if (!user.is_active) return res.status(403).json({ error: 'Account inactive' });
|
||
req.user = user;
|
||
next();
|
||
} catch {
|
||
res.status(401).json({ error: 'Invalid token' });
|
||
}
|
||
};
|
||
}
|
||
|
||
export function requirePermission(code) {
|
||
return (req, res, next) => {
|
||
if (req.user.is_admin) return next();
|
||
if (req.user.permissions?.includes(code)) return next();
|
||
res.status(403).json({ error: 'Insufficient permissions' });
|
||
};
|
||
}
|
||
|
||
export const requireAdmin = (req, res, next) =>
|
||
req.user.is_admin ? next() : res.status(403).json({ error: 'Admin only' });
|
||
```
|
||
|
||
### Why a 60s in-memory user cache
|
||
|
||
`forward_auth` in Caddy will call `auth-server` on every request. Each per-server `authenticate()` middleware also has a DB lookup to load permissions. Without caching, every API request becomes 1 SQL query for the user row + 1 for permissions. 60s TTL is short enough that deactivating a user takes effect within a minute, long enough that Klaviyo dashboards (which fire dozens of requests on load) don't hammer Postgres.
|
||
|
||
### Add to `auth-server`: a `/verify` endpoint
|
||
|
||
Caddy's `forward_auth` only needs "is this token valid? give me a user-id." Today's `/me` does that but with a full permissions join. Add a lightweight `/verify` that:
|
||
|
||
- Verifies JWT signature only (no DB hit).
|
||
- Returns `200` with `X-User-Id` and `X-User-Is-Admin` response headers (which Caddy `copy_headers` will pass to upstream).
|
||
- Returns `401` on bad token.
|
||
|
||
**Decision: each service re-verifies the JWT independently.** Caddy's `forward_auth` is a fast first-pass reject for obviously bad tokens, but the security boundary is the per-server `authenticate()` middleware. Cost is negligible (one HMAC-SHA256 per request); the upside is that a misconfigured Caddyfile can never let an unauthenticated request reach a backend. Upstream services do **not** trust any `X-User-*` headers from Caddy — they parse the `Authorization` header themselves.
|
||
|
||
---
|
||
|
||
## Phase 3 — Convert `auth-server` and `inventory-server` to ESM
|
||
|
||
Status: **Complete (live) — 2026-05-24.** Both servers + all sub-trees converted to ESM and running under PM2. 58 importable .js files. Two latent bugs surfaced and fixed during the conversion: `??`/`||` precedence in `shared/db/{pg,redis}.js`, and CJS named-import of `Pool` from `pg` in both auth files (now uses `import pg from 'pg'; const { Pool } = pg`).
|
||
|
||
Scripts under `inventory-server/scripts/` (one-shot maintenance / orchestrators) kept CommonJS via a sibling `scripts/package.json` declaring `"type": "commonjs"` — Node's package-type resolution walks up directory by directory, so this overrides the parent's `"type": "module"` without renaming any file or touching any `spawn()` callsite. Convert individual scripts to ESM if/when touched.
|
||
|
||
Went live 2026-05-24 after `npm install` on netcup (new deps: `pino`, `pino-http`, `ioredis`, `express-rate-limit`, `jsonwebtoken`) + `pm2 reload`. Phase F1 (frontend fetch wrapper) shipped in the same window so the SPA continued to send `Authorization: Bearer` on every request as `authenticate()` came online.
|
||
|
||
### Mechanical conversion
|
||
|
||
Per service:
|
||
|
||
1. Add `"type": "module"` to `package.json`.
|
||
2. Convert `require()` → `import`. `module.exports` → `export` / `export default`.
|
||
3. Fix `__dirname`/`__filename` (use `import.meta.url` + `fileURLToPath`).
|
||
4. Convert any dynamic require (e.g., conditional plugin loading) to `await import()`.
|
||
5. Update any sub-imports that don't include the file extension — ESM requires `./foo.js`, not `./foo`.
|
||
6. Update `ecosystem.config.cjs` if any service entry depended on CJS semantics. The ecosystem file itself can stay `.cjs` — PM2 reads it as config, doesn't matter what the apps it spawns are.
|
||
7. Update nodemon config / scripts.
|
||
|
||
### Risk areas in inventory-server
|
||
|
||
- `routes/ai.js` does a lazy init (`aiRouter.initInBackground()` called from `server.js`) — confirm the export shape still works as a default export of an Express router with a sidecar function. May need to split into `export default router; export function initInBackground() {}`.
|
||
- Multer setup in `routes/import.js` — straightforward, no ESM-specific concerns.
|
||
- SSE setup in `server.js` — moves over cleanly, no module-system entanglement.
|
||
- The `child_process.spawn` calls for metrics calculation: ESM doesn't change `child_process` behavior, but if any spawned script uses `require()` of a sibling, that sibling must also be ESM (or stay CJS with a `.cjs` extension).
|
||
|
||
### Test strategy
|
||
|
||
- After conversion, `pm2 start ecosystem.config.cjs --only inventory-server` on the server, watch logs for require/import errors at startup.
|
||
- Hit `/health`, then the most exercised endpoints (`/api/products`, `/api/dashboard/overview`, `/api/analytics/...`). If startup is clean and three smoke endpoints work, ESM conversion is done. Functional correctness is preserved because no logic changed.
|
||
|
||
### Auth-server
|
||
|
||
Already small (~200 LOC server.js + ~few hundred in routes.js + permissions.js). 1-day conversion. Add the new `/verify` endpoint as part of this work.
|
||
|
||
---
|
||
|
||
## Phase 4 — Build `dashboard-server` (the merge)
|
||
|
||
Status: **Complete (live) — 2026-05-24.** Klaviyo + Meta + Google + Typeform merged into a single ESM service at `inventory-server/dashboard/server.js`. Shared Pool + ioredis client injected through router factories. Phase 6.2 permission gates wired (`meta_write` on Meta budget/status mutations; `klaviyo_admin` on Klaviyo `/events/clearCache`). Post-cutover cleanup (2026-05-24) deleted the four old per-vendor directories (`klaviyo-server`, `meta-server`, `google-server`, `typeform-server`) along with their PM2 entries — ~1.27 GB reclaimed, largely duplicated `node_modules` across vendors. Original boot test on netcup: `/health` 200; unauthenticated `/api/klaviyo/*` returns `{"error":"No token provided"}` HTTP 401 via shared `authenticate()`.
|
||
|
||
### Layout
|
||
|
||
```
|
||
inventory/dashboard/
|
||
├── server.js # entry: load env, init Postgres+Redis, mount routes, listen
|
||
├── package.json # "type": "module", deps from all 4 source servers (deduped)
|
||
├── .env # KLAVIYO_*, META_*, GOOGLE_*, TYPEFORM_*, shared DB_*, REDIS_URL
|
||
├── routes/
|
||
│ ├── klaviyo/ # absorbed from dashboard/klaviyo-server/src/
|
||
│ ├── meta/ # absorbed from dashboard/meta-server/
|
||
│ ├── google/ # absorbed from dashboard/google-server/
|
||
│ └── typeform/ # absorbed from dashboard/typeform-server/
|
||
├── services/ # per-vendor API clients (Klaviyo SDK calls, etc.)
|
||
├── scripts/
|
||
│ └── import-campaign-products.js # one-shot, moved from klaviyo-server/scripts/
|
||
└── logs/
|
||
```
|
||
|
||
### Mount points
|
||
|
||
```js
|
||
// server.js (sketch)
|
||
import { authenticate, requirePermission } from '../../shared/auth/middleware.js';
|
||
import { createPool } from '../../shared/db/pg.js';
|
||
import { createRedis } from '../../shared/db/redis.js';
|
||
import { logger, requestLog } from '../../shared/logging/index.js';
|
||
import corsPolicy from '../../shared/cors/policy.js';
|
||
import errorHandler from '../../shared/errors/handler.js';
|
||
|
||
import klaviyoRouter from './routes/klaviyo/index.js';
|
||
import metaRouter from './routes/meta/index.js';
|
||
import googleRouter from './routes/google/index.js';
|
||
import typeformRouter from './routes/typeform/index.js';
|
||
|
||
const app = express();
|
||
const pool = await createPool('KLAVIYO_DB'); // klaviyo has its own DB; others can share or have none
|
||
const redis = await createRedis();
|
||
|
||
app.use(requestLog);
|
||
app.use(cors(corsPolicy));
|
||
app.use(express.json({ limit: '10mb' }));
|
||
|
||
// Everything below this line requires a valid token.
|
||
app.use('/api', authenticate({ pool }));
|
||
|
||
app.use('/api/klaviyo', klaviyoRouter({ pool, redis }));
|
||
app.use('/api/meta', metaRouter({ redis }));
|
||
app.use('/api/google-analytics', googleRouter({ redis })); // matches Caddy /api/dashboard-analytics rewrite
|
||
app.use('/api/typeform', typeformRouter({ redis }));
|
||
|
||
app.get('/health', (req, res) => res.json({ ok: true }));
|
||
app.use(errorHandler);
|
||
|
||
app.listen(process.env.DASHBOARD_PORT || 3015);
|
||
```
|
||
|
||
### Per-vendor routers
|
||
|
||
Each vendor's existing route file becomes a factory that takes the shared `pool`/`redis` and returns an Express router. Replace each server's per-instance pool/redis with the injected one.
|
||
|
||
### Permission gates (sensitive routes only)
|
||
|
||
Authenticated-only is the default after `app.use('/api', authenticate(...))`. For sensitive operations, add `requirePermission` per route:
|
||
|
||
- Anything that mutates Klaviyo lists/segments → `requirePermission('klaviyo_write')`
|
||
- Triggering a campaign sync → `requirePermission('klaviyo_admin')`
|
||
- Read-only dashboards → no extra check beyond authenticate.
|
||
|
||
Define the new permission codes in the `permissions` table via a migration in Phase 6.
|
||
|
||
### Dependency dedup
|
||
|
||
**Decision: standardize on `ioredis`.** Klaviyo's larger codebase already uses it, and `ioredis` has better cluster/sentinel support if we ever need it. Update `meta`/`google`/`typeform` call sites — each is a handful of `get`/`set` calls, mechanical conversion. Remove the `redis` package from `dashboard-server`'s `package.json`.
|
||
|
||
### Env consolidation
|
||
|
||
Single `.env` at `inventory/dashboard/.env`, prefixed keys:
|
||
|
||
```
|
||
DASHBOARD_PORT=3015
|
||
KLAVIYO_API_KEY=...
|
||
KLAVIYO_DB_HOST=...
|
||
KLAVIYO_DB_NAME=...
|
||
META_ACCESS_TOKEN=...
|
||
GOOGLE_SERVICE_ACCOUNT_KEY=...
|
||
TYPEFORM_TOKEN=...
|
||
REDIS_URL=...
|
||
JWT_SECRET=... # shared with auth-server; same secret means same tokens valid here
|
||
```
|
||
|
||
### Klaviyo's `scripts/import-campaign-products.js`
|
||
|
||
One-shot script — keep it, but run it from the merged dashboard-server's directory. Update the script's imports to ESM. If it's run via cron, update the cron entry to the new path.
|
||
|
||
### Risk: shared error states
|
||
|
||
When all four vendors share a Redis client, a Redis hiccup affects all four. Make sure the connection has retry config (`ioredis` defaults are reasonable but verify) and that vendor routes degrade gracefully when Redis is unavailable (most use it as a cache, so cache-miss → fall through to upstream API is the right behavior).
|
||
|
||
---
|
||
|
||
## Phase 5 — Convert `acot-server` to ESM (stays standalone)
|
||
|
||
Status: **Complete (live) — 2026-05-24.** 11 files converted (server.js, db/connection.js, utils/{phoneAuth,timeUtils}.js, 7 route files — ~5.2K LOC). PM2 reload clean; SPA-driven `/api/acot/events/{projection,stats}` continues 200 across cutover; phone-server `/api/acot/customers/by-phone` returns 200 with correct `x-acot-api-key`. Per Deviation #13, `ssh2` is CJS-only → uses `import ssh2 from 'ssh2'; const { Client } = ssh2;`. Phase 6 patterns applied during conversion (Deviation #24).
|
||
|
||
### Special concern: ssh2 tunnel
|
||
|
||
`acot-server` opens an SSH tunnel via `ssh2` to access the production MySQL at `192.168.1.5:3309`. Lifecycle today (preserved verbatim across the ESM conversion, not refactored):
|
||
|
||
- **Lazy establishment.** No tunnel at startup; the first `getDbConnection()` call sets one up. HTTP listener comes up immediately without waiting for the tunnel. Acceptable — the first per-route request just pays the tunnel-creation latency once.
|
||
- **Per-connection ssh client.** Each pooled MySQL connection owns its own `ssh2.Client`. Closing a connection closes its own SSH client.
|
||
- **No reconnect on disconnect.** There is no `close` listener on the SSH client. If the SSH connection drops while the MySQL connection is pooled (not in use), the next caller that pops it will get a query failure. Circuit-breaker absorbs repeated failures (5 failures → 30s open). Mitigation acceptable for current call volume; revisit if SSH drops become observable in logs.
|
||
- **SIGTERM/SIGINT teardown.** `server.close()` → `closeAllConnections()` ends MySQL connections and SSH clients in sequence. Confirmed clean during the Phase 5 cutover (`SIGTERM signal received: closing HTTP server` → 10 × `Closed pooled connection` → `All connections closed and pool reset` in PM2 logs).
|
||
|
||
### Auth model (two flavors, intentional)
|
||
|
||
`server.js` mounts the customers router BEFORE the global `authenticate()` so the two auth schemes don't collide:
|
||
|
||
- `/api/acot/customers/*` → `requirePhoneApiKey` (timing-safe `x-acot-api-key` check). Used by `acot-phone-server`.
|
||
- everything else → JWT Bearer via `shared/auth/middleware.js authenticate()`. Used by the SPA.
|
||
|
||
This works at the in-process layer. The public path through Caddy is a separate issue — see Deviation #26.
|
||
|
||
---
|
||
|
||
## Phase 6 — Auth hardening
|
||
|
||
Status: **Complete (live) — 2026-05-24.** All hardening (in-process + edge) is live in production. The Phase 3 ESM conversion + Phase 6 middleware shipped together, with Phase F1 (frontend fetch wrapper) flipping immediately ahead of the `pm2 reload` so the SPA continued to carry `Authorization: Bearer` on every API call. Caddy `forward_auth` gate and the JWT_SECRET ecosystem fix went live with the Phase 7/8 apply on 2026-05-24.
|
||
|
||
Per-item status:
|
||
|
||
| # | Item | Status | Where |
|
||
|---|---|---|---|
|
||
| 6.1 | Caddy `forward_auth` gate | **Live — 2026-05-24** | Applied via Caddy admin API + `sudo cp` to `/etc/caddy/Caddyfile`. `@gated path /api/* /chat-api/*` block hits `localhost:3011/verify` on every request. (`/uploads/*` was originally in the gate but removed per Deviation #28 — see Phase 6.7 / 9.2.) |
|
||
| 6.2 | `requirePermission` on sensitive routes + permissions migration | **Done** | inline in `config.js`, `data-management.js`, `import.js`, `ai-prompts.js`, `ai-validation.js`, `templates.js`, `reusable-images.js`; codes seeded by `migrations/005_phase6_permission_codes.sql`. **Phase 4 follow-on (2026-05-23):** `meta_write` wired on `PATCH /api/meta/campaigns/:id/budget` and `POST /api/meta/campaigns/:id/:action`; `klaviyo_admin` wired on `POST /api/klaviyo/events/clearCache`. Read-only Google + Typeform endpoints stay authenticated-only (reserved write codes left in migration 005 for future) |
|
||
| 6.3 | Login rate-limit + `/verify` rate-limit | **Done** | `auth/server.js` uses `shared/rate-limit/login.js` (`loginLimiter`, `verifyLimiter`) |
|
||
| 6.4 | JWT_SECRET as startup precondition + ecosystem footgun fix | **Live — 2026-05-24** | Both auth-server and inventory-server `process.exit(1)` if `JWT_SECRET` is unset. The `JWT_SECRET: process.env.JWT_SECRET` override that was shadowing `.env` is removed from the live ecosystem.cjs |
|
||
| 6.5 | Structured request logging w/ redaction | **Done** | `shared/logging/request-log.js` (pino-http, redacts Authorization/Cookie); mounted in both `auth/server.js` and `src/server.js` |
|
||
| 6.6 | CORS lockdown | **Done** | `src/middleware/cors.js` now re-exports `shared/cors/policy.js`. LAN wildcards (`192.168.*`, `10.*`) and `*` defaults gone |
|
||
| 6.7 | Upload hardening | **Partial — gating reverted** | Exact-match MIME+extension allowlist on `routes/import.js` and `routes/reusable-images.js`; dead `multer({ dest })` removed from `routes/products.js` (no upload route was using it — strongest hardening was deletion). The Caddyfile `/uploads/*` gate **was applied then reverted (2026-05-24)** — uploads are public-by-design (external PHP backend fetches them during product import). See Deviation #28 |
|
||
| 6.8 | Frontend token storage stays localStorage + XSS audit | **Audited** | Confirmed `dangerouslySetInnerHTML` is sanitized in `ProductEditor.tsx`. **Flagged: `ChatRoom.tsx:277,392` renders user-controlled chat content as raw HTML — real XSS vector, separate fix needed** |
|
||
| 6.9 | Remove debug middleware | **Done** | The header-dumping `app.use((req,res,next)=>{ console.log(... req.headers ...) })` block removed from `src/server.js`. Replaced with `shared/logging/request-log.js` (which redacts). |
|
||
| 6.10 | `lt-wordlist-api` token move | **Live — 2026-05-24** | Live PM2 entry runs `/opt/lt-wordlist-api/index.js` under matt's daemon; `ADD_WORD_TOKEN` is no longer inline in ecosystem.cjs and is read from `/opt/lt-wordlist-api/.env`. See Deviations #21–23 for the path corrections and the (incorrect) earlier assumption that this app lived under a separate root daemon |
|
||
| 6.11 | Audit logging for sensitive ops | **Deferred** | Out of scope for this pass per user direction. Existing `import_audit_log` and `product_editor_audit_log` tables stay as-is; generic `system_audit_log` table + middleware is its own project |
|
||
|
||
### 6.1 Caddy `forward_auth` gate
|
||
|
||
Add to the `tools.acherryontop.com` block, before the `@api_routes` handler:
|
||
|
||
```caddyfile
|
||
# Forward-auth gate for all API traffic
|
||
@needs_auth path /api/* /chat-api/*
|
||
handle @needs_auth {
|
||
forward_auth localhost:3011 {
|
||
uri /verify
|
||
copy_headers Authorization
|
||
# On 401/403, Caddy returns the auth-server's response body verbatim
|
||
}
|
||
# Existing per-vendor handle blocks remain below this line
|
||
}
|
||
|
||
# /auth-inv/* stays public (you need to log in!)
|
||
handle /auth-inv/* {
|
||
uri strip_prefix /auth-inv
|
||
reverse_proxy localhost:3011
|
||
}
|
||
```
|
||
|
||
The `forward_auth` directive subrequests `/verify` on the auth-server. If it returns 2xx, the request proceeds upstream. If 401/403, Caddy returns that response to the client and never hits the backend.
|
||
|
||
This is the **first** line of defense. Per-server middleware (`shared/auth/middleware.js`) is the **second** line — re-verifies the JWT independently. Defense in depth: a Caddyfile typo can't open a hole.
|
||
|
||
### 6.2 Per-route permission gates
|
||
|
||
After per-server `authenticate()`, add `requirePermission(code)` to destructive or sensitive routes. Audit needed in:
|
||
|
||
- `inventory-server/src/routes/config.js` — global config writes → `admin`
|
||
- `inventory-server/src/routes/import.js` — uploads, deletes, generate-upc → `product_import`
|
||
- `inventory-server/src/routes/data-management.js` — CSV operations → `data_management`
|
||
- `inventory-server/src/routes/ai-prompts.js` — prompt edits → `ai_admin`
|
||
- `inventory-server/src/routes/templates.js` — template writes → `templates_write`
|
||
- `inventory-server/src/routes/reusable-images.js` — image management → `image_admin`
|
||
- `inventory-server/src/routes/products.js` — only one POST (`/resolve-identifiers`); evaluate whether it needs a permission code or authenticated-only is fine
|
||
- `inventory-server/src/routes/product-editor-audit-log.js` and `import-audit-log.js` — read-only by sensitive users → `audit_read`
|
||
- `dashboard-server` Klaviyo/Meta/Google/Typeform write endpoints → vendor-specific codes per above
|
||
|
||
Migration: a single SQL script that inserts the new permission codes into the `permissions` table and assigns them to existing admin users. Non-admin users get permissions explicitly granted via the user management UI.
|
||
|
||
```sql
|
||
INSERT INTO permissions (code, name) VALUES
|
||
('product_import', 'Product Import'),
|
||
('data_management', 'Data Management'),
|
||
('ai_admin', 'AI Settings Admin'),
|
||
('templates_write', 'Template Editing'),
|
||
('image_admin', 'Image Management'),
|
||
('audit_read', 'Audit Log Access'),
|
||
('klaviyo_write', 'Klaviyo Write'),
|
||
('klaviyo_admin', 'Klaviyo Admin'),
|
||
('meta_write', 'Meta Write'),
|
||
('google_write', 'Google Analytics Write'),
|
||
('typeform_write', 'Typeform Write'),
|
||
('acot_admin', 'ACOT Server Admin')
|
||
ON CONFLICT (code) DO NOTHING;
|
||
```
|
||
|
||
### 6.3 Rate limiting on login
|
||
|
||
`shared/rate-limit/login.js`:
|
||
|
||
```js
|
||
import rateLimit from 'express-rate-limit';
|
||
export const loginLimiter = rateLimit({
|
||
windowMs: 15 * 60 * 1000, // 15 minutes
|
||
max: 10, // 10 attempts per IP per window
|
||
message: { error: 'Too many login attempts, try again later' },
|
||
standardHeaders: true,
|
||
legacyHeaders: false,
|
||
});
|
||
```
|
||
|
||
Apply in `auth-server` on the `/login` route. Consider also rate-limiting `/verify` and `/me` (much higher cap, ~600/min — they're called legitimately by every page load).
|
||
|
||
### 6.4 JWT secret rotation
|
||
|
||
- Rotate `JWT_SECRET` to a fresh 32-byte random string as part of the deployment.
|
||
- Document that rotation logs out all users — acceptable for an internal tool, do it during off-hours.
|
||
- Add `JWT_SECRET` to the env var validation block in `auth-server/server.js` (refuse to start if not set).
|
||
- **Fix the existing footgun**: `/var/www/ecosystem.config.cjs` currently has `JWT_SECRET: process.env.JWT_SECRET` *after* `...inventoryEnv` in the new-auth-server block. This shadows the `.env` value with whatever the shell exported when PM2 was started — which has already silently diverged at least once (detected and fixed 2026-05-23 by a clean PM2 restart in a shell without JWT_SECRET exported). Delete that override line during rotation; let `.env` be the single source of truth.
|
||
|
||
### 6.5 Request logging
|
||
|
||
`shared/logging/request-log.js` — log method, path, status, duration, user-id (if authenticated). **Never** log `Authorization` or `Cookie` headers. Remove the current `server.js:79-87` debug middleware in inventory-server (it logs full headers including the bearer token).
|
||
|
||
### 6.6 CORS lockdown
|
||
|
||
Current `middleware/cors.js` allows `192.168.*.*` and `10.*.*.*` with `credentials: true`. Tighten to explicit known origins:
|
||
|
||
```js
|
||
origin: [
|
||
'https://tools.acherryontop.com',
|
||
'https://inventory.kent.pw',
|
||
/^http:\/\/localhost:(5174|5175)$/,
|
||
]
|
||
```
|
||
|
||
If anyone genuinely needs LAN access, add their specific IP, not a `/16` range.
|
||
|
||
### 6.7 Upload hardening
|
||
|
||
`POST /api/import/upload-image` (multer-backed) needs:
|
||
|
||
- File-size limit set on multer config (current limit may be defaulted — verify).
|
||
- MIME-type allowlist (image/jpeg, image/png, image/webp; reject everything else).
|
||
- Filename sanitization (no `..`, no absolute paths, generate UUID-based names server-side).
|
||
- ~~The Caddy `/uploads/*` handler currently serves any file in the uploads directory publicly. Move this **behind** the auth gate: include `/uploads/*` in `@needs_auth`.~~ **Reverted — see Deviation #28.** Uploads are public-by-design because the external PHP backend at `www.acherryontop.com` fetches image URLs server-to-server during the `/apiv2/product/setup_new` import flow, and has no Bearer token. Random-suffix filenames (timestamp + `Math.random`) provide weak enumeration resistance; `/uploads/products/*` files also auto-delete after 24h. If genuinely sensitive uploads ever land here, implement HMAC-signed URLs rather than re-gating wholesale.
|
||
|
||
### 6.8 Frontend token storage
|
||
|
||
**Decision: stay on `localStorage`.** This is an internal tool with no untrusted user-generated HTML being rendered, so the XSS-token-theft surface is small. The `forward_auth` gate is the main security gap we're addressing; cookie-based auth would be a larger, separate project (cookie-parser, CSRF double-submit pattern, AuthContext refactor) that doesn't change the threat model for an internal tool with no public sign-up.
|
||
|
||
Sanity check during this refactor: grep the React codebase for `dangerouslySetInnerHTML`. If any usages exist, verify each one is rendering trusted (server-controlled, not user-supplied) content. If a user-supplied content path exists, that's a real XSS vector and needs separate remediation regardless of token-storage choice.
|
||
|
||
### 6.9 Remove debug middleware
|
||
|
||
[inventory-server/src/server.js:79-87](inventory-server/src/server.js#L79-L87) logs full request headers including `Authorization`. Delete this block. Replace with `shared/logging/request-log.js`.
|
||
|
||
### 6.10 `lt-wordlist-api` token
|
||
|
||
`ADD_WORD_TOKEN` is currently hardcoded in `/var/www/ecosystem.config.cjs`. Move to `/opt/lt-wordlist-api/.env`, rotate the token value, update any callers.
|
||
|
||
### 6.11 Audit logging for sensitive operations
|
||
|
||
Already have `import-audit-log` and `product-editor-audit-log` tables. Extend the pattern:
|
||
|
||
- Log `user_id`, `endpoint`, `params`, `result` for `config.js` writes and `data-management.js` operations.
|
||
- Schema: reuse the existing audit table pattern or add a generic `system_audit_log` table.
|
||
- Don't log request bodies wholesale (may contain large blobs); log the action and the target ID.
|
||
|
||
---
|
||
|
||
## Phase F1 — Frontend fetch wrapper (NEW — 2026-05-23)
|
||
|
||
Status: **Complete (live) — 2026-05-24.** Two wrappers landed at `inventory/src/utils/api.ts` and `inventory/src/utils/apiClient.ts`. Migration touched 87 files (76 fetch, 11 axios) covering ~200 call sites. Type-check clean; production build clean. Intentional exclusions: AuthContext `/login`+`/me` (own auth flow), App.tsx initial `/me` session check, and `services/apiv2.ts` (calls the separate PHP backend at backend.acherryontop.com which has its own cookie auth, out of scope per the plan). Shipped alongside the Phase 3+6 pm2 reload.
|
||
|
||
### The discovery
|
||
|
||
While wiring `authenticate()` on `/api/*` in Phase 6.1/6.2, we audited the frontend's fetch usage and found:
|
||
|
||
- **7** call sites send `Authorization: Bearer ${token}` explicitly (all in `AuthContext.tsx` for `/me` + `/login`, plus a couple of `settings/*` pages).
|
||
- **~220** other `fetch(...)` / `axios.*(...)` call sites across `inventory/src/services/`, `inventory/src/pages/`, `inventory/src/components/` send **no** Authorization header at all.
|
||
- There is no global fetch wrapper, axios interceptor, or service-worker shim that injects the token.
|
||
|
||
Today this works because nothing on the server checks. Caddy currently has no `forward_auth` gate (Phase 6.1 is a Caddyfile change that hasn't shipped yet) and the previous inventory-server had no `authenticate()` middleware. The frontend's auth model was "you log in once to get the token; the token is checked only by `/me`; everything else is implicitly trusted at the network layer."
|
||
|
||
With Phase 6 code in production, **every page refresh 401s** on the first API call after the next pm2 reload. The user explicitly accepted this when authorising the Phase 6 work — but the fix is its own deliverable, and shipping Phase 3+6 to PM2 without F1 in the same window means an outage window measured in *however long F1 takes* (not minutes).
|
||
|
||
### Recommended approach
|
||
|
||
Add a single fetch wrapper at `inventory/src/utils/api.ts` (or similar) and migrate the ~220 call sites to use it. The wrapper:
|
||
|
||
1. Reads `localStorage.getItem('token')` on every call (cheap; localStorage is sync).
|
||
2. Merges `Authorization: Bearer ${token}` into the request headers if a token exists.
|
||
3. Intercepts 401 responses → fires `window.dispatchEvent(new Event('auth:logout'))` (a listener already exists in `AuthContext.tsx:117`) so the user gets bounced to `/login` cleanly instead of seeing broken pages.
|
||
4. Preserves the existing call shape — `apiFetch(url, init)` should be a drop-in for `fetch(url, init)` so the migration is mechanical.
|
||
|
||
```ts
|
||
// inventory/src/utils/api.ts (sketch)
|
||
export async function apiFetch(input: RequestInfo | URL, init: RequestInit = {}): Promise<Response> {
|
||
const token = localStorage.getItem('token');
|
||
const headers = new Headers(init.headers);
|
||
if (token && !headers.has('Authorization')) {
|
||
headers.set('Authorization', `Bearer ${token}`);
|
||
}
|
||
const res = await fetch(input, { ...init, headers });
|
||
if (res.status === 401 && token) {
|
||
// Token expired or revoked — bounce to /login. AuthContext already listens.
|
||
window.dispatchEvent(new Event('auth:logout'));
|
||
}
|
||
return res;
|
||
}
|
||
```
|
||
|
||
Same shape for axios:
|
||
|
||
```ts
|
||
// inventory/src/utils/apiClient.ts (sketch)
|
||
import axios from 'axios';
|
||
export const apiClient = axios.create();
|
||
apiClient.interceptors.request.use((config) => {
|
||
const token = localStorage.getItem('token');
|
||
if (token) config.headers.Authorization = `Bearer ${token}`;
|
||
return config;
|
||
});
|
||
apiClient.interceptors.response.use(
|
||
(r) => r,
|
||
(err) => {
|
||
if (err?.response?.status === 401) window.dispatchEvent(new Event('auth:logout'));
|
||
return Promise.reject(err);
|
||
},
|
||
);
|
||
```
|
||
|
||
### Migration plan
|
||
|
||
1. Land the two wrapper modules above. ~50 LOC total.
|
||
2. Codemod or sed-loop: in `inventory/src/`, replace `fetch(` → `apiFetch(` (with the right import) and `axios.get/post/...` → `apiClient.get/post/...`. ~220 call sites — a half-day of careful find-and-replace plus per-page verification. Spot-check the ones with custom `Content-Type` (multipart uploads especially) so the wrapper doesn't clobber multipart boundaries.
|
||
3. Leave the `AuthContext.tsx` `/login` and `/me` calls alone — they already work and migrating them adds no value.
|
||
4. Run the SPA: log in, exercise Overview / Products / Analytics / Dashboard / etc. with browser devtools open watching for `Authorization` header on every `/api/*` request.
|
||
|
||
### Sequencing with Phase 3+6 deploy
|
||
|
||
**Two options:**
|
||
|
||
A) **Ship F1 first** (recommended). Frontend goes out with the wrapper; nothing changes server-side. Then `pm2 reload` Phase 3+6. Zero-downtime, zero broken-page window.
|
||
|
||
B) **Ship together.** F1 and Phase 3+6 land in the same deploy. Brief window (seconds) where the frontend has the wrapper but the server hasn't reloaded yet — wrapper just sends extra headers the old server ignores. Safe.
|
||
|
||
Do **not** ship Phase 3+6 first and F1 second. That gives a broken app for as long as F1 takes.
|
||
|
||
### Out of scope (kept on `localStorage`)
|
||
|
||
Per Phase 6.8, we're not migrating to httpOnly cookie auth. F1 is the minimum work to make the per-service `authenticate()` (Phase 6) actually usable. A future Phase F2 could move to cookies + CSRF double-submit, but that's a much larger change touching the AuthContext, the login flow, and every backend that reads tokens. Not justified for an internal tool with no public sign-up.
|
||
|
||
### Note on `/uploads/*` gating (Phase 6.7's Caddyfile change)
|
||
|
||
**Applied as-spec then reverted (2026-05-24):** `/uploads/*` was initially placed behind `forward_auth` per the 6.7 spec. The SPA was patched via [`AuthedImage`](inventory/src/components/ui/authed-image.tsx) to send Bearer headers, but the gate broke the **external** product-import flow: image URLs from `/uploads/products/*` and `/uploads/reusable/*` are submitted to the external PHP backend (`www.acherryontop.com`) via `/apiv2/product/setup_new`, which then fetches them server-to-server to ingest into the shop — and has no Bearer token. The escape hatch from the original note ("revert this part of 6.7 to keep `/uploads/*` public") was taken: `/uploads/*` now serves via a dedicated public `file_server` placed ahead of `@static` and `@gated`. See Deviation #28 for the full revert details. `AuthedImage` is left in place as defense-in-depth (Bearer header is harmless when not required); no rush to rip out.
|
||
|
||
---
|
||
|
||
## Phase 7 — Caddyfile final form
|
||
|
||
Status: **Complete — applied 2026-05-24.** Final Caddyfile live at `/etc/caddy/Caddyfile`; vendor handles point at the merged dashboard-server on :3015. The `inventory-server/deploy/` staging folder (which held `Caddyfile.proposed` and the README of apply commands) was originally removed after apply — **re-created 2026-05-24 to stage the §9.2 edits and again to stage the Deviation #28 uploads-gate revert; treat it as the working area for any future Caddyfile change**. Apply pattern (admin-API `curl -X POST :2020/load` + `sudo cp` to persist on-disk) is captured in Deviation #8. Backup convention: `/etc/caddy/Caddyfile.bak.YYYY-MM-DD`.
|
||
|
||
After all phases (including Phase 9 §9.2 and the Deviation #28 revert), the `tools.acherryontop.com` block looks like:
|
||
|
||
```caddyfile
|
||
tools.acherryontop.com {
|
||
import security_headers
|
||
|
||
# Public: login endpoint
|
||
handle /auth-inv/* {
|
||
uri strip_prefix /auth-inv
|
||
reverse_proxy localhost:3011
|
||
}
|
||
|
||
# Phase 9 §9.2: CORS preflight bypass (OPTIONS has no Authorization header)
|
||
@cors_preflight {
|
||
method OPTIONS
|
||
header Access-Control-Request-Method *
|
||
}
|
||
handle @cors_preflight {
|
||
handle /api/klaviyo/* { reverse_proxy localhost:3015 }
|
||
handle /api/meta/* { reverse_proxy localhost:3015 }
|
||
handle /api/dashboard-analytics/* { reverse_proxy localhost:3015 }
|
||
handle /api/typeform/* { reverse_proxy localhost:3015 }
|
||
handle /api/acot/* { reverse_proxy localhost:3012 }
|
||
handle /chat-api/* {
|
||
uri strip_prefix /chat-api
|
||
reverse_proxy localhost:3014
|
||
}
|
||
handle /api/* { reverse_proxy localhost:3010 }
|
||
}
|
||
|
||
# Deviation #28: /uploads/* is PUBLIC by design (external PHP backend
|
||
# fetches image URLs server-to-server during /apiv2 product import).
|
||
# Rooted at /var/www/inventory (NOT the SPA build root) so paths
|
||
# resolve to the real uploads directory. Placed ahead of @static and
|
||
# @gated so it wins the match.
|
||
handle /uploads/* {
|
||
root * /var/www/inventory
|
||
file_server
|
||
}
|
||
|
||
# Public: static frontend assets
|
||
@static path *.js *.css *.png *.jpg *.jpeg *.gif *.ico *.svg *.woff *.woff2
|
||
handle @static {
|
||
header Cache-Control "public, max-age=2592000"
|
||
root * /var/www/inventory/frontend/build
|
||
file_server
|
||
}
|
||
|
||
# All API + chat: auth gate first
|
||
@gated path /api/* /chat-api/*
|
||
handle @gated {
|
||
forward_auth localhost:3011 {
|
||
uri /verify
|
||
copy_headers Authorization
|
||
}
|
||
|
||
# Vendor dashboard routes → merged dashboard-server
|
||
handle /api/klaviyo/* { reverse_proxy localhost:3015 }
|
||
handle /api/meta/* { reverse_proxy localhost:3015 }
|
||
handle /api/dashboard-analytics/* { reverse_proxy localhost:3015 }
|
||
handle /api/typeform/* { reverse_proxy localhost:3015 }
|
||
|
||
# ACOT-specific
|
||
handle /api/acot/* { reverse_proxy localhost:3012 }
|
||
|
||
# Chat
|
||
handle /chat-api/* {
|
||
uri strip_prefix /chat-api
|
||
reverse_proxy localhost:3014
|
||
}
|
||
|
||
# Catch-all: inventory-server
|
||
handle /api/* { reverse_proxy localhost:3010 }
|
||
}
|
||
|
||
handle /health { reverse_proxy localhost:3010 }
|
||
|
||
# SPA fallback
|
||
handle {
|
||
root * /var/www/inventory/frontend/build
|
||
try_files {path} /index.html
|
||
file_server
|
||
encode gzip
|
||
}
|
||
|
||
handle_errors {
|
||
respond "{err.status_code} {err.status_text}"
|
||
}
|
||
}
|
||
```
|
||
|
||
Removed: `/dashboard-auth/*`, `/api/aircall/*`, `/api/gorgias/*`, `/api/clarity/*`, the LAN/`Access-Control-Allow-Origin "*"` permissive defaults on `/api/*`. Kept: `/apiv2/*` and `/apiv2-test/*` proxies to backend.acherryontop.com (out of scope, separate system).
|
||
|
||
---
|
||
|
||
## Phase 8 — ecosystem.config.cjs final form
|
||
|
||
Status: **Complete — applied 2026-05-24.** Live PM2 list matches the spec below: 5 application apps (auth-server, inventory-server, dashboard-server, acot-server, chat-server) plus acot-phone-server + lt-wordlist-api = 7 total. Includes the Phase 6.4 `JWT_SECRET` shadow-override fix and the Phase 6.10 `lt-wordlist-api` token move. The `inventory-server/deploy/ecosystem.config.cjs.proposed` staging file was removed after apply — recreate from the spec below if future changes are needed.
|
||
|
||
```js
|
||
module.exports = {
|
||
apps: [
|
||
{
|
||
name: 'auth-server',
|
||
script: './inventory/auth/server.js',
|
||
cwd: '/var/www',
|
||
env: { NODE_ENV: 'production', AUTH_PORT: 3011 },
|
||
...commonSettings,
|
||
},
|
||
{
|
||
name: 'inventory-server',
|
||
script: './inventory/src/server.js',
|
||
cwd: '/var/www',
|
||
env: { NODE_ENV: 'production', PORT: 3010, UPLOADS_DIR: '/var/www/inventory/uploads' },
|
||
...commonSettings,
|
||
},
|
||
{
|
||
name: 'dashboard-server',
|
||
script: './inventory/dashboard/server.js',
|
||
cwd: '/var/www',
|
||
env: { NODE_ENV: 'production', DASHBOARD_PORT: 3015 },
|
||
...commonSettings,
|
||
},
|
||
{
|
||
name: 'acot-server',
|
||
script: './inventory/dashboard/acot-server/server.js',
|
||
cwd: '/var/www',
|
||
env: { NODE_ENV: 'production', ACOT_PORT: 3012 },
|
||
...commonSettings,
|
||
},
|
||
{
|
||
name: 'chat-server',
|
||
script: './inventory/chat/server.js',
|
||
cwd: '/var/www',
|
||
env: { NODE_ENV: 'production', PORT: 3014 },
|
||
...commonSettings,
|
||
},
|
||
// acot-phone-server and lt-wordlist-api unchanged
|
||
],
|
||
};
|
||
```
|
||
|
||
Five entries instead of twelve. Each app loads its own `.env` from its directory (already handled by `dotenv.config`).
|
||
|
||
---
|
||
|
||
## Sequencing & dependencies
|
||
|
||
```
|
||
Phase 1 (decommission) ──┬─────────────────────────────────────────┐
|
||
│ │
|
||
▼ │
|
||
Phase 2 (shared lib/) │
|
||
│ │
|
||
┌──────────────┼──────────────┐ │
|
||
▼ ▼ ▼ ▼
|
||
Phase 3a Phase 3b Phase 4 Phase 6 (auth hardening
|
||
inventory-server auth-server dashboard-server runs alongside 3+4+5,
|
||
to ESM to ESM + /verify build & test completes after them)
|
||
│ │ │ │
|
||
└──────────────┼──────────────┘ │
|
||
▼ │
|
||
Phase 5 (acot-server to ESM) ──────────────────►│
|
||
▼
|
||
Phase 7 (Caddy cutover)
|
||
│
|
||
▼
|
||
Phase 8 (PM2 final state)
|
||
```
|
||
|
||
Phase 1 unblocks everything (fewer services to convert).
|
||
Phase 2 is the foundation; nothing else can start until shared `lib/` exists.
|
||
Phases 3–5 can run in parallel; they touch independent services.
|
||
Phase 6's sub-items can be developed alongside 3–5 but **enabled** only after them (no point adding `requirePermission` to a route that doesn't yet have `authenticate`).
|
||
**Phase F1 must precede the Phase 3+6 pm2 reload** — without the fetch wrapper, the moment the new code goes live the SPA breaks. Discovered during Phase 3+6 implementation; see Phase F1.
|
||
Phase 7 is the cutover: Caddyfile flip happens after F1 ships AND after the `/uploads/*` gating decision in F1 is made.
|
||
Phase 8 is cleanup: remove dead PM2 entries.
|
||
|
||
Estimated effort, end-to-end: **~3 weeks of focused work** by one engineer. Phase 1 ≈ 1 day, Phase 2 ≈ 2 days, Phase 3 ≈ 3 days (both services), Phase 4 ≈ 5–7 days (the merge), Phase 5 ≈ 2–3 days, Phase 6 ≈ 3–4 days, Phase F1 ≈ 0.5–1 day, Phase 7+8 ≈ 1 day.
|
||
|
||
---
|
||
|
||
## Phase 9 — Post-audit residual gaps (NEW — 2026-05-24)
|
||
|
||
A second-look audit after Phase 5/8 closed surfaced six findings that the original plan either deferred, missed, or overstated. Phase 9 closes them. Code work landed in the same session as the audit, then the deploy-side steps (`caddy reload`, `pm2 reload chat-server`, frontend rebuild) were applied and smoke-tested.
|
||
|
||
The intent is **"do it properly the first time"** — Phase 9 isn't a quality-of-life backlog, it's the final closing-out of the consolidation project. With §9.1–9.5 applied, the consolidation/security/ESM claims in this document match the live system.
|
||
|
||
### 9.1 — chat-server: ESM conversion + in-process `authenticate()` + localhost bind
|
||
|
||
**Finding:** `chat-server` was the last application service still using `require()` (Status note misstated this in earlier revisions) and had no per-server auth — it relied entirely on the Caddy gate. `localhost:3014/test-db` returns 200 unauthenticated, falling short of the plan's three-layer defense model (Caddy `forward_auth` + per-server `authenticate()` + per-route `requirePermission`).
|
||
|
||
**Files touched (code landed this session):**
|
||
- `inventory-server/chat/package.json` — add `"type": "module"`.
|
||
- `inventory-server/chat/server.js` — `require` → `import`; load shared `.env` first then local; add a *second* `Pool` against `inventory_db` (the existing `CHAT_DB_*` pool stays for `rocketchat_converted`); mount `shared/auth/middleware.js`'s `authenticate()` on the router; mount `shared/logging/request-log.js`, `shared/cors/policy.js`, `shared/errors/handler.js` for parity with dashboard-server; bind to `127.0.0.1` instead of `0.0.0.0` (external access is via Caddy only).
|
||
- `inventory-server/chat/routes.js` — `require` → `import`; `module.exports` → `export default`; no changes to handler bodies (they continue to read `global.pool` which `server.js` still sets).
|
||
|
||
**Applied / verified:**
|
||
1. `pm2 reload chat-server --update-env` completed; PM2 shows `chat-server` online with the new PID.
|
||
2. Syntax sweep passes under Node when excluding macOS `._*` sidecar files.
|
||
3. `ss -ltnp` shows `chat-server` bound to `127.0.0.1:3014` (not `0.0.0.0`).
|
||
4. Smoke:
|
||
- `curl -s http://localhost:3014/test-db` → `401` (was: 200 unauthenticated).
|
||
- valid short-lived Bearer token against `https://tools.acherryontop.com/chat-api/test-db` → `200`.
|
||
- `curl -s https://tools.acherryontop.com/chat-api/test-db` (no token) → `401` at Caddy gate.
|
||
5. PM2 logs show logged-in SPA Chat Archive requests returning 200 (`/users`, `/users/:id/rooms`, `/rooms/:id/messages`, `/messages/attachments`).
|
||
|
||
**Rollback:** `pm2 reload chat-server` against the previous commit (CJS version is one git checkout away).
|
||
|
||
### 9.2 — Caddyfile: uploads gate fix + edge CORS tightening
|
||
|
||
> **⚠️ The uploads-gate portion of this section was reverted on 2026-05-24 — see Deviation #28.** Edit #1 (`not path /uploads/*` on `@static`) was undone and `/uploads/*` was lifted out of `@gated` entirely. The CORS edits (#2 and #3) remain live. The narrative below is preserved as a record of the original intent and what was learned.
|
||
|
||
**Findings (original):**
|
||
- `/uploads/*.jpg` returns a *public* `404` because the `@static path *.jpg ... *.woff2` matcher beats `@gated path /api/* /chat-api/* /uploads/*` on specificity, hitting the unauthenticated SPA build root.
|
||
- `Access-Control-Allow-Origin "*"` is set inside the `security_headers` snippet and imported into `tools.acherryontop.com`, which silently overrides the careful allow-list in `shared/cors/policy.js`.
|
||
- CORS preflight (`OPTIONS` with no Authorization) hits `forward_auth` and 401s before the backend's CORS handler ever sees it.
|
||
|
||
**Applied from:** `inventory-server/deploy/Caddyfile.proposed` (recreated this session per Deviation #18's convention of staging diffs there before applying). The active `/etc/caddy/Caddyfile` now contains the Phase 9 CORS edits + the Deviation #28 uploads-gate revert.
|
||
|
||
**Three edits (as originally applied):**
|
||
1. ~~Tighten `@static`:~~ **REVERTED — see Deviation #28.**
|
||
```
|
||
@static {
|
||
path *.js *.css *.png *.jpg *.jpeg *.gif *.ico *.svg *.woff *.woff2
|
||
not path /uploads/*
|
||
}
|
||
```
|
||
~~This makes `/uploads/*` an exclusive @gated path, restoring the intended authenticated file_server behavior.~~ Current state: the `not path /uploads/*` exclusion is removed; `/uploads/*` is handled by a dedicated public `handle /uploads/*` block placed ahead of both `@static` and `@gated`. See Deviation #28.
|
||
2. Drop the wildcard headers from `security_headers`:
|
||
```diff
|
||
- Access-Control-Allow-Origin "*"
|
||
- Access-Control-Allow-Methods "GET, POST, OPTIONS, PUT, DELETE, PATCH"
|
||
- Access-Control-Allow-Headers "DNT, X-CustomHeader, ..."
|
||
```
|
||
CORS responses come from the upstreams (`shared/cors/policy.js`) which set ACAO conditionally against the allow-list. Caddy stamping `*` was a holdover from before Phase 6.6.
|
||
3. Add a preflight-bypass handler **before** `@gated`:
|
||
```
|
||
@cors_preflight {
|
||
method OPTIONS
|
||
header Access-Control-Request-Method *
|
||
}
|
||
handle @cors_preflight {
|
||
handle /api/klaviyo/* { reverse_proxy localhost:3015 }
|
||
handle /api/meta/* { reverse_proxy localhost:3015 }
|
||
handle /api/dashboard-analytics/* { reverse_proxy localhost:3015 }
|
||
handle /api/typeform/* { reverse_proxy localhost:3015 }
|
||
handle /api/acot/* { reverse_proxy localhost:3012 }
|
||
handle /chat-api/* {
|
||
uri strip_prefix /chat-api
|
||
reverse_proxy localhost:3014
|
||
}
|
||
handle /api/* { reverse_proxy localhost:3010 }
|
||
}
|
||
```
|
||
|
||
**Applied / verified (at time of original §9.2 cutover):**
|
||
1. Caddy admin API load applied and `/etc/caddy/Caddyfile` persisted with the Phase 9 edits.
|
||
2. Smoke (post-§9.2, pre-Deviation #28 revert):
|
||
- `curl -I https://tools.acherryontop.com/uploads/reusable/<real>.jpg` → `401` without auth (was: public 404).
|
||
- same path with a valid short-lived Bearer token → `200 image/jpeg`.
|
||
- `curl -X OPTIONS -H "Origin: https://tools.acherryontop.com" -H "Access-Control-Request-Method: GET" https://tools.acherryontop.com/api/products` → `204` with `Access-Control-Allow-Origin: https://tools.acherryontop.com` (specific, **not** `*`).
|
||
- same preflight with `Origin: https://evil.example` → `403` with no wildcard ACAO.
|
||
- `curl -s https://tools.acherryontop.com/api/products` (no token) → `401`; wildcard ACAO no longer stamped by Caddy.
|
||
|
||
**Current smoke (post-Deviation #28 revert, 2026-05-24):**
|
||
- `curl -I https://tools.acherryontop.com/uploads/reusable/<real>.jpg` → `200 image/jpeg` without auth (intentional — see Deviation #28).
|
||
- `curl -I https://tools.acherryontop.com/uploads/products/<nonexistent>.jpg` → `404` (not `401`) — confirms the gate is no longer in front of `/uploads/*`.
|
||
- `curl -s https://tools.acherryontop.com/api/products` (no token) → `401` (gate still up for `/api/*` + `/chat-api/*`).
|
||
- CORS preflight and origin-allowlist behavior unchanged from the post-§9.2 smoke above.
|
||
|
||
**Rollback:** restore `/etc/caddy/Caddyfile.bak.YYYY-MM-DD` and `curl -X POST .../load` against it.
|
||
|
||
### 9.3 — Frontend Mini\*.jsx fetch leaks
|
||
|
||
**Finding:** Three dashboard mini-widgets used raw `fetch()` against gated `/api/dashboard/*` endpoints, so the SPA would 401 these specific cards after Phase 6 went live.
|
||
|
||
**Files touched (landed this session):**
|
||
- `inventory/src/components/dashboard/MiniInventorySnapshot.jsx` — 3 calls migrated to `apiFetch`.
|
||
- `inventory/src/components/dashboard/MiniBusinessMetrics.jsx` — 2 calls migrated.
|
||
- `inventory/src/components/dashboard/MiniSalesChart.jsx` — 1 call migrated.
|
||
|
||
**Sweep done:** `grep -rn "fetch(\`\${config.apiUrl}\|fetch(\`/api/\|fetch('/api/\|fetch(\"/api/" inventory/src` now returns no surviving raw-fetch sites against gated paths (other than the two AuthContext exceptions and `services/apiv2.ts` already excluded by Phase F1).
|
||
|
||
**Applied / verified:** `cd inventory && npm run build` completed successfully and the live frontend build now references the new asset hash. Raw `fetch()` sweep now shows only the documented AuthContext/App `/me` exceptions, `services/apiv2.ts`, and third-party OpenWeather calls.
|
||
|
||
### 9.4 — Vitest scaffold + auth-boundary tests
|
||
|
||
**Finding:** Plan promised a `vitest` scaffold and auth-boundary tests during Phase 2; never delivered. Root `package.json` has no `scripts` block; `inventory-server/package.json` has the default `echo "Error: no test specified"`.
|
||
|
||
**Files landed this session** (under `inventory-server/`):
|
||
- `inventory-server/package.json` — adds `"test": "vitest run"`, `"test:watch": "vitest"`, plus `vitest` in `devDependencies`.
|
||
- `inventory-server/shared/auth/verify.test.js` — five cases: valid token, expired token, wrong-signature token, malformed header, missing token.
|
||
- `inventory-server/shared/auth/middleware.test.js` — request with no header → 401; bad header → 401; valid token + inactive user → 403; valid token + missing permission → 403; valid token + correct permission → next() called with `req.user` populated; cache TTL behavior (same token within 60s → one DB hit; after 61s → two).
|
||
|
||
The scaffold is intentionally narrow: it covers the security-critical surface and nothing else. Broader coverage is still a separate, larger project — see Out of scope.
|
||
|
||
**Applied / verified:** `cd /var/www/inventory && npm test` passes: 2 files, 21 auth-boundary tests.
|
||
|
||
### 9.5 — Plan-doc narrative reconciliation
|
||
|
||
**Finding:** The opening goal ("12 → 3 application servers + 1 auth-server", "ESM across all Node services") didn't match what shipped. Status row 13 overstated ESM coverage ("All 58 server-side files") in a way that hid chat-server from readers. Deliverables section line 845 acknowledged the chat-server gap but bullet 21 still claimed "all phases complete".
|
||
|
||
**Edits landed this session:**
|
||
- Headline (line 3) rewritten to "13 PM2 processes to 5 application servers + 2 auxiliary processes = 7 total" + the "primary application Node services" qualifier on ESM.
|
||
- Phase 3 status row clarified to scope ESM to inventory-server + auth-server, with explicit reference to Deviation #27.
|
||
- Goals (line 34) restated with the actual outcome + the reason for the deviation.
|
||
- Target-architecture caption corrected from `12 → 4` to `13 → 7`.
|
||
- Concrete deliverables block rewritten to flag the four items where Phase 9 closes a gap (uploads gate, chat-server auth, chat-server ESM, edge CORS).
|
||
- New status-table row added for Phase 9 itself.
|
||
|
||
**Deploy:** none — this is documentation only.
|
||
|
||
### 9.6 — Sequence + dependencies
|
||
|
||
| Step | Depends on | Risk | Notes |
|
||
|---|---|---|---|
|
||
| 9.5 plan-doc | — | none | Already done |
|
||
| 9.3 frontend rebuild | 9.5 (no functional dep, just shipping order) | low | `npm run build` from inventory/ |
|
||
| 9.4 vitest scaffold | — | none | Already done; run `npm test` to verify |
|
||
| 9.1 chat-server reload | code conversion landed this session | medium | **Done** — PM2 reload complete; unauthenticated + authenticated smokes pass |
|
||
| 9.2 Caddy reload | 9.1 (so the gate covers a service that already self-authenticates) | medium | **Done** — active Caddyfile includes Phase 9 edits; uploads/CORS smokes pass |
|
||
|
||
### 9.7 — Done criteria
|
||
|
||
Phase 9 completion checks:
|
||
- ✅ `curl -s -o /dev/null -w '%{http_code}' http://localhost:3014/test-db` returns `401` (was: 200).
|
||
- ⚠️ ~~`curl -sI https://tools.acherryontop.com/uploads/reusable/<any-real>.jpg` returns `401` without auth and `200` with valid Bearer.~~ **Superseded by Deviation #28.** Current expected behavior: `200 image/jpeg` without auth (uploads are public by design). The original Phase 9.7 criterion is preserved here as a record of what §9.2 verified at the time.
|
||
- ✅ `curl -X OPTIONS ...` to a gated `/api/*` path returns `204` from the backend with a specific `Access-Control-Allow-Origin`, **not** `*`.
|
||
- ✅ `cd /var/www/inventory && npm test` passes 100% on `shared/auth/*.test.js` (2 files, 21 tests).
|
||
- ✅ `grep -rn "require(" inventory-server/chat/*.js` returns nothing.
|
||
- ✅ Frontend build passes after the three Mini\*.jsx migrations.
|
||
|
||
---
|
||
|
||
## Testing strategy
|
||
|
||
No formal test suite exists today (per CLAUDE.md). For a refactor this size, that's a gap to close — but writing tests retroactively for 15K LOC of routes is a separate, larger project. For this refactor:
|
||
|
||
### Manual smoke testing per phase
|
||
|
||
A checklist of representative endpoints to hit after each deploy:
|
||
|
||
- `inventory-server`: `/api/products`, `/api/dashboard/overview`, `/api/analytics/revenue`, `/api/orders`, `/api/purchase-orders`, `/api/import/list-uploads`, `/api/config/global`
|
||
- `dashboard-server`: `/api/klaviyo/campaigns`, `/api/meta/insights`, `/api/google-analytics/...`, `/api/typeform/responses`
|
||
- `acot-server`: `/api/acot/...` (top-3 endpoints by call volume — pull from access logs)
|
||
- `auth-server`: `/login`, `/me`, `/verify`
|
||
|
||
Each smoke test runs (a) without a token → expect 401, (b) with an invalid token → expect 401, (c) with a valid token → expect 2xx.
|
||
|
||
### Frontend integration check
|
||
|
||
After deploys, log into the SPA and exercise each major page (Overview, Products, Analytics, Dashboard, Klaviyo, Meta, etc.). If everything loads and dashboards populate, the auth + routing layer is intact.
|
||
|
||
### Test scaffold during Phase 2 (committed)
|
||
|
||
While building `shared/`, set up `vitest` (lightweight, ESM-native, fast) as the standard test runner for the repo. Initial coverage focuses on the security-critical surface only:
|
||
|
||
- `shared/auth/verify.js` — known good token, expired token, wrong-signature token, malformed token, missing token.
|
||
- `shared/auth/middleware.js` — request with no header → 401; bad header → 401; valid token + inactive user → 403; valid token + missing permission → 403; valid token + correct permission → next() called with `req.user` populated.
|
||
- `shared/auth/middleware.js` user-cache TTL: same token within 60s → one DB hit; same token after 61s → two DB hits.
|
||
|
||
`package.json` gets a `"test": "vitest run"` script at the repo root and per-service. Set up but don't backfill broader test coverage — that's a separate, larger project. The vitest scaffold gives future work a foothold; this refactor commits to having tests for the auth boundary specifically because that's what's load-bearing for the whole security model.
|
||
|
||
---
|
||
|
||
## Rollback strategy
|
||
|
||
Each phase produces an independently deployable state. Rollback per phase:
|
||
|
||
- **Phase 1**: re-add removed services to ecosystem; restore from git. Don't roll back data deletions — only do those after a week of stable production.
|
||
- **Phases 3, 5**: ESM conversion is per-service; if one service breaks, `pm2 restart <name>` to the previous commit. Other services unaffected.
|
||
- **Phase 4**: the dashboard-server merge is the highest-risk change. Plan: deploy `dashboard-server` to a non-conflicting port (3015) while leaving the old per-vendor servers running. Cut over Caddy routes one vendor at a time (start with Meta — smallest). If any vendor breaks, point Caddy back to the old server (still running) for that vendor, debug, retry. Only delete the old servers after all four are stable on `dashboard-server`.
|
||
- **Phases 6, 7**: Caddy config is git-tracked. `git revert` + `caddy reload` rolls back in seconds. Auth changes are additive (defense in depth) — if `forward_auth` causes problems, comment it out and per-server middleware continues protecting routes.
|
||
|
||
---
|
||
|
||
## Out of scope (intentional)
|
||
|
||
These came up in the audit but aren't part of this refactor:
|
||
|
||
- `httpOnly` cookie auth ("Phase F2" — deferred). Phase F1 keeps `localStorage` + Bearer header because that's the minimum to unblock the Phase 6 `authenticate()` rollout. A future move to cookie auth would touch `AuthContext`, every backend that reads tokens, and introduce CSRF concerns — much larger project.
|
||
- Replacing PM2 with systemd or Docker.
|
||
- Test coverage beyond the auth-critical surface.
|
||
- `apiv2`/`apiv2-test` proxies to `backend.acherryontop.com` — separate system, not touched.
|
||
- `acot-phone-server` and `lt-wordlist-api` — staying as-is.
|
||
- Centralized observability stack (Prometheus, Grafana). The logger work in Phase 6.5 sets up the data, but shipping it somewhere is future work.
|
||
- ChatRoom XSS remediation (flagged during Phase 6.8 audit — `inventory/src/components/chat/ChatRoom.tsx:277,392` renders user-controlled chat content via `dangerouslySetInnerHTML` without sanitization). Real vulnerability for an internal-but-multi-user tool; separate fix.
|
||
|
||
---
|
||
|
||
## Concrete deliverables
|
||
|
||
State as of 2026-05-24: Phases 1–9 are **shipped**. The "4 application PM2 processes" original target became **5** in execution because `chat-server` stayed standalone rather than being folded in — never a serious merge candidate (different DB, different protocol shape).
|
||
|
||
- ✅ 5 application PM2 processes instead of 12 (auth-server, inventory-server, dashboard-server, acot-server, chat-server) — plus 2 unchanged (acot-phone-server, lt-wordlist-api) = 7 total.
|
||
- ✅ All `/api/*` and `/chat-api/*` requests gated at Caddy (`forward_auth`). `/uploads/*` is **intentionally public** — it was gated as part of Phase 6.7 / §9.2, then reverted because the external PHP backend at `www.acherryontop.com` fetches image URLs during product import. See Deviation #28.
|
||
- ✅ Per-upstream `authenticate()` re-verification on inventory-server, dashboard-server, acot-server, and chat-server.
|
||
- ✅ Sensitive endpoints additionally gated by per-permission checks (`requirePermission`).
|
||
- ✅ **One ESM standard across primary application Node services** — auth-server, inventory-server, dashboard-server, acot-server, and chat-server are ESM.
|
||
- ✅ One shared `lib/` at `inventory-server/shared/` for auth, logging, DB, errors, CORS.
|
||
- ✅ Login rate-limited (`shared/rate-limit/login.js`).
|
||
- ✅ `JWT_SECRET` rotated + ecosystem shadow-override removed.
|
||
- ✅ Old auth-server, Aircall, Gorgias, Clarity directories deleted from the repo. Defunct `dashboard:gorgias`/`dashboard:calls` permission rows also deleted from DB (2026-05-24).
|
||
- ✅ Caddyfile slimmed to one auth-gated block. Phase 9 §9.2 drops the edge CORS wildcard and adds a CORS preflight bypass; the `@static`-matcher tightening from §9.2 was rolled back as part of the Deviation #28 uploads-gate revert (uploads now served via a dedicated public `handle /uploads/*` block placed ahead of `@static`).
|
||
- ✅ Permission codes inserted into `permissions` table for granular authorization.
|
||
- ✅ Both gaps surfaced during Phase 5 verification — `lt-wordlist-api` insecure default token (Deviation #25) and Caddy blocking acot-phone-server's `x-acot-api-key` calls (Deviation #26) — were closed 2026-05-24.
|
||
- ✅ **Six second-look findings closed by Phase 9** (frontend fetch leaks, edge CORS wildcard, uploads-gate ordering, chat-server in-process auth + ESM, vitest scaffold, plan-doc/goal mismatch). Code and deploy steps are complete. *The uploads-gate fix was later reverted (Deviation #28) — the gate was incompatible with the external-server image-fetch contract; what `@static` slip-through actually revealed was that uploads belong in front of the gate, not behind it.*
|
||
|
||
---
|
||
|
||
## Deviations from original plan (recorded during execution)
|
||
|
||
These are decisions made during Phase 1/2 implementation that amend the spec above. Future phases should follow the deviated path, not the original sketch.
|
||
|
||
1. **`shared/` location.** Original plan placed `shared/` at the repo root as a sibling of `inventory/` and `acot-phone/`. Implemented at `inventory-server/shared/` (= `/var/www/inventory/shared/` on the server) instead. Reason: the actual project root *is* `/var/www/inventory/`; placing shared/ outside it would have meant building a deployment story for it that doesn't exist. Import paths change accordingly:
|
||
- From `inventory-server/{auth,src,chat}/server.js` → `../shared/...`
|
||
- From `inventory-server/dashboard/{vendor}-server/server.js` → `../../shared/...`
|
||
|
||
2. **`/verify` response headers.** Plan specified `X-User-Id` + `X-User-Is-Admin`. Implemented as `X-User-Id` + `X-User-Username` (both available from the JWT payload). `X-User-Is-Admin` was dropped because `is_admin` isn't in the JWT today and returning it would require a DB lookup — violating the "no DB hit" principle. To restore `X-User-Is-Admin`, enrich the JWT payload at login time (one-line change in `auth/routes.js`) during Phase 6, then echo from `/verify`. Upstreams don't trust these headers anyway (they re-verify), so the omission is informational, not security-relevant.
|
||
|
||
3. **User cache key in `shared/auth/middleware.js`.** Plan sketch mentioned "60s TTL keyed by token jti". Implemented as keyed by `userId` instead — the JWT doesn't currently include a `jti` claim, and the cache's invalidation semantics are "this user was deactivated/changed permissions" (per-user), not "this token was revoked" (per-token). The plan's pseudocode already used `loadUserCached(pool, decoded.userId)` so this matches the spirit.
|
||
|
||
4. **Redis client safety.** `shared/db/redis.js` sets `enableOfflineQueue: false` and `lazyConnect: true`. Plan didn't specify but these defaults mean a Redis hiccup fails fast (route fall-through to upstream API as designed in Phase 4 risk notes) rather than queueing commands indefinitely.
|
||
|
||
5. **CORS allowed origins kept `https://acot.site`.** Plan example listed three origins; production has acot.site as a redirect to tools.acherryontop.com but also reaches the API directly in some flows. Kept it to avoid breakage. LAN wildcards (`192.168.*`, `10.*`) and `Access-Control-Allow-Origin "*"` are NOT included in the new `shared/cors/policy.js` per the plan's Phase 6.6 spirit, but the legacy `inventory-server/src/middleware/cors.js` still has them until services are migrated to consume `shared/cors/`.
|
||
|
||
6. **Defunct permission codes cleaned up (2026-05-24).** Removed the `dashboard:gorgias` and `dashboard:calls` Protected blocks from the frontend; the corresponding permission rows in the `permissions` table (and their user_permission grants) were deleted in a follow-up migration alongside the Phase 6.2 permissions seed. Verified post-migration: `permissions` table contains only the in-use `dashboard:*` codes (analytics, campaigns, feed, financial, meta_campaigns, operations, payroll, products, realtime, sales, stats, typeform, user_behavior).
|
||
|
||
7. **PM2 process names retained `new-auth-server` (not `auth-server`).** Plan's Phase 8 final form names it `auth-server` (after the legacy 3003 one is removed). Decided to keep the existing `new-auth-server` name through Phase 2 to avoid a rename mid-stream. Phase 8 can rename if desired, but it's cosmetic — all wiring is by port (3011) not name.
|
||
|
||
8. **Caddyfile changes via admin API on `:2020`.** The Caddyfile is owned by root and matt has no passwordless sudo. Cutover used `curl -X POST .../load` on the Caddy admin port (which matt can hit), then a separate `sudo cp /home/matt/Caddyfile.new /etc/caddy/Caddyfile` step to persist the on-disk file. Future Caddyfile changes can follow the same pattern. Backup convention: `/etc/caddy/Caddyfile.bak.YYYY-MM-DD`.
|
||
|
||
9. **Path-naming.** Plan uses `inventory/` as the top-level (server-side path convention). Locally the equivalent is `inventory-server/`. Whenever the plan says `inventory/dashboard/foo/`, read that as `/var/www/inventory/dashboard/foo/` on the server or `inventory-server/dashboard/foo/` locally.
|
||
|
||
10. **Scripts directory kept CJS via package.json shim.** Original plan called for converting "any spawned script" to ESM alongside its caller. Implemented: added `inventory-server/scripts/package.json` with `"type": "commonjs"`. Node's package-type resolution walks up directory by directory, so this overrides the parent's `"type": "module"` for the entire `scripts/` tree (≈15 files including `import/*.js`, `metrics-new/utils/*`, the orchestrator scripts) without renaming any file or touching any `spawn()` callsite. Convert individual scripts to ESM when touched; don't bulk-migrate.
|
||
|
||
11. **`src/routes/products.js` had dead multer setup.** Phase 6.7 spec called for hardening the upload route in products.js. There was no upload route — the `multer({ dest })` instance and `importProductsFromCSV` import were dead code left over from a long-ago migration. Strongest 6.7 hardening was deletion: no upload handler = no attack surface. The two real upload paths (`/api/import/upload-image` and `/api/reusable-images/upload`) got tightened MIME+extension allowlists instead.
|
||
|
||
12. **Two pre-existing syntax errors in shared/db/ surfaced.** `shared/db/pg.js:13` and `shared/db/redis.js:22` both had `?? Number(...) || N` — mixing `??` and `||` without parentheses is a TC39 syntax error. They passed Phase 2 because nothing imported them yet; Phase 3 smoke-test exposed it. Fixed with parens.
|
||
|
||
13. **`import { Pool } from 'pg'` doesn't work in ESM.** The `pg` package is CJS using `module.exports = { Pool, ... }`. Node's ESM-from-CJS interop fails to detect `Pool` as a named export via static analysis. The bulletproof pattern, now used everywhere: `import pg from 'pg'; const { Pool } = pg;`. Same idea for any future CJS-only deps. `src/utils/db.js` already had it; the two auth files needed the fix during execution.
|
||
|
||
14. **Frontend Bearer-header gap discovered (drives new Phase F1).** Phase 6 was specified assuming the frontend already sends `Authorization: Bearer` on every API call. It does not — only 7 of ~220 call sites do. Phase 6's `authenticate()` middleware is shipped and ready to enable, but until F1 lands the SPA will 401 on every page. The plan now has Phase F1 to address this explicitly; until then, the Phase 3+6 pm2 reload should not ship unless F1 ships in the same window.
|
||
|
||
15. **macOS NFS workflow note.** The `inventory-server/` directory locally is an NFS mount of `/var/www/inventory/` on netcup. Bulk operations (`find`/`grep -r`/mass `node --check`/`npm install`) hang or take minutes locally and pollute file listings with macOS AppleDouble `._*` sidecar files. Default to `ssh netcup` for any sweep across the tree — individual file edits via the editor are fine.
|
||
|
||
16. **dashboard-server lives at `inventory-server/dashboard/` (not its own top-level dir).** Plan's Phase 4 diagram implied a sibling of `inventory-server/` etc. The merged service lives at `inventory-server/dashboard/server.js` with `package.json` declaring `"type": "module"`. Per-vendor subdirectories (`klaviyo-server/`, `meta-server/`, `google-server/`, `typeform-server/`) each have their own `package.json` so Node's "nearest parent package.json" walk stops there — they are unaffected by the new parent type. Added `"type": "commonjs"` defensively to meta/google/typeform package.json so a future deletion of their files (cutover cleanup) plus a stray `*.js` left under `dashboard/` wouldn't accidentally try to ESM-parse it.
|
||
|
||
17. **Klaviyo `RedisService` kept as a wrapper, but accepts injected client.** Plan said "replace each server's per-instance pool/redis with the injected one." The Klaviyo codebase has ~3K LOC of service code (`events.service.js` alone is 2.2K) that calls `this.redisService._getCacheKey()`, `.get()`, `.set()`, `.getEventData()`, `.clearCache()`, `._getTTL()`. Rewriting all of that to call ioredis directly would risk breaking the cache-key/TTL invariants. Decision: keep `services/klaviyo/redis.service.js` as a thin facade with the same public surface, but its constructor now takes an ioredis client instead of constructing one. The 3 service classes (`EventsService`, `CampaignsService`, `ReportingService`) all take `(apiKey, apiRevision, redis)` and pass `redis` to `new RedisService(redis)`. `MetricsService` doesn't use Redis — left unchanged.
|
||
|
||
18. **dashboard-server `.env` layering.** Plan called for "Single `.env` at `inventory/dashboard/.env`, prefixed keys: KLAVIYO_*, META_*, ... JWT_SECRET ... # shared with auth-server." Implemented as two-file layering: `server.js` loads `/var/www/inventory/.env` FIRST (provides JWT_SECRET, DB_*, REDIS_*) then `inventory-server/dashboard/.env` SECOND for vendor-specific keys (KLAVIYO_API_KEY, META_*, GA_*, TYPEFORM_*). dotenv defaults to `override:false`, so the first file wins on collisions — security-critical vars live in one place, vendor keys in the other. `.env.example` template committed at `dashboard/.env.example`. **Pre-cutover step**: copy the vendor keys from the current per-vendor `.env` files into either of those two files before `pm2 reload`, else KLAVIYO_API_KEY etc. will not be set and routes will 500.
|
||
|
||
19. **Caddyfile typo fixed: `/api/google-analytics` → `/api/dashboard-analytics`.** The pre-Phase-4 `Caddyfile.proposed` listed a `handle /api/google-analytics/*` block. The live Caddyfile and the frontend (`inventory/src/config/dashboard.ts`) both use `/api/dashboard-analytics/*` (the live file has a `uri replace /api/dashboard-analytics /api/analytics` rewrite to land on google-server's `/api/analytics` mount). The merged dashboard-server now mounts the Google router at `/api/dashboard-analytics` directly — Caddy no longer needs the rewrite, just a straight reverse_proxy. Fixed in `deploy/Caddyfile.proposed`.
|
||
|
||
20. **`metrics.routes.js` had a latent router-scope bug.** The Klaviyo `metrics.routes.js` declared `const router = express.Router()` at MODULE scope (outside the `createMetricsRouter` factory), so calling the factory twice would have re-mounted handlers on the same router (cumulative). Benign for a single-mount PM2 service, but fixed during the Phase 4 copy — the router now lives inside the factory. Also renamed the export from `createMetricsRoutes` (plural) to `createMetricsRouter` (matches the convention used by every other vendor's index.js).
|
||
|
||
21. **PM2 log paths use per-server `logs/pm2/` (NOT `/var/log/pm2/`).** Discovered during the first apply attempt: the previously-shipped `ecosystem.config.cjs.proposed` carried over `/var/log/pm2/...` from the live file, but matt has no write perms on `/var/log` (root:syslog 775) so the entries silently failed to launch (chat-server + acot-server came up because they had no explicit log path; new-auth-server, inventory-server, dashboard-server bailed). The actual convention — already in place via pre-created folders on disk — is per-service `logs/pm2/` directly under each service's directory (`./inventory/auth/logs/pm2/`, `./inventory/chat/logs/pm2/`, `./inventory/dashboard/acot-server/logs/pm2/`, `./inventory/dashboard/logs/pm2/` for the merged dashboard-server, `./inventory/logs/pm2/` for inventory-server, `/opt/lt-wordlist-api/logs/pm2/`, `/var/www/acot-phone/logs/pm2/`). All folders are matt:matt. `pm2-logrotate` (already loaded in matt's daemon) rotates them in place.
|
||
|
||
22. **All PM2 apps run under matt's single daemon — no root daemon.** The earlier `OUT OF SCOPE` comment block in the proposed ecosystem incorrectly claimed `lt-wordlist-api` and `acot-phone-server` were managed by a separate root PM2 daemon. They are not — matt's daemon manages everything. Removed the bogus block; both apps are now first-class entries in the proposed ecosystem with corrected script paths:
|
||
- `lt-wordlist-api` script is `/opt/lt-wordlist-api/index.js` (was `/opt/lt-wordlist-api/server.js` in the live file — wrong; that file doesn't exist). `/opt/lt-wordlist-api` is matt:matt 0750.
|
||
- `acot-phone-server` script is `/var/www/acot-phone/dist/server.js` (was `./inventory/acot-phone/server.js` in the live file — wrong; that path doesn't exist). `/var/www/acot-phone/` is matt:matt with its own `.env` and is a separate repo from inventory-server.
|
||
|
||
23. **Phase 6.10 ADD_WORD_TOKEN move stays in this ecosystem.** Per Deviation #22, `lt-wordlist-api` is in matt's ecosystem, so the §6.10 work to remove inline `ADD_WORD_TOKEN` and load it from `/opt/lt-wordlist-api/.env` instead is implemented directly in `deploy/ecosystem.config.cjs.proposed` (no inline `ADD_WORD_TOKEN`; script reads its own .env). When applying, rotate the token value in `/opt/lt-wordlist-api/.env` and update any callers.
|
||
|
||
24. **Phase 6 patterns applied to acot-server during Phase 5.** acot-server was originally planned to convert mechanically (require → import) and inherit Phase 6 hardening later. Done in a single pass instead: the new `server.js` mounts `shared/logging/request-log.js`, `shared/cors/policy.js`, `shared/errors/handler.js`, and `shared/auth/middleware.js`'s `authenticate()` on `/api/acot/*` (except the customers router — see Phase 5 auth-model section above). Adds `pg` dependency to `inventory-server/dashboard/acot-server/package.json` because the Postgres pool for `authenticate()`'s user/permission lookups is initialized in-process. Env layering follows dashboard-server's pattern: `/var/www/inventory/.env` loaded first (JWT_SECRET, DB_*), local `.env` loaded second (PROD_DB_*, PROD_SSH_*, ACOT_PHONE_API_KEY). No `acot_admin` permission gates wired — none of the routes mutate state in ways that warrant per-permission checks today; the seeded code in `migrations/005_phase6_permission_codes.sql` stays reserved.
|
||
|
||
25. **Phase 6.10 fixed 2026-05-24 via option (b).** Discovered during Phase 5 closeout that `/opt/lt-wordlist-api/.env` didn't exist, no `ADD_WORD_TOKEN` env var was set on the running process, and the script's `process.env.ADD_WORD_TOKEN || 'tokenhere'` fallback was the only gate — meaning `curl -X POST -H "x-add-word-token: tokenhere" http://localhost:3030/add-word` succeeded in production.
|
||
|
||
**Fix applied:**
|
||
- Generated a fresh 32-byte hex token via `openssl rand -hex 32`.
|
||
- Wrote it to `/opt/lt-wordlist-api/.env` (matt:matt, mode 0600).
|
||
- Edited `/var/www/ecosystem.config.cjs`'s `lt-wordlist-api` entry to add `node_args: ['--env-file=/opt/lt-wordlist-api/.env']`. Node ≥20.6 (netcup runs v22.22.2) reads the file at startup with no script changes and no `dotenv` import — the cleanest of the three options because the token never lives in the committed ecosystem file.
|
||
- `pm2 reload /var/www/ecosystem.config.cjs --only lt-wordlist-api --update-env` picked up the new wrapper config. PM2 restart count 1 → 2, clean startup.
|
||
|
||
**Verified:** old default `tokenhere` now returns `{"error":"unauthorized"}` HTTP 401; new env-file token returns `{"ok":true,...}` HTTP 200 on `/add-word` and `/delete-word`. To rotate again: edit `/opt/lt-wordlist-api/.env` + `pm2 restart lt-wordlist-api --update-env`.
|
||
|
||
**Caller coordination:** user confirmed all callers are external and will be updated as issues surface; no inventory of callers to pre-notify.
|
||
|
||
26. **Caddy `forward_auth` gate breaks acot-phone-server's customer lookups — fixed 2026-05-24 via option (a).** Phase 6.1 put `/api/acot/*` behind `forward_auth localhost:3011/verify`, which strictly requires a JWT Bearer token. But `acot-phone-server` (at `/var/www/acot-phone/`) calls `/api/acot/customers/by-phone`, `/api/acot/customers/search`, `/api/acot/customers/:cid/orders` using only an `x-acot-api-key` header (`/var/www/acot-phone/src/services/acotApi.ts:51`). Result: every customer lookup from the phone app hit a 401 at the Caddy gate before reaching `requirePhoneApiKey`. Last successful customer call in acot-server's access log was 2026-05-21 — three days before the Caddy cutover.
|
||
|
||
**Fix applied — option (a):** changed `ACOT_API_URL` in `/var/www/acot-phone/.env` (and `acot-phone-server/.env.example` and the local repo copy) from `https://tools.acherryontop.com/api/acot` to `http://localhost:3012/api/acot`. Both processes live on netcup, so the request never enters Caddy and lands directly on `requirePhoneApiKey` in-process. Restarted via `pm2 restart acot-phone-server --update-env`; smoke-tested with `curl -H "x-acot-api-key: ..." http://localhost:3012/api/acot/customers/by-phone?phone=...` → 200 with the real customer record.
|
||
|
||
(Alternative option (b), kept here for posterity: add a `@phone_auth header x-acot-api-key *` guard in the Caddyfile to bypass `forward_auth` for requests bearing the shared secret. Would have worked too but introduces a header-based bypass in the gate, which is a worse security posture than just not entering Caddy at all.)
|
||
|
||
27. **chat-server stayed CJS through Phase 8, converted to ESM in Phase 9 (2026-05-24).** Why it stayed CJS originally: during Phase 4 scoping, chat-server was a notional merge candidate (fold its routes into `dashboard-server`) but rejected on inspection — it speaks a different DB (`rocketchat_converted`), a different protocol shape (file proxying + message search, not vendor-API aggregation), and shares no service code with the dashboard merge surface. Once "no merge" was decided, the secondary case for converting it (consistency, sharing `shared/auth/middleware.js`) wasn't strong enough to make it worth the Phase 4 testing budget. Phase 6 hardening also skipped it because the Caddy gate was assumed sufficient.
|
||
|
||
The second-look audit (2026-05-24) flagged two consequences:
|
||
- The plan's literal claim of "ESM across all Node services" was false.
|
||
- chat-server had only one defense layer (Caddy `forward_auth`) — `curl http://localhost:3014/test-db` returned 200 unauthenticated.
|
||
|
||
Phase 9 §9.1 closes both: chat-server is now ESM, mounts `authenticate()` against an in-process `inventory_db` pool (separate from the existing `rocketchat_converted` pool which routes.js uses via `global.pool`), and binds to `127.0.0.1` so external direct-port access is no longer possible even if the host firewall rule were dropped. Conversion was 3 files (`server.js`, `routes.js`, `package.json`) and minimal because routes.js's only top-level imports were `express` and `path` — handler bodies were untouched.
|
||
|
||
28. **Phase 6.7 / §9.2 `/uploads/*` gating reverted (2026-05-24).** Phase 6.7 specified moving `/uploads/*` behind `forward_auth`, and Phase 9 §9.2 hardened that by excluding `/uploads/*` from the `@static` matcher so uploaded `*.jpg` paths couldn't slip past the gate via the SPA build root. The SPA was patched (`AuthedImage` in `inventory/src/components/ui/authed-image.tsx`) to send Bearer headers on `/uploads/*` GETs.
|
||
|
||
**The breakage:** Both `/uploads/products/*` and `/uploads/reusable/*` URLs are submitted as strings inside the `product_images` field to `/apiv2/product/setup_new`, which Caddy reverse-proxies to the external PHP backend at `www.acherryontop.com`. That backend then **fetches the image URLs server-to-server** to ingest them into the shop. The external backend has no JWT and never authenticates against our `localhost:3011/verify` endpoint — so after the gate went live, every external import-time image fetch hit 401 and the product-import flow silently dropped images.
|
||
|
||
The original Phase 6.7 / F1 notes reserved this exact escape hatch (CONSOLIDATION_PLAN.md "Note on `/uploads/*` gating": *"if broken, revert this part of 6.7 to keep `/uploads/*` public, OR issue per-image signed URLs from the API"*).
|
||
|
||
**Fix applied — option (a):** revert the gate. In the live Caddyfile:
|
||
- Removed `/uploads/*` from the `@gated` matcher.
|
||
- Removed the `not path /uploads/*` exclusion from `@static` (no longer needed).
|
||
- Added a dedicated public `handle /uploads/*` rooted at `/var/www/inventory` (NOT the SPA build root — paths must resolve to the real uploads directory). Placed ahead of both `@static` and `@gated` so it wins the match.
|
||
|
||
Staged via `inventory-server/deploy/Caddyfile.proposed` and applied via the Deviation #8 pattern (admin-API `curl -X POST :2020/load` + `sudo cp` to persist).
|
||
|
||
**Verified live:**
|
||
- `curl -sI https://tools.acherryontop.com/uploads/reusable/<real>.jpg` → `200 image/jpeg` without auth.
|
||
- `curl -sI https://tools.acherryontop.com/uploads/products/<nonexistent>.jpg` → `404` (not `401`) — confirms gate is no longer in front of `/uploads/*`.
|
||
- `curl -sI https://tools.acherryontop.com/api/products` (no token) → `401` — `@gated` is still live for `/api/*` + `/chat-api/*`.
|
||
|
||
**Rationale for staying public rather than signing URLs:** Per-image HMAC signed URLs would have worked too but require a small change on the inventory server (URL generation + signature validation in Caddy or upstream) and would flow through the PHP backend untouched. Plain public access is acceptable because: (a) uploaded image URLs use random unguessable filenames (`createUploadFilename` in `inventory-server/src/routes/import.js` — timestamp + 9-digit `Math.random` suffix), (b) `/uploads/products/*` files auto-delete after 24h via `scheduleImageDeletion`, and (c) no PII or confidential data lives in this bucket today. If that ever changes, implement signed URLs rather than re-gating wholesale.
|
||
|
||
**AuthedImage:** Left in place as defense-in-depth. Bearer header is harmless when the path isn't gated, so existing call sites keep working unchanged. No rush to rip it out.
|
||
|
||
**Doc updates this session:** the Status table's Phase 9 row, Phase 6.1 row, Phase 6.7 row, the inline 6.7 bullet, the F1 "Note on `/uploads/*` gating" subsection, the Phase 7 Caddyfile diagram, the Phase 9 §9.2 narrative + smoke results, and the Concrete deliverables checklist were all updated to reflect the revert.
|