2026-05-23 19:38:12 -04:00

Server Consolidation & Security Hardening Plan

Audit-driven plan to (a) reduce 12 PM2 processes to 3 application servers + 1 auth server, (b) put every API endpoint behind real authentication, and (c) standardize on ESM across all Node services. Approach is "do it properly the first time" — no half-finished pieces, no deferred cleanup.

Status (2026-05-23)

Phase	Status	Notes
1 — Decommission dead services	Complete	aircall/gorgias/clarity/legacy-auth-server deleted from repo + PM2 + Caddyfile + ecosystem.cjs
2 — Build shared `lib/`	Complete	Lives at `inventory-server/shared/` (see Deviations). `/verify` endpoint live on auth-server
3 — Convert auth-server + inventory-server to ESM	Complete (code)	All 58 server-side files ESM; verified 0 import failures on netcup. Pending: `npm install` on server + pm2 reload to actually run the new code. See Deviations #10–13
4 — Build `dashboard-server` (the merge)	Not started	klaviyo/meta/google/typeform still run as 4 separate PM2 apps
5 — Convert `acot-server` to ESM	Not started
6 — Auth hardening	Complete (code); gated on Phase F1	All in-process items wired (rate-limit, JWT precondition, CORS lockdown, request-log, upload allowlist, `requirePermission` on sensitive routes, permissions seed migration). `authenticate()` is live on `/api/*`. Server-side artefacts (Caddyfile, ecosystem.cjs) written to `inventory-server/deploy/` for review. 6.11 (audit logging) deferred. Frontend cannot use the app until Phase F1 ships (see below)
F1 — Frontend fetch wrapper (NEW)	Not started — CRITICAL	Frontend uses raw `fetch()` in ~220 sites; only 7 send `Authorization: Bearer`. With Phase 6's `authenticate()` middleware live, every refresh 401s until the frontend uniformly attaches the token. See "Phase F1" below
7 — Caddyfile final form	Partial	Proposed file at `inventory-server/deploy/Caddyfile.proposed`. Apply blocked on F1 (forward_auth would 401 every page load until then)
8 — ecosystem.config.cjs final form	Partial	Proposed at `inventory-server/deploy/ecosystem.config.cjs.proposed`. Includes Phase 6.4 JWT_SECRET footgun fix and 6.10 lt-wordlist token move

Live PM2 count: 10 (down from 13). Target after Phase 4: 5 application apps + acot-phone-server + lt-wordlist-api.

Apply order from current state: (a) npm install on netcup to install the new shared-module deps (pino, pino-http, ioredis, express-rate-limit, jsonwebtoken), (b) ship Phase F1 frontend fetch wrapper, (c) pm2 reload inventory-server new-auth-server (Phase 3+6 code goes live, requests carry tokens, app keeps working), (d) apply deploy/ecosystem.config.cjs.proposed (Phase 6.4 + 6.10), (e) apply deploy/Caddyfile.proposed (Phase 6.1 — edge gate).

Goals

Every public-facing endpoint requires a valid auth token (Caddy gate + per-server middleware + per-route permission checks for sensitive operations).
Reduce service count from 12 PM2 processes to 4: inventory-server, acot-server, dashboard-server, auth-server.
Standardize on ESM ("type": "module") across all Node services.
Decommission aircall-server, gorgias-server, clarity-server, and the legacy auth-server (port 3003).
Eliminate dependency duplication: one Redis client, one Postgres pool helper, one logger, one auth middleware — shared across services.

Non-goals

Rewriting business logic. Route handlers move as-is unless they break under ESM or shared middleware.
Switching auth providers (we keep JWT + bcrypt + Postgres).
Replacing PM2 or Caddy.
Migrating Klaviyo/Meta/Google/Typeform's external API contracts.

Target architecture

                                  ┌──────────────────────────┐
                                  │  tools.acherryontop.com  │
                                  │         (Caddy)          │
                                  │   forward_auth gate ─────┼──► auth-server:3011
                                  └────────────┬─────────────┘     /verify endpoint
                                               │
              ┌────────────────────────────────┼────────────────────────────────┐
              ▼                                ▼                                ▼
   ┌─────────────────────┐         ┌──────────────────────┐         ┌─────────────────────┐
   │  inventory-server   │         │   dashboard-server   │         │     acot-server     │
   │      :3010 (ESM)    │         │      :3015 (ESM)     │         │     :3012 (ESM)     │
   │                     │         │                      │         │                     │
   │  /api/products      │         │  /api/klaviyo/*      │         │  /api/acot/*        │
   │  /api/orders        │         │  /api/meta/*         │         │  (MySQL via SSH)    │
   │  /api/analytics     │         │  /api/google-*/*     │         │                     │
   │  /api/dashboard     │         │  /api/typeform/*     │         │                     │
   │  ... (~25 routers)  │         │                      │         │                     │
   └─────────────────────┘         └──────────────────────┘         └─────────────────────┘
              │                                │                                │
              ├── Postgres (inventory_db)      ├── Postgres (klaviyo)           └── MySQL (workpi, via ssh2 tunnel)
              ├── shared lib/ ◄────────────────┤                                
              │   - auth middleware            ├── Redis (shared client)
              │   - permission helper          └── shared lib/ ◄─────────────────┐
              │   - logger                                                       │
              │   - pg pool factory                                              │
              │   - error formatter                                              │
              └─────────────────────────────────────────────────────────────────┘
                                                                                 │
                                                              ┌──────────────────┴───┐
                                                              │     auth-server      │
                                                              │      :3011 (ESM)     │
                                                              │   /login, /me,       │
                                                              │   /verify, user mgmt │
                                                              └──────────────────────┘

PM2 process count: 12 → 4 (plus acot-phone-server and lt-wordlist-api, which stay as-is — out of scope).

Phase 1 — Decommission dead/leaving services

Status: Complete (2026-05-23). All four services removed from repo, PM2, Caddyfile, and ecosystem.config.cjs. Frontend widgets (AircallDashboard.jsx, GorgiasOverview.jsx) and their dashboard.ts/Navigation.jsx/vite.config.ts wiring also removed. Verification: smoke-tested https://tools.acherryontop.com/api/{aircall,gorgias,clarity}/* → 404. Backups left at /home/matt/{ecosystem.config.cjs,Caddyfile}.bak.2026-05-23.

To remove

Service	Reason	Steps
`aircall-server` (3002)	Migrating off Aircall	`pm2 delete aircall-server`; remove from `ecosystem.config.cjs`; remove `/api/aircall/*` from Caddyfile; drop `inventory/dashboard/aircall-server/` directory; remove MongoDB connection from any frontend code; cancel Mongo if it was only feeding Aircall
`gorgias-server` (3006)	Migrating off Gorgias	same pattern; check frontend for `/api/gorgias/*` callers and delete the dashboards/widgets that use them
`clarity-server` (3009)	Already dead (no `.js` files, not in ecosystem)	remove `/api/clarity/*` from Caddyfile; delete `inventory/dashboard/clarity-server/` directory
`auth-server` (3003, legacy)	Replaced by `new-auth-server` on 3011	grep entire codebase for `dashboard-auth` and `localhost:3003`; redirect or remove callers; `pm2 delete auth-server`; remove from ecosystem; remove `/dashboard-auth/*` from Caddyfile; delete `inventory/dashboard/auth-server/` directory

Verification before deletion

# from inventory/ root — find any references before removing
grep -rn "aircall\|/api/aircall" inventory/src/ inventory-server/src/
grep -rn "gorgias\|/api/gorgias" inventory/src/ inventory-server/src/
grep -rn "/dashboard-auth\|localhost:3003" inventory/src/ inventory-server/src/
grep -rn "/api/clarity" inventory/src/ inventory-server/src/

Any remaining callers must be deleted or repointed before the server is removed. Do not leave a 502 response in production.

Database/secret cleanup

Drop the MongoDB instance feeding Aircall (after confirming no other consumers).
Rotate any Gorgias/Aircall API keys still in .env files (defense in depth — they'll be useless soon anyway, but commit hygiene matters).
Remove MONGODB_URI, AIRCALL_*, GORGIAS_* from any .env files.

Phase 2 — Build the shared `lib/`

Status: Complete (2026-05-23). All 11 modules written under inventory-server/shared/ (NOT repo root — see Deviations). /verify endpoint added to auth-server in CJS form (will move to shared/auth/verify.js usage during Phase 3 ESM conversion). Smoke-tested with no-token / bad-token / expired-token / valid-token cases. No service consumes shared/ yet; that happens in Phases 3–5.

Location

A single shared directory at the repo root: shared/ (sibling of inventory/ and acot-phone/). Each service imports from it via a relative path. We do not introduce npm workspaces yet — relative imports are fine for three consumers and avoid the npm-link / hoisting headaches.

Modules to create

shared/
├── package.json          # "type": "module"
├── auth/
│   ├── middleware.js     # authenticate(), requirePermission(), requireAdmin()
│   └── verify.js         # verifyToken() — pure function, no Express dependency
├── db/
│   ├── pg.js             # createPool(envPrefix) — returns configured Pool
│   └── redis.js          # createRedis() — single client, lazy-connect
├── logging/
│   ├── logger.js         # pino-based, redacts Authorization/Cookie
│   └── request-log.js    # Express middleware, structured access log
├── errors/
│   └── handler.js        # consistent error envelope, no leak in prod
├── cors/
│   └── policy.js         # single allowed-origins list, exported as cors() options
└── rate-limit/
    └── login.js          # express-rate-limit config for /login

Auth middleware spec (`shared/auth/middleware.js`)

// Pseudocode — final implementation matches the existing pattern in
// inventory/auth/routes.js authenticate() but factored out.

import jwt from 'jsonwebtoken';
// loadUserCached(pool, userId) is assumed to live in this module: a user +
// permissions lookup behind the 60s in-memory cache described below.

export function authenticate({ pool }) {
  return async (req, res, next) => {
    const header = req.headers.authorization;
    if (!header?.startsWith('Bearer ')) {
      return res.status(401).json({ error: 'Authentication required' });
    }
    let decoded;
    try {
      decoded = jwt.verify(header.slice(7), process.env.JWT_SECRET);
    } catch {
      return res.status(401).json({ error: 'Invalid token' });
    }
    try {
      // Short-circuit the DB hit with an in-memory cache (60s TTL, keyed by user id)
      const user = await loadUserCached(pool, decoded.userId);
      if (!user) return res.status(401).json({ error: 'Invalid token' });
      if (!user.is_active) return res.status(403).json({ error: 'Account inactive' });
      req.user = user;
      next();
    } catch (err) {
      // A DB failure is a server error, not an invalid token; a 401 here
      // would log every user out during a Postgres blip.
      next(err);
    }
  };
}

export function requirePermission(code) {
  return (req, res, next) => {
    if (req.user?.is_admin) return next();
    if (req.user?.permissions?.includes(code)) return next();
    res.status(403).json({ error: 'Insufficient permissions' });
  };
}

export const requireAdmin = (req, res, next) =>
  req.user?.is_admin ? next() : res.status(403).json({ error: 'Admin only' });

Why a 60s in-memory user cache

forward_auth in Caddy will call auth-server on every request. Each per-server authenticate() middleware also has a DB lookup to load permissions. Without caching, every API request becomes 1 SQL query for the user row + 1 for permissions. 60s TTL is short enough that deactivating a user takes effect within a minute, long enough that Klaviyo dashboards (which fire dozens of requests on load) don't hammer Postgres.
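
A minimal sketch of the caching idea, assuming a generic wrapper around whatever function loads the user row. `createTtlCache` and the stand-in loader are hypothetical names for illustration; the real `loadUserCached` would query Postgres instead.

```javascript
// Hypothetical helper: wraps any async loader in a per-key TTL cache.
function createTtlCache(loadFn, { ttlMs = 60_000, now = Date.now } = {}) {
  const entries = new Map(); // key -> { value, expiresAt }

  return async function loadCached(key) {
    const hit = entries.get(key);
    if (hit && hit.expiresAt > now()) return hit.value;

    const value = await loadFn(key);
    entries.set(key, { value, expiresAt: now() + ttlMs });
    return value;
  };
}

// Usage with a stand-in loader and an injectable clock (both assumptions).
let queries = 0;
let clock = 0;
const loadUser = createTtlCache(
  async (userId) => {
    queries += 1; // counts how often the "database" is actually hit
    return { id: userId, is_active: true };
  },
  { ttlMs: 60_000, now: () => clock },
);
```

The sketch never evicts entries and does not deduplicate concurrent misses; for a user table of this size that is probably acceptable, but it is worth confirming before relying on it.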

Add to `auth-server`: a `/verify` endpoint

Caddy's forward_auth only needs "is this token valid? give me a user-id." Today's /me does that but with a full permissions join. Add a lightweight /verify that:

Verifies JWT signature only (no DB hit).
Returns 200 with X-User-Id and X-User-Is-Admin response headers (which Caddy copy_headers will pass to upstream).
Returns 401 on bad token.

Decision: each service re-verifies the JWT independently. Caddy's forward_auth is a fast first-pass reject for obviously bad tokens, but the security boundary is the per-server authenticate() middleware. Cost is negligible (one HMAC-SHA256 per request); the upside is that a misconfigured Caddyfile can never let an unauthenticated request reach a backend. Upstream services do not trust any X-User-* headers from Caddy — they parse the Authorization header themselves.
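
To make the "one HMAC-SHA256 per request" cost concrete, here is a sketch of signature-only verification using only `node:crypto`. It is an illustration, not the production path (which uses `jsonwebtoken`). The function names and the `isAdmin` payload field are assumptions; `userId` matches the middleware spec above.

```javascript
import { createHmac, timingSafeEqual } from 'node:crypto';

const b64url = (input) => Buffer.from(input).toString('base64url');

// Demo helper only: builds an HS256 token the way a JWT library would.
function signHs256(payload, secret) {
  const head = b64url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));
  const body = b64url(JSON.stringify(payload));
  const sig = createHmac('sha256', secret).update(`${head}.${body}`).digest('base64url');
  return `${head}.${body}.${sig}`;
}

// Returns the payload, or null if the token is malformed, forged, or expired.
function verifyHs256(token, secret, nowSeconds = Math.floor(Date.now() / 1000)) {
  const parts = String(token).split('.');
  if (parts.length !== 3) return null;
  const [head, body, sig] = parts;

  const expected = createHmac('sha256', secret).update(`${head}.${body}`).digest();
  const given = Buffer.from(sig, 'base64url');
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;

  try {
    const header = JSON.parse(Buffer.from(head, 'base64url').toString('utf8'));
    const payload = JSON.parse(Buffer.from(body, 'base64url').toString('utf8'));
    if (header.alg !== 'HS256') return null;
    if (typeof payload.exp === 'number' && payload.exp <= nowSeconds) return null;
    return payload;
  } catch {
    return null;
  }
}

// Maps an Authorization header to the status and headers /verify would send.
function verifyResponse(authorization, secret) {
  const token = authorization?.startsWith('Bearer ') ? authorization.slice(7) : null;
  const payload = token && verifyHs256(token, secret);
  if (!payload) return { status: 401, headers: {} };
  return {
    status: 200,
    headers: {
      'X-User-Id': String(payload.userId),
      'X-User-Is-Admin': String(Boolean(payload.isAdmin)),
    },
  };
}
```

No database or network call appears anywhere in the verification path, which is what keeps `/verify` cheap enough to run on every request.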

Phase 3 — Convert `auth-server` and `inventory-server` to ESM

Status: Complete (code) — 2026-05-23. Both servers + all sub-trees converted to ESM. 58 importable .js files load cleanly on netcup (verified via dynamic-import sweep). Two latent bugs surfaced and fixed: ??/|| precedence in shared/db/{pg,redis}.js, and CJS named-import of Pool from pg in both auth files (now uses import pg from 'pg'; const { Pool } = pg).

Scripts under inventory-server/scripts/ (one-shot maintenance / orchestrators) kept CommonJS via a sibling scripts/package.json declaring "type": "commonjs" — Node's package-type resolution walks up directory by directory, so this overrides the parent's "type": "module" without renaming any file or touching any spawn() callsite. Convert individual scripts to ESM if/when touched.

Pending to actually go live: npm install on netcup (new deps: pino, pino-http, ioredis, express-rate-limit, jsonwebtoken) + pm2 reload. See "Phase F1" — the frontend fetch wrapper should ship in the same deploy or this immediately breaks the app.

Mechanical conversion

Per service:

Add "type": "module" to package.json.
Convert require() → import. module.exports → export / export default.
Fix __dirname/__filename (use import.meta.url + fileURLToPath).
Convert any dynamic require (e.g., conditional plugin loading) to await import().
Update any sub-imports that don't include the file extension — ESM requires ./foo.js, not ./foo.
Update ecosystem.config.cjs if any service entry depended on CJS semantics. The ecosystem file itself can stay .cjs — PM2 reads it as config, doesn't matter what the apps it spawns are.
Update nodemon config / scripts.
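
The `__dirname` and dynamic-import steps above can be sketched as follows. The `logs` path is a hypothetical example, used only to show where `__dirname` typically appears.

```javascript
import { fileURLToPath } from 'node:url';
import path from 'node:path';

// ESM has no __dirname/__filename globals; derive them from import.meta.url.
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

// Hypothetical path, only to show how __dirname is typically used.
const logsDir = path.join(__dirname, 'logs');

// A dynamic require() becomes await import(). For a CommonJS module,
// module.exports arrives as the default export.
const { default: os } = await import('node:os');

console.log(logsDir, os.platform());
```

Relative specifiers follow the same rule in both static and dynamic imports: `./foo.js` resolves, `./foo` does not.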

Risk areas in inventory-server

routes/ai.js does a lazy init (aiRouter.initInBackground() called from server.js) — confirm the export shape still works as a default export of an Express router with a sidecar function. May need to split into export default router; export function initInBackground() {}.
Multer setup in routes/import.js — straightforward, no ESM-specific concerns.
SSE setup in server.js — moves over cleanly, no module-system entanglement.
The child_process.spawn calls for metrics calculation: ESM doesn't change child_process behavior, but if any spawned script uses require() of a sibling, that sibling must also be ESM (or stay CJS with a .cjs extension).

Test strategy

After conversion, pm2 start ecosystem.config.cjs --only inventory-server on the server, watch logs for require/import errors at startup.
Hit /health, then the most exercised endpoints (/api/products, /api/dashboard/overview, /api/analytics/...). If startup is clean and three smoke endpoints work, ESM conversion is done. Functional correctness is preserved because no logic changed.
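
The smoke check can be scripted rather than done by hand. A minimal sketch, assuming Node 18+ (global `fetch`); `smokeTest` and the `SMOKE_TOKEN` variable are hypothetical names, and the port comes from the architecture diagram.

```javascript
// Hypothetical smoke check: requests each path and collects anything that is not 2xx.
async function smokeTest(baseUrl, paths, { fetchImpl = fetch, headers = {} } = {}) {
  const failures = [];
  for (const path of paths) {
    try {
      const res = await fetchImpl(new URL(path, baseUrl), { headers });
      if (!res.ok) failures.push({ path, status: res.status });
    } catch (err) {
      failures.push({ path, error: err.message });
    }
  }
  return failures;
}

// Example invocation (not executed here):
// const failures = await smokeTest(
//   'http://localhost:3010',
//   ['/health', '/api/products', '/api/dashboard/overview'],
//   { headers: { Authorization: `Bearer ${process.env.SMOKE_TOKEN}` } },
// );
// process.exitCode = failures.length ? 1 : 0;
```

Passing `fetchImpl` explicitly keeps the function testable without a running server.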

Auth-server

Already small (~200 LOC in server.js plus a few hundred across routes.js and permissions.js). Roughly a one-day conversion. Add the new /verify endpoint as part of this work.

Phase 4 — Build `dashboard-server` (the merge)

Status: Not started. The big merge. Klaviyo + Meta + Google + Typeform → one ESM service. Highest-risk phase — see Rollback strategy for the per-vendor cutover plan.

Layout

inventory/dashboard/
├── server.js                  # entry: load env, init Postgres+Redis, mount routes, listen
├── package.json               # "type": "module", deps from all 4 source servers (deduped)
├── .env                       # KLAVIYO_*, META_*, GOOGLE_*, TYPEFORM_*, shared DB_*, REDIS_URL
├── routes/
│   ├── klaviyo/               # absorbed from dashboard/klaviyo-server/src/
│   ├── meta/                  # absorbed from dashboard/meta-server/
│   ├── google/                # absorbed from dashboard/google-server/
│   └── typeform/              # absorbed from dashboard/typeform-server/
├── services/                  # per-vendor API clients (Klaviyo SDK calls, etc.)
├── scripts/
│   └── import-campaign-products.js  # one-shot, moved from klaviyo-server/scripts/
└── logs/

Mount points

// server.js (sketch)
import express from 'express';
import cors from 'cors';
import { authenticate, requirePermission } from '../../shared/auth/middleware.js';
import { createPool } from '../../shared/db/pg.js';
import { createRedis } from '../../shared/db/redis.js';
import { logger, requestLog } from '../../shared/logging/index.js';
import corsPolicy from '../../shared/cors/policy.js';
import errorHandler from '../../shared/errors/handler.js';

import klaviyoRouter from './routes/klaviyo/index.js';
import metaRouter from './routes/meta/index.js';
import googleRouter from './routes/google/index.js';
import typeformRouter from './routes/typeform/index.js';

const app = express();
const pool = await createPool('KLAVIYO_DB');  // klaviyo has its own DB; others can share or have none
const redis = await createRedis();

app.use(requestLog);
app.use(cors(corsPolicy));
app.use(express.json({ limit: '10mb' }));

// Everything below this line requires a valid token.
app.use('/api', authenticate({ pool }));

app.use('/api/klaviyo', klaviyoRouter({ pool, redis }));
app.use('/api/meta', metaRouter({ redis }));
app.use('/api/google-analytics', googleRouter({ redis }));  // matches Caddy /api/dashboard-analytics rewrite
app.use('/api/typeform', typeformRouter({ redis }));

app.get('/health', (req, res) => res.json({ ok: true }));
app.use(errorHandler);

app.listen(process.env.DASHBOARD_PORT || 3015);

Per-vendor routers

Each vendor's existing route file becomes a factory that takes the shared pool/redis and returns an Express router. Replace each server's per-instance pool/redis with the injected one.

Permission gates (sensitive routes only)

Authenticated-only is the default after app.use('/api', authenticate(...)). For sensitive operations, add requirePermission per route:

Anything that mutates Klaviyo lists/segments → requirePermission('klaviyo_write')
Triggering a campaign sync → requirePermission('klaviyo_admin')
Read-only dashboards → no extra check beyond authenticate.

Define the new permission codes in the permissions table via a migration in Phase 6.

Dependency dedup

Decision: standardize on ioredis. Klaviyo's larger codebase already uses it, and ioredis has better cluster/sentinel support if we ever need it. Update meta/google/typeform call sites — each is a handful of get/set calls, mechanical conversion. Remove the redis package from dashboard-server's package.json.

Env consolidation

Single .env at inventory/dashboard/.env, prefixed keys:

DASHBOARD_PORT=3015
KLAVIYO_API_KEY=...
KLAVIYO_DB_HOST=...
KLAVIYO_DB_NAME=...
META_ACCESS_TOKEN=...
GOOGLE_SERVICE_ACCOUNT_KEY=...
TYPEFORM_TOKEN=...
REDIS_URL=...
JWT_SECRET=...  # shared with auth-server; same secret means same tokens valid here

Klaviyo's `scripts/import-campaign-products.js`

One-shot script — keep it, but run it from the merged dashboard-server's directory. Update the script's imports to ESM. If it's run via cron, update the cron entry to the new path.

Risk: shared error states

When all four vendors share a Redis client, a Redis hiccup affects all four. Make sure the connection has retry config (ioredis defaults are reasonable but verify) and that vendor routes degrade gracefully when Redis is unavailable (most use it as a cache, so cache-miss → fall through to upstream API is the right behavior).
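
A sketch of the "cache miss falls through to upstream" behaviour, assuming the client exposes ioredis-style `get(key)` and `set(key, value, 'EX', seconds)`. `cachedFetch` is a hypothetical helper name.

```javascript
// Cache-aside read that treats Redis as optional: any cache error is logged
// and the request falls through to the upstream call.
async function cachedFetch(redis, key, ttlSeconds, fetchUpstream, log = console) {
  try {
    const cached = await redis.get(key);
    if (cached !== null) return JSON.parse(cached);
  } catch (err) {
    log.warn(`cache read failed for ${key}: ${err.message}`);
  }

  const fresh = await fetchUpstream();

  try {
    // ioredis signature: set(key, value, 'EX', seconds)
    await redis.set(key, JSON.stringify(fresh), 'EX', ttlSeconds);
  } catch (err) {
    log.warn(`cache write failed for ${key}: ${err.message}`);
  }
  return fresh;
}
```

With this shape a Redis outage costs latency and upstream API quota but does not turn into 500s on the vendor routes. Whether the vendor rate limits tolerate a fully cold cache is a separate question to check per vendor.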

Phase 5 — Convert `acot-server` to ESM (stays standalone)

Status: Not started. Largest single conversion (~5K LOC), but no merge involved.

Special concern: ssh2 tunnel

acot-server opens an SSH tunnel via ssh2 to access the production MySQL at 192.168.1.5:3309. The tunnel must be:

Established before the HTTP listener starts (so no requests fail with "no DB connection").
Re-established on disconnect (ssh2 connection's close event → recreate).
Cleanly torn down on SIGTERM/SIGINT so PM2 restarts don't leak file descriptors.

Verify (or add) this lifecycle handling as part of the conversion. If it's already correct, conversion is mechanical; if not, this is a good moment to fix it.
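
One possible shape for that lifecycle, as a sketch. `createTunnelManager` is a hypothetical wrapper, and `connect` is assumed to be an async function resolving to an object that emits `close` and exposes `end()`, as an ssh2 `Client` does. The stand-in connection below replaces ssh2 so the sketch runs without it.

```javascript
import { EventEmitter } from 'node:events';

// Hypothetical lifecycle wrapper around an injected connect() function.
function createTunnelManager(connect, { retryMs = 1000, log = console } = {}) {
  let conn = null;
  let stopping = false;

  async function open() {
    conn = await connect();
    conn.once('close', () => {
      conn = null;
      scheduleReconnect();
    });
  }

  function scheduleReconnect() {
    if (stopping) return; // clean shutdown: do not reconnect
    log.warn(`tunnel down, retrying in ${retryMs}ms`);
    setTimeout(async () => {
      if (stopping) return;
      try {
        await open();
      } catch (err) {
        log.error(`reconnect failed: ${err.message}`);
        scheduleReconnect();
      }
    }, retryMs);
  }

  return {
    start: open,               // await this before app.listen()
    isUp: () => conn !== null,
    stop() {                   // call from SIGTERM/SIGINT handlers
      stopping = true;
      conn?.end();
    },
  };
}

// Stand-in connection for demonstration; a real one would wrap ssh2.
const fakeConnect = async () => {
  const c = new EventEmitter();
  c.end = () => c.emit('close');
  return c;
};
```

In the server entry point this would be wired roughly as `await tunnel.start()` before `app.listen(...)`, with `process.once('SIGTERM', () => tunnel.stop())` alongside it. The sketch does not handle `stop()` being called while a connect attempt is still in flight.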

Test strategy

Same as inventory-server: start with PM2, smoke-test the most-used /api/acot/* endpoints, watch logs for unhandled rejection or tunnel-close events.

Phase 6 — Auth hardening

Status: Complete (code) — 2026-05-23. Application gated on Phase F1. All in-process hardening shipped alongside the Phase 3 ESM conversion. The authenticate() middleware is wired live on /api/* in inventory-server — the moment that code reaches production, the frontend stops working until Phase F1 lands, because today's frontend doesn't include Authorization: Bearer on the vast majority of fetch calls (see Phase F1 below for the diagnosis).

Per-item status:

#	Item	Status	Where
6.1	Caddy `forward_auth` gate	Proposed — apply after F1	`inventory-server/deploy/Caddyfile.proposed`
6.2	`requirePermission` on sensitive routes + permissions migration	Done	inline in `config.js`, `data-management.js`, `import.js`, `ai-prompts.js`, `ai-validation.js`, `templates.js`, `reusable-images.js`; codes seeded by `migrations/005_phase6_permission_codes.sql`
6.3	Login rate-limit + `/verify` rate-limit	Done	`auth/server.js` uses `shared/rate-limit/login.js` (`loginLimiter`, `verifyLimiter`)
6.4	JWT_SECRET as startup precondition + ecosystem footgun fix	Done in code; proposed for ecosystem.cjs	Both auth-server and inventory-server `process.exit(1)` if `JWT_SECRET` is unset. `inventory-server/deploy/ecosystem.config.cjs.proposed` removes the `JWT_SECRET: process.env.JWT_SECRET` override that was shadowing `.env`
6.5	Structured request logging w/ redaction	Done	`shared/logging/request-log.js` (pino-http, redacts Authorization/Cookie); mounted in both `auth/server.js` and `src/server.js`
6.6	CORS lockdown	Done	`src/middleware/cors.js` now re-exports `shared/cors/policy.js`. LAN wildcards (`192.168.`, `10.`) and `*` defaults gone
6.7	Upload hardening	Done	Exact-match MIME+extension allowlist on `routes/import.js` and `routes/reusable-images.js`; dead `multer({ dest })` removed from `routes/products.js` (no upload route was using it — strongest hardening was deletion)
6.8	Frontend token storage stays localStorage + XSS audit	Audited	Confirmed `dangerouslySetInnerHTML` is sanitized in `ProductEditor.tsx`. Flagged: `ChatRoom.tsx:277,392` renders user-controlled chat content as raw HTML — real XSS vector, separate fix needed
6.9	Remove debug middleware	Done	The header-dumping `app.use((req,res,next)=>{ console.log(... req.headers ...) })` block removed from `src/server.js`. Replaced with `shared/logging/request-log.js` (which redacts).
6.10	`lt-wordlist-api` token move	Proposed for ecosystem.cjs	`inventory-server/deploy/ecosystem.config.cjs.proposed` shows the entry without inline token; apply alongside rotating the secret value into `/opt/lt-wordlist-api/.env`
6.11	Audit logging for sensitive ops	Deferred	Out of scope for this pass per user direction. Existing `import_audit_log` and `product_editor_audit_log` tables stay as-is; generic `system_audit_log` table + middleware is its own project

6.1 Caddy `forward_auth` gate

Add to the tools.acherryontop.com block, before the @api_routes handler:

# Forward-auth gate for all API traffic
@needs_auth path /api/* /chat-api/*
handle @needs_auth {
    forward_auth localhost:3011 {
        uri /verify
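        # copy_headers copies headers from the /verify response onto the request
        # sent upstream; the client's Authorization header is forwarded unchanged.
        copy_headers X-User-Id X-User-Is-Admin
        # On 401/403, Caddy returns the auth-server's response body verbatim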
    }
    # Existing per-vendor handle blocks remain below this line
}

# /auth-inv/* stays public (you need to log in!)
handle /auth-inv/* {
    uri strip_prefix /auth-inv
    reverse_proxy localhost:3011
}

The forward_auth directive sends a subrequest to /verify on the auth-server. If that returns 2xx, the request proceeds upstream. On any other status (401/403 here), Caddy returns that response to the client and never hits the backend.

This is the first line of defense. Per-server middleware (shared/auth/middleware.js) is the second line — re-verifies the JWT independently. Defense in depth: a Caddyfile typo can't open a hole.

6.2 Per-route permission gates

After per-server authenticate(), add requirePermission(code) to destructive or sensitive routes. Audit needed in:

inventory-server/src/routes/config.js — global config writes → admin
inventory-server/src/routes/import.js — uploads, deletes, generate-upc → product_import
inventory-server/src/routes/data-management.js — CSV operations → data_management
inventory-server/src/routes/ai-prompts.js — prompt edits → ai_admin
inventory-server/src/routes/templates.js — template writes → templates_write
inventory-server/src/routes/reusable-images.js — image management → image_admin
inventory-server/src/routes/products.js — only one POST (/resolve-identifiers); evaluate whether it needs a permission code or authenticated-only is fine
inventory-server/src/routes/product-editor-audit-log.js and import-audit-log.js — read-only, but restricted to privileged users → audit_read
dashboard-server Klaviyo/Meta/Google/Typeform write endpoints → vendor-specific codes per above

Migration: a single SQL script that inserts the new permission codes into the permissions table and assigns them to existing admin users. Non-admin users get permissions explicitly granted via the user management UI.

INSERT INTO permissions (code, name) VALUES
  ('product_import', 'Product Import'),
  ('data_management', 'Data Management'),
  ('ai_admin', 'AI Settings Admin'),
  ('templates_write', 'Template Editing'),
  ('image_admin', 'Image Management'),
  ('audit_read', 'Audit Log Access'),
  ('klaviyo_write', 'Klaviyo Write'),
  ('klaviyo_admin', 'Klaviyo Admin'),
  ('meta_write', 'Meta Write'),
  ('google_write', 'Google Analytics Write'),
  ('typeform_write', 'Typeform Write'),
  ('acot_admin', 'ACOT Server Admin')
ON CONFLICT (code) DO NOTHING;

6.3 Login rate limiting

shared/rate-limit/login.js:

import rateLimit from 'express-rate-limit';
export const loginLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,      // 15 minutes
  max: 10,                        // 10 attempts per IP per window
  message: { error: 'Too many login attempts, try again later' },
  standardHeaders: true,
  legacyHeaders: false,
});

Apply in auth-server on the /login route. Consider also rate-limiting /verify and /me (much higher cap, ~600/min — they're called legitimately by every page load).

6.4 JWT secret rotation

Rotate JWT_SECRET to a fresh 32-byte random string as part of the deployment.
Document that rotation logs out all users — acceptable for an internal tool, do it during off-hours.
Add JWT_SECRET to the env var validation block in auth-server/server.js (refuse to start if not set).
Fix the existing footgun: /var/www/ecosystem.config.cjs currently has JWT_SECRET: process.env.JWT_SECRET after ...inventoryEnv in the new-auth-server block. This shadows the .env value with whatever the shell exported when PM2 was started — which has already silently diverged at least once (detected and fixed 2026-05-23 by a clean PM2 restart in a shell without JWT_SECRET exported). Delete that override line during rotation; let .env be the single source of truth.
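
A sketch of the two mechanical pieces: generating the fresh 32-byte secret, and the startup precondition. Both function names are hypothetical. Throwing at module top level gives the same "refuse to start" effect as the `process.exit(1)` described in the status table.

```javascript
import { randomBytes } from 'node:crypto';

// 32 random bytes, base64url-encoded (43 characters), ready to paste into .env.
function generateJwtSecret() {
  return randomBytes(32).toString('base64url');
}

// Startup precondition: throw instead of running with an undefined secret.
// `env` is injectable so the check can be tested without touching process.env.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (!value) throw new Error(`${name} must be set; refusing to start`);
  return value;
}

// At the top of server.js (not executed here):
// const JWT_SECRET = requireEnv('JWT_SECRET');
```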

6.5 Request logging

shared/logging/request-log.js — log method, path, status, duration, user-id (if authenticated). Never log Authorization or Cookie headers. Remove the current server.js:79-87 debug middleware in inventory-server (it logs full headers including the bearer token).

6.6 CORS lockdown

Current middleware/cors.js allows 192.168.*.* and 10.*.*.* with credentials: true. Tighten to explicit known origins:

origin: [
  'https://tools.acherryontop.com',
  'https://inventory.kent.pw',
  /^http:\/\/localhost:(5174|5175)$/,
]

If anyone genuinely needs LAN access, add their specific origin rather than a whole private range (192.168.0.0/16 or 10.0.0.0/8).
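
The `cors` package accepts a mixed array of strings and RegExps for `origin`. The sketch below mirrors that matching in a plain function, so the allowlist can be unit-tested without starting a server. `isAllowedOrigin` is a hypothetical helper, and the rejected origins in the usage note are made-up examples.

```javascript
const allowedOrigins = [
  'https://tools.acherryontop.com',
  'https://inventory.kent.pw',
  /^http:\/\/localhost:(5174|5175)$/,
];

// Strings must match exactly; RegExp entries are tested against the full origin.
function isAllowedOrigin(origin, allowlist = allowedOrigins) {
  return allowlist.some((rule) =>
    rule instanceof RegExp ? rule.test(origin) : rule === origin,
  );
}
```

Under this list, `http://192.168.1.20:5174` and look-alike hosts such as `https://tools.acherryontop.com.evil.example` are both rejected, because the strings are compared exactly and the regex is anchored at both ends.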

6.7 Upload hardening

POST /api/import/upload-image (multer-backed) needs:

File-size limit set explicitly on the multer config (limits.fileSize; multer applies no size limit by default, so verify one is set).
MIME-type allowlist (image/jpeg, image/png, image/webp; reject everything else).
Filename sanitization (no .., no absolute paths, generate UUID-based names server-side).
The Caddy /uploads/* handler currently serves any file in the uploads directory publicly. Move this behind the auth gate: include /uploads/* in @needs_auth. If some images are referenced from public emails (Klaviyo newsletter), put those in a separate public bucket; everything else stays gated.
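
The allowlist and filename rules above, sketched as one function over the fields multer exposes on `req.file` (`mimetype`, `originalname`, `size`). The function name and the 5 MiB cap are assumptions. multer's own `limits.fileSize` should still enforce the cap while the upload streams in; the size check here only covers the finished file.

```javascript
import { randomUUID } from 'node:crypto';
import path from 'node:path';

// Exact-match allowlist: MIME type -> extensions accepted for it.
const ALLOWED = new Map([
  ['image/jpeg', ['.jpg', '.jpeg']],
  ['image/png', ['.png']],
  ['image/webp', ['.webp']],
]);
const MAX_BYTES = 5 * 1024 * 1024; // assumed cap; pick the real value per route

// Returns a server-generated filename, or throws if the upload is not allowed.
function safeUploadName({ mimetype, originalname, size }) {
  const ext = path.extname(originalname).toLowerCase();
  const exts = ALLOWED.get(mimetype);
  if (!exts || !exts.includes(ext)) throw new Error('Unsupported file type');
  if (size > MAX_BYTES) throw new Error('File too large');
  // The client-supplied name is never used on disk, so '..' or absolute
  // paths in originalname cannot escape the uploads directory.
  return `${randomUUID()}${ext}`;
}
```

Requiring the MIME type and the extension to agree rejects a file declared as `image/png` but named `shell.php`. Both values are client-supplied, though, so this is a filter rather than proof of content.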

6.8 Frontend token storage

Decision: stay on localStorage. This is an internal tool with no untrusted user-generated HTML being rendered, so the XSS-token-theft surface is small. The forward_auth gate is the main security gap we're addressing; cookie-based auth would be a larger, separate project (cookie-parser, CSRF double-submit pattern, AuthContext refactor) that doesn't change the threat model for an internal tool with no public sign-up.

Sanity check during this refactor: grep the React codebase for dangerouslySetInnerHTML. If any usages exist, verify each one is rendering trusted (server-controlled, not user-supplied) content. If a user-supplied content path exists, that's a real XSS vector and needs separate remediation regardless of token-storage choice.

6.9 Remove debug middleware

inventory-server/src/server.js:79-87 logs full request headers including Authorization. Delete this block. Replace with shared/logging/request-log.js.

6.10 `lt-wordlist-api` token

ADD_WORD_TOKEN is currently hardcoded in /var/www/ecosystem.config.cjs. Move to /opt/lt-wordlist-api/.env, rotate the token value, update any callers.

6.11 Audit logging for sensitive operations

Already have import-audit-log and product-editor-audit-log tables. Extend the pattern:

Log user_id, endpoint, params, result for config.js writes and data-management.js operations.
Schema: reuse the existing audit table pattern or add a generic system_audit_log table.
Don't log request bodies wholesale (may contain large blobs); log the action and the target ID.

Phase F1 — Frontend fetch wrapper (NEW — 2026-05-23)

Status: Not started. CRITICAL. Blocks the Phase 3+6 deploy from being usable.

The discovery

While wiring authenticate() on /api/* in Phase 6.1/6.2, we audited the frontend's fetch usage and found:

7 call sites send Authorization: Bearer ${token} explicitly (mostly in AuthContext.tsx for /me + /login, plus a couple of settings/* pages).
~220 other fetch(...) / axios.*(...) call sites across inventory/src/services/, inventory/src/pages/, inventory/src/components/ send no Authorization header at all.
There is no global fetch wrapper, axios interceptor, or service-worker shim that injects the token.

Today this works because nothing on the server checks. Caddy currently has no forward_auth gate (Phase 6.1 is a Caddyfile change that hasn't shipped yet) and the previous inventory-server had no authenticate() middleware. The frontend's auth model was "you log in once to get the token; the token is checked only by /me; everything else is implicitly trusted at the network layer."

Once the Phase 6 code is live (that is, after the next pm2 reload), every page load gets a 401 on its first API call. The user explicitly accepted this when authorising the Phase 6 work, but the fix is its own deliverable: shipping Phase 3+6 to PM2 without F1 in the same window means an outage that lasts as long as F1 takes to build, not minutes.

Recommended approach

Add a single fetch wrapper at inventory/src/utils/api.ts (or similar) and migrate the ~220 call sites to use it. The wrapper:

Reads localStorage.getItem('token') on every call (cheap; localStorage is sync).
Merges Authorization: Bearer ${token} into the request headers if a token exists.
Intercepts 401 responses → fires window.dispatchEvent(new Event('auth:logout')) (a listener already exists in AuthContext.tsx:117) so the user gets bounced to /login cleanly instead of seeing broken pages.
Preserves the existing call shape — apiFetch(url, init) should be a drop-in for fetch(url, init) so the migration is mechanical.

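// inventory/src/utils/api.ts (sketch)
// Drop-in replacement for fetch() that attaches the stored bearer token.
export async function apiFetch(input: RequestInfo | URL, init: RequestInit = {}): Promise<Response> {
  const token = localStorage.getItem('token');
  // Start from init.headers, falling back to the Request's own headers: passing
  // `headers` in init replaces (not merges with) the headers of a Request input.
  const headers = new Headers(init.headers ?? (input instanceof Request ? input.headers : undefined));
  // Never overwrite an Authorization header the caller set explicitly.
  if (token && !headers.has('Authorization')) {
    headers.set('Authorization', `Bearer ${token}`);
  }
  // Content-Type is left untouched so FormData bodies keep their multipart boundary.
  const res = await fetch(input, { ...init, headers });
  if (res.status === 401 && token) {
    // Token expired or revoked: bounce to /login. AuthContext already listens.
    window.dispatchEvent(new Event('auth:logout'));
  }
  return res;
}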

Same shape for axios:

// inventory/src/utils/apiClient.ts (sketch)
import axios from 'axios';
export const apiClient = axios.create();
// Attach the stored token unless the caller already set Authorization (mirrors apiFetch).
apiClient.interceptors.request.use((config) => {
  const token = localStorage.getItem('token');
  if (token && !config.headers.Authorization) config.headers.Authorization = `Bearer ${token}`;
  return config;
});
apiClient.interceptors.response.use(
  (r) => r,
  (err) => {
    // On 401, signal AuthContext to log out; the rejection still reaches the caller's catch.
    if (err?.response?.status === 401) window.dispatchEvent(new Event('auth:logout'));
    return Promise.reject(err);
  },
);

Migration plan

Land the two wrapper modules above. ~50 LOC total.
Codemod or sed-loop: in inventory/src/, replace fetch( → apiFetch( (with the right import) and axios.get/post/... → apiClient.get/post/.... With ~220 call sites, expect a half-day of careful find-and-replace plus per-page verification. Spot-check the call sites that set a custom Content-Type (multipart uploads especially): the wrapper must not set or override Content-Type, or the browser-generated multipart boundary is lost.
Leave the AuthContext.tsx /login and /me calls alone — they already work and migrating them adds no value.
Run the SPA: log in, exercise Overview / Products / Analytics / Dashboard / etc. with browser devtools open watching for Authorization header on every /api/* request.
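
To track progress through the migration steps above, a small scanner can list the call sites that still bypass the wrappers. This is a heuristic sketch only: the function name `findRawCalls` is hypothetical, it works line by line on source text, and it will also flag matches inside comments or strings, so treat its output as a review list rather than a verdict.

```typescript
interface RawCall {
  line: number; // 1-based line number in the scanned source
  kind: "fetch" | "axios";
}

// Matches a bare fetch( that is not part of a longer identifier such as
// prefetch( or apiFetch( (the latter also differs by case).
const RAW_FETCH = /(^|[^\w$])fetch\s*\(/;
// Matches direct axios.<verb>( calls; apiClient.<verb>( does not match.
const RAW_AXIOS = /\baxios\s*\.\s*(get|post|put|patch|delete|request)\s*\(/;

function findRawCalls(source: string): RawCall[] {
  const hits: RawCall[] = [];
  source.split("\n").forEach((text, index) => {
    if (RAW_FETCH.test(text)) hits.push({ line: index + 1, kind: "fetch" });
    if (RAW_AXIOS.test(text)) hits.push({ line: index + 1, kind: "axios" });
  });
  return hits;
}
```

Running this over each file in inventory/src/ before and after the codemod gives a quick count to compare against the ~220 estimate. The AuthContext.tsx calls that are intentionally left alone will keep showing up and can be ignored.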

Sequencing with Phase 3+6 deploy

Two options:

A) Ship F1 first (recommended). Frontend goes out with the wrapper; nothing changes server-side. Then pm2 reload Phase 3+6. Zero-downtime, zero broken-page window.

B) Ship together. F1 and Phase 3+6 land in the same deploy. Brief window (seconds) where the frontend has the wrapper but the server hasn't reloaded yet — wrapper just sends extra headers the old server ignores. Safe.

Do not ship Phase 3+6 first and F1 second. That gives a broken app for as long as F1 takes.

Out of scope (kept on `localStorage`)

Per Phase 6.8, we're not migrating to httpOnly cookie auth. F1 is the minimum work to make the per-service authenticate() (Phase 6) actually usable. A future Phase F2 could move to cookies + CSRF double-submit, but that's a much larger change touching the AuthContext, the login flow, and every backend that reads tokens. Not justified for an internal tool with no public sign-up.

Note on `/uploads/*` gating (Phase 6.7's Caddyfile change)

The proposed Caddyfile moves /uploads/* behind forward_auth. Most product images today are referenced from <img src="/uploads/..."> in the SPA — those requests are made by the browser, which does not include Authorization headers on image requests. Fixing this is part of F1's scope too: either (a) keep /uploads/* public (revert that part of 6.7) and accept that uploaded images leak to anyone who guesses a URL, or (b) issue per-image signed URLs from the API and gate those at Caddy. Decide before applying the Caddyfile.
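
If option (b) is chosen, one possible shape for per-image signed URLs is an HMAC over the path plus an expiry timestamp. The sketch below is an assumption-laden illustration: the helper names, the `exp`/`sig` query parameters, and the idea of a dedicated signing secret are all hypothetical, and where verification runs (auth-server, a dedicated endpoint, or elsewhere) is left open.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";
import { Buffer } from "node:buffer";

// Sign "<path>:<expiry>" with a server-side secret (hypothetical scheme).
function signature(path: string, expiresAt: number, secret: string): string {
  return createHmac("sha256", secret).update(`${path}:${expiresAt}`).digest("hex");
}

// Returns a relative URL such as /uploads/a.jpg?exp=...&sig=...
// expiresAt is a Unix timestamp in seconds.
function signUploadPath(path: string, expiresAt: number, secret: string): string {
  return `${path}?exp=${expiresAt}&sig=${signature(path, expiresAt, secret)}`;
}

function verifyUploadUrl(url: string, nowSeconds: number, secret: string): boolean {
  // The base is only needed to parse a relative URL; it is never contacted.
  const parsed = new URL(url, "http://placeholder.invalid");
  const exp = Number(parsed.searchParams.get("exp"));
  const sig = parsed.searchParams.get("sig") ?? "";
  if (!Number.isFinite(exp) || exp < nowSeconds) return false;
  const given = Buffer.from(sig, "hex");
  const expected = Buffer.from(signature(parsed.pathname, exp, secret), "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

Because the browser carries the signature in the URL itself, plain `<img src>` tags keep working without an Authorization header. The trade-off is that a leaked URL stays valid until it expires, so short expiry times are the main control. Paths containing characters that URL parsing re-encodes would need normalising before signing.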

Phase 7 — Caddyfile final form

Status: Proposed (2026-05-23). Apply blocked on Phase F1. The full proposed file lives at inventory-server/deploy/Caddyfile.proposed and matches the spec below except that vendor handle blocks still point to per-vendor PM2 apps (Phase 4 hasn't merged them yet). See inventory-server/deploy/README.md for the apply commands (admin-API + sudo cp pattern from Phase 2 deviation #8).

After all phases, the tools.acherryontop.com block looks like:

tools.acherryontop.com {
    import security_headers

    # Public: login endpoint
    handle /auth-inv/* {
        uri strip_prefix /auth-inv
        reverse_proxy localhost:3011
    }

    # Public: static frontend assets
    @static path *.js *.css *.png *.jpg *.jpeg *.gif *.ico *.svg *.woff *.woff2
    handle @static {
        header Cache-Control "public, max-age=2592000"
        root * /var/www/inventory/frontend/build
        file_server
    }

    # All API + uploads: auth gate first
    @gated path /api/* /chat-api/* /uploads/*
    handle @gated {
        forward_auth localhost:3011 {
            uri /verify
            copy_headers Authorization
        }

        # Uploaded files
        handle /uploads/* {
            root * /var/www/inventory
            file_server
        }

        # Vendor dashboard routes → merged dashboard-server
        handle /api/klaviyo/*      { reverse_proxy localhost:3015 }
        handle /api/meta/*         { reverse_proxy localhost:3015 }
        handle /api/google-analytics/* { reverse_proxy localhost:3015 }
        handle /api/typeform/*     { reverse_proxy localhost:3015 }

        # ACOT-specific
        handle /api/acot/*         { reverse_proxy localhost:3012 }

        # Chat
        handle /chat-api/* {
            uri strip_prefix /chat-api
            reverse_proxy localhost:3014
        }

        # Catch-all: inventory-server
        handle /api/* { reverse_proxy localhost:3010 }
    }

    handle /health { reverse_proxy localhost:3010 }

    # SPA fallback
    handle {
        root * /var/www/inventory/frontend/build
        try_files {path} /index.html
        file_server
        encode gzip
    }

    handle_errors {
        respond "{err.status_code} {err.status_text}"
    }
}

Removed: /dashboard-auth/*, /api/aircall/*, /api/gorgias/*, /api/clarity/*, the LAN/Access-Control-Allow-Origin "*" permissive defaults on /api/*. Kept: /apiv2/* and /apiv2-test/* proxies to backend.acherryontop.com (out of scope, separate system).

Phase 8 — ecosystem.config.cjs final form

Status: Proposed (2026-05-23). Full proposed file at inventory-server/deploy/ecosystem.config.cjs.proposed. Includes the Phase 6.4 JWT_SECRET shadow-override fix and the Phase 6.10 lt-wordlist-api token move. Still lists per-vendor PM2 apps until Phase 4 merge ships — that's the only thing keeping app count at 10 instead of the target 5.

// commonSettings (shared PM2 options) is assumed to be defined above this export; not shown here.
module.exports = {
  apps: [
    {
      name: 'auth-server',
      script: './inventory/auth/server.js',
      cwd: '/var/www',
      env: { NODE_ENV: 'production', AUTH_PORT: 3011 },
      ...commonSettings,
    },
    {
      name: 'inventory-server',
      script: './inventory/src/server.js',
      cwd: '/var/www',
      env: { NODE_ENV: 'production', PORT: 3010, UPLOADS_DIR: '/var/www/inventory/uploads' },
      ...commonSettings,
    },
    {
      name: 'dashboard-server',
      script: './inventory/dashboard/server.js',
      cwd: '/var/www',
      env: { NODE_ENV: 'production', DASHBOARD_PORT: 3015 },
      ...commonSettings,
    },
    {
      name: 'acot-server',
      script: './inventory/dashboard/acot-server/server.js',
      cwd: '/var/www',
      env: { NODE_ENV: 'production', ACOT_PORT: 3012 },
      ...commonSettings,
    },
    {
      name: 'chat-server',
      script: './inventory/chat/server.js',
      cwd: '/var/www',
      env: { NODE_ENV: 'production', PORT: 3014 },
      ...commonSettings,
    },
    // acot-phone-server and lt-wordlist-api unchanged
  ],
};

Five entries instead of twelve. Each app loads its own .env from its directory (already handled by dotenv.config).

Sequencing & dependencies

Phase 1 (decommission) ──┬─────────────────────────────────────────┐
                         │                                         │
                         ▼                                         │
                   Phase 2 (shared lib/)                           │
                         │                                         │
          ┌──────────────┼──────────────┐                          │
          ▼              ▼              ▼                          ▼
   Phase 3a         Phase 3b        Phase 4              Phase 6 (auth hardening
   inventory-server auth-server     dashboard-server     runs alongside 3+4+5,
   to ESM           to ESM + /verify build & test        completes after them)
          │              │              │                          │
          └──────────────┼──────────────┘                          │
                         ▼                                         │
                   Phase 5 (acot-server to ESM) ──────────────────►│
                                                                   ▼
                                                          Phase 7 (Caddy cutover)
                                                                   │
                                                                   ▼
                                                          Phase 8 (PM2 final state)

Phase 1 unblocks everything (fewer services to convert).
Phase 2 is the foundation; nothing else can start until shared lib/ exists.
Phases 3–5 can run in parallel; they touch independent services.
Phase 6's sub-items can be developed alongside 3–5 but enabled only after them (no point adding requirePermission to a route that doesn't yet have authenticate).
Phase F1 must precede the Phase 3+6 pm2 reload: without the fetch wrapper, the SPA breaks the moment the new code goes live. Discovered during Phase 3+6 implementation; see Phase F1.
Phase 7 is the cutover: the Caddyfile flip happens after F1 ships AND after the /uploads/* gating decision in F1 is made.
Phase 8 is cleanup: remove dead PM2 entries.

Estimated effort, end-to-end: roughly 3.5–4.5 weeks of focused work by one engineer (the per-phase estimates sum to 17.5–22 working days). Phase 1 ≈ 1 day, Phase 2 ≈ 2 days, Phase 3 ≈ 3 days (both services), Phase 4 ≈ 5–7 days (the merge), Phase 5 ≈ 2–3 days, Phase 6 ≈ 3–4 days, Phase F1 ≈ 0.5–1 day, Phase 7+8 ≈ 1 day.

Testing strategy

No formal test suite exists today (per CLAUDE.md). For a refactor this size, that's a gap to close — but writing tests retroactively for 15K LOC of routes is a separate, larger project. For this refactor:

Manual smoke testing per phase

A checklist of representative endpoints to hit after each deploy:

inventory-server: /api/products, /api/dashboard/overview, /api/analytics/revenue, /api/orders, /api/purchase-orders, /api/import/list-uploads, /api/config/global
dashboard-server: /api/klaviyo/campaigns, /api/meta/insights, /api/google-analytics/..., /api/typeform/responses
acot-server: /api/acot/... (top-3 endpoints by call volume — pull from access logs)
auth-server: /login, /me, /verify

Each smoke test runs (a) without a token → expect 401, (b) with an invalid token → expect 401, (c) with a valid token → expect 2xx.
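
The three-case check above can be scripted. This is a sketch under assumptions: `probeEndpoint` and `checkAuthBoundary` are hypothetical names, the base URL and token are supplied by whoever runs it, and it applies to gated endpoints only (the public /login route is expected to behave differently).

```typescript
interface ProbeResult {
  noToken: number;      // status with no Authorization header
  invalidToken: number; // status with a garbage bearer token
  validToken: number;   // status with a real token
}

// Pure check: returns human-readable failures, or an empty array when
// the endpoint behaves as (a) 401, (b) 401, (c) 2xx.
function checkAuthBoundary(endpoint: string, observed: ProbeResult): string[] {
  const failures: string[] = [];
  if (observed.noToken !== 401) {
    failures.push(`${endpoint}: no token returned ${observed.noToken}, expected 401`);
  }
  if (observed.invalidToken !== 401) {
    failures.push(`${endpoint}: invalid token returned ${observed.invalidToken}, expected 401`);
  }
  if (observed.validToken < 200 || observed.validToken > 299) {
    failures.push(`${endpoint}: valid token returned ${observed.validToken}, expected 2xx`);
  }
  return failures;
}

// Performs the three GET requests with the global fetch (Node 18+ or a browser).
async function probeEndpoint(baseUrl: string, endpoint: string, validToken: string): Promise<ProbeResult> {
  const statusFor = async (headers: Record<string, string>): Promise<number> =>
    (await fetch(baseUrl + endpoint, { headers })).status;
  return {
    noToken: await statusFor({}),
    invalidToken: await statusFor({ Authorization: "Bearer not-a-real-token" }),
    validToken: await statusFor({ Authorization: `Bearer ${validToken}` }),
  };
}
```

Splitting the network probe from the pass/fail logic means the checklist rules can be tested without a running server, and the same checker can be reused for every endpoint in the list.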

Frontend integration check

After deploys, log into the SPA and exercise each major page (Overview, Products, Analytics, Dashboard, Klaviyo, Meta, etc.). If everything loads and dashboards populate, the auth + routing layer is intact.

Test scaffold during Phase 2 (committed)

While building shared/, set up vitest (lightweight, ESM-native, fast) as the standard test runner for the repo. Initial coverage focuses on the security-critical surface only:

shared/auth/verify.js — known good token, expired token, wrong-signature token, malformed token, missing token.
shared/auth/middleware.js — request with no header → 401; bad header → 401; valid token + inactive user → 403; valid token + missing permission → 403; valid token + correct permission → next() called with req.user populated.
shared/auth/middleware.js user-cache TTL: same token within 60s → one DB hit; same token after 61s → two DB hits.
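
The TTL case is easiest to test when the clock is injectable. The class below is an illustrative stand-in, not the code in shared/auth/middleware.js: `TtlCache` and its `getOrLoad` method are hypothetical names, and the loader is synchronous here for brevity, whereas a real user lookup would return a Promise (which this cache would store just as well).

```typescript
class TtlCache<K, V> {
  private readonly entries = new Map<K, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so tests can advance time without sleeping.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value while it is fresh; otherwise calls `load`
  // (the "DB hit") and caches the result for ttlMs.
  getOrLoad(key: K, load: () => V): V {
    const current = this.now();
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > current) return hit.value;
    const value = load();
    this.entries.set(key, { value, expiresAt: current + this.ttlMs });
    return value;
  }
}

// Usage mirroring the test description: two reads inside 60s share one
// load, and a read at 61s triggers a second.
let fakeNow = 0;
const dbHits: number[] = [];
const userCache = new TtlCache<number, string>(60_000, () => fakeNow);
const loadUser = (): string => {
  dbHits.push(fakeNow);
  return "user-42";
};
userCache.getOrLoad(42, loadUser);
fakeNow = 59_000;
userCache.getOrLoad(42, loadUser);
const hitsWithinTtl = dbHits.length;
fakeNow = 61_000;
userCache.getOrLoad(42, loadUser);
const hitsAfterTtl = dbHits.length;
```

In a vitest suite the same effect can be had with fake timers, but passing the clock in keeps the cache itself free of any test-framework dependency.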

package.json gets a "test": "vitest run" script at the repo root and per-service. Set up but don't backfill broader test coverage — that's a separate, larger project. The vitest scaffold gives future work a foothold; this refactor commits to having tests for the auth boundary specifically because that's what's load-bearing for the whole security model.

Rollback strategy

Each phase produces an independently deployable state. Rollback per phase:

Phase 1: re-add removed services to ecosystem; restore from git. Don't roll back data deletions — only do those after a week of stable production.
Phases 3, 5: ESM conversion is per-service; if one service breaks, check out the previous commit for that service and pm2 restart <name>. Other services are unaffected.
Phase 4: the dashboard-server merge is the highest-risk change. Plan: deploy dashboard-server to a non-conflicting port (3015) while leaving the old per-vendor servers running. Cut over Caddy routes one vendor at a time (start with Meta — smallest). If any vendor breaks, point Caddy back to the old server (still running) for that vendor, debug, retry. Only delete the old servers after all four are stable on dashboard-server.
Phases 6, 7: Caddy config is git-tracked. git revert + caddy reload rolls back in seconds. Auth changes are additive (defense in depth) — if forward_auth causes problems, comment it out and per-server middleware continues protecting routes.

Out of scope (intentional)

These came up in the audit but aren't part of this refactor:

httpOnly cookie auth ("Phase F2" — deferred). Phase F1 keeps localStorage + Bearer header because that's the minimum to unblock the Phase 6 authenticate() rollout. A future move to cookie auth would touch AuthContext, every backend that reads tokens, and introduce CSRF concerns — much larger project.
Replacing PM2 with systemd or Docker.
Test coverage beyond the auth-critical surface.
apiv2/apiv2-test proxies to backend.acherryontop.com — separate system, not touched.
acot-phone-server and lt-wordlist-api — staying as-is.
Centralized observability stack (Prometheus, Grafana). The logger work in Phase 6.5 sets up the data, but shipping it somewhere is future work.
ChatRoom XSS remediation (flagged during Phase 6.8 audit — inventory/src/components/chat/ChatRoom.tsx:277,392 renders user-controlled chat content via dangerouslySetInnerHTML without sanitization). Real vulnerability for an internal-but-multi-user tool; separate fix.

Concrete deliverables

When this is done:

4 application PM2 processes instead of 12 (plus 2 unchanged: acot-phone, lt-wordlist).
All /api/* and /chat-api/* requests gated at Caddy and re-verified at each upstream.
Sensitive endpoints additionally gated by per-permission checks.
One ESM standard across the entire Node codebase.
One shared lib/ for auth, logging, DB, errors, CORS.
Login rate-limited.
JWT_SECRET rotated.
Old auth-server, Aircall, Gorgias, Clarity directories deleted from the repo.
Caddyfile slimmed to one auth-gated block.
Permission codes inserted into permissions table for granular authorization.
No half-finished pieces, no // TODO: add auth later comments, no deferred secrets cleanup.

Deviations from original plan (recorded during execution)

These are decisions made during implementation (Phases 1–3 and 6 so far) that amend the spec above. Future phases should follow the deviated path, not the original sketch.

1. shared/ location. Original plan placed shared/ at the repo root as a sibling of inventory/ and acot-phone/. Implemented at inventory-server/shared/ (= /var/www/inventory/shared/ on the server) instead. Reason: the actual project root is /var/www/inventory/; placing shared/ outside it would have meant building a deployment story for it that doesn't exist. Import paths change accordingly:
- From inventory-server/{auth,src,chat}/server.js → ../shared/...
- From inventory-server/dashboard/{vendor}-server/server.js → ../../shared/...
2. /verify response headers. Plan specified X-User-Id + X-User-Is-Admin. Implemented as X-User-Id + X-User-Username (both available from the JWT payload). X-User-Is-Admin was dropped because is_admin isn't in the JWT today and returning it would require a DB lookup, violating the "no DB hit" principle. To restore X-User-Is-Admin, enrich the JWT payload at login time (one-line change in auth/routes.js) during Phase 6, then echo it from /verify. Upstreams don't trust these headers anyway (they re-verify), so the omission is informational, not security-relevant.
3. User cache key in shared/auth/middleware.js. Plan sketch mentioned "60s TTL keyed by token jti". Implemented as keyed by userId instead: the JWT doesn't currently include a jti claim, and the cache's invalidation semantics are "this user was deactivated/changed permissions" (per-user), not "this token was revoked" (per-token). The plan's pseudocode already used loadUserCached(pool, decoded.userId), so this matches the spirit.
4. Redis client safety. shared/db/redis.js sets enableOfflineQueue: false and lazyConnect: true. The plan didn't specify these, but they mean a Redis hiccup fails fast (route fall-through to upstream API as designed in Phase 4 risk notes) rather than queueing commands indefinitely.
5. CORS allowed origins kept https://acot.site. Plan example listed three origins; production has acot.site as a redirect to tools.acherryontop.com, but it also reaches the API directly in some flows. Kept it to avoid breakage. LAN wildcards (192.168.*, 10.*) and Access-Control-Allow-Origin "*" are NOT included in the new shared/cors/policy.js per the plan's Phase 6.6 spirit, but the legacy inventory-server/src/middleware/cors.js still has them until services are migrated to consume shared/cors/.
6. Defunct permission codes left in DB. Removed the dashboard:gorgias and dashboard:calls Protected blocks from the frontend, but the corresponding permission rows in the permissions table are still there (assigned to some users). They're inert (no UI references them) but should be cleaned up alongside the Phase 6.2 permissions migration.
7. PM2 process name kept as new-auth-server (not auth-server). Plan's Phase 8 final form names it auth-server (after the legacy 3003 one is removed). Decided to keep the existing new-auth-server name through Phase 2 to avoid a rename mid-stream. Phase 8 can rename if desired, but it's cosmetic: all wiring is by port (3011), not name.
8. Caddyfile changes via admin API on :2020. The Caddyfile is owned by root and matt has no passwordless sudo. Cutover used curl -X POST .../load on the Caddy admin port (which matt can hit), then a separate sudo cp /home/matt/Caddyfile.new /etc/caddy/Caddyfile step to persist the on-disk file. Future Caddyfile changes can follow the same pattern. Backup convention: /etc/caddy/Caddyfile.bak.YYYY-MM-DD.
9. Path-naming. Plan uses inventory/ as the top-level (server-side path convention). Locally the equivalent is inventory-server/. Whenever the plan says inventory/dashboard/foo/, read that as /var/www/inventory/dashboard/foo/ on the server or inventory-server/dashboard/foo/ locally.
10. Scripts directory kept CJS via package.json shim. Original plan called for converting "any spawned script" to ESM alongside its caller. Implemented: added inventory-server/scripts/package.json with "type": "commonjs". Node's package-type resolution walks up directory by directory, so this overrides the parent's "type": "module" for the entire scripts/ tree (≈15 files including import/*.js, metrics-new/utils/*, the orchestrator scripts) without renaming any file or touching any spawn() callsite. Convert individual scripts to ESM when touched; don't bulk-migrate.
11. src/routes/products.js had dead multer setup. Phase 6.7 spec called for hardening the upload route in products.js. There was no upload route: the multer({ dest }) instance and importProductsFromCSV import were dead code left over from a long-ago migration. Strongest 6.7 hardening was deletion: no upload handler = no attack surface. The two real upload paths (/api/import/upload-image and /api/reusable-images/upload) got tightened MIME+extension allowlists instead.
12. Two pre-existing syntax errors in shared/db/ surfaced. shared/db/pg.js:13 and shared/db/redis.js:22 both had ?? Number(...) || N. Mixing ?? and || without parentheses is a syntax error under the ECMAScript spec. They passed Phase 2 because nothing imported them yet; the Phase 3 smoke test exposed them. Fixed with parens.
13. import { Pool } from 'pg' doesn't work in ESM. The pg package is CJS using module.exports = { Pool, ... }. Node's ESM-from-CJS interop fails to detect Pool as a named export via static analysis. The reliable pattern, now used everywhere: import pg from 'pg'; const { Pool } = pg;. Same idea for any future CJS-only deps. src/utils/db.js already had it; the two auth files needed the fix during execution.
14. Frontend Bearer-header gap discovered (drives new Phase F1). Phase 6 was specified assuming the frontend already sends Authorization: Bearer on every API call. It does not: only 7 of ~220 call sites do. Phase 6's authenticate() middleware is shipped and ready to enable, but until F1 lands the SPA will 401 on every page. The plan now has Phase F1 to address this explicitly; until then, the Phase 3+6 pm2 reload should not ship unless F1 ships in the same window.
15. macOS NFS workflow note. The inventory-server/ directory locally is an NFS mount of /var/www/inventory/ on netcup. Bulk operations (find/grep -r/mass node --check/npm install) hang or take minutes locally and pollute file listings with macOS AppleDouble ._* sidecar files. Default to ssh netcup for any sweep across the tree; individual file edits via the editor are fine.
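
For reference, the `??` / `||` deviation above comes down to an explicit grouping. The sketch below shows one possible parenthesisation; the function name `resolvePort` and its parameters are hypothetical, and which grouping the real shared/db/ files use is an assumption, since the source only says the fix was "parens".

```typescript
// Writing `explicit ?? Number(envValue) || fallback` without parentheses is
// rejected by the parser: ?? cannot be mixed directly with || or &&.
// Grouping the || branch states the intent: use the explicit value when it is
// set, otherwise parse the env value, otherwise use the fallback.
function resolvePort(
  explicit: number | undefined,
  envValue: string | undefined,
  fallback: number,
): number {
  return explicit ?? (Number(envValue) || fallback);
}
```

With this grouping an unset or non-numeric env value yields NaN, which `||` replaces with the fallback, while an explicitly passed port always wins. Note that an explicit port of 0 is also kept, because `??` only treats null and undefined as missing.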
