2 Commits

Author SHA1 Message Date
matt 069a44bd54 Import/calculations improvements 2026-06-11 19:32:20 -04:00
matt 3b2f51e6b8 Forecast improvements 2026-06-11 14:55:33 -04:00
24 changed files with 2062 additions and 446 deletions
+343
@@ -0,0 +1,343 @@
# Forecast Accuracy Fix Plan
**Written:** 2026-06-10, from a code + live-data review of the forecasting pipeline.
**Goal:** eliminate the systematic ~1.72x over-forecast bias, recover demand the model currently ignores, and fix the accuracy measurement so improvements are visible and long-lead forecasts are validated.
Read this whole document before starting. Fixes are grouped into phases; each phase is independently deployable and has its own validation step. Line numbers are as of 2026-06-10 — re-locate by function name if the file has drifted.
---
## 1. Diagnosis summary (measured 2026-06-10)
The dashboard headline is **202% WMAPE**. Decomposition of that number, all measured against `forecast_accuracy` run 129 and ad-hoc queries:
| Finding | Evidence |
|---|---|
| Daily-grain WMAPE has a ~190% *floor* for this catalog | Avg demand ≈ 0.11 units/product/day. A perfect rate forecast of intermittent demand scores ≈ 2e^−λ ≈ 190%. A trivial trailing-30d-average naive forecast scores **204%** on the same products/days; the engine scores 221% (slightly *worse than naive*). |
| Same forecasts at 21-day-per-product grain: **109%**; bias-corrected: **75%** | Half the headline comes from metric grain; most of the rest is bias. |
| Aggregate over-forecast **+70%** (227,690 forecast vs 133,861 actual units) | Portfolio daily ratio is 1.5-2.5x on most days. |
| Decay phase 2.47x over (fc 51,675 / act 20,915) | Root cause F1: velocity inflated **4.07x** (measured: 1.353 vs true 0.332 units/day) by averaging over sparse snapshot rows. |
| Preorder phase 2.15x over (fc 67,212 / act 31,189) | Root cause F4: launch curve applied at age=0 starting *today*, ignoring that the product hasn't arrived. |
| Mature phase 1.69x over (fc 57,857 / act 34,313) | Root causes F2 (history edge truncation) + F3 (seasonal double-count). |
| Dormant products sold **16,180 units** (~11% of demand) against zero forecasts | Root cause F5; also excluded from the headline metric, so invisible. |
| All 879,800 accuracy samples are in the **1-7d lead bucket** | Root cause F7: the archiving design only ever saves yesterday's slice. 30-90d forecasts (what purchasing uses) are never validated. |
| Launch phase is healthy: WMAPE 100%, bias 6%, beats naive | The lifecycle-curve concept works; its calibration inputs are broken. Don't redesign it. |
**Key data fact** underlying several fixes: `daily_product_snapshots` is **activity-based and sparse**; only ~500-1,800 of ~38K products have a row on a given day. Verified: every pid-day with an order DOES have a snapshot row and units match (5,234/5,234 pid-days, 8,980 vs 8,984 units over 7 days). So *missing row = zero sales*, and any query that aggregates over only the rows that exist is averaging over sold-days.
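
The ~190% daily floor in the table above comes from the intrinsic noise of intermittent demand. The sketch below is a minimal check under one stated assumption: each product-day's demand is Poisson with rate `lam`, and the forecast equals `lam` exactly. For `lam < 1` the only outcome below the forecast is zero sales, so E|Y − λ| = 2λe^(−λ) and the expected WMAPE is exactly 2e^(−λ); for larger rates the closed form no longer applies, which the direct pmf sum handles.

```python
import math


def expected_abs_error(lam: float, tail_tol: float = 1e-15) -> float:
    """E|Y - lam| for Y ~ Poisson(lam), summed term by term over the pmf."""
    total = 0.0
    k = 0
    pmf = math.exp(-lam)  # P(Y = 0)
    while True:
        total += abs(k - lam) * pmf
        k += 1
        pmf *= lam / k  # P(Y = k) derived from P(Y = k - 1)
        if k > lam and pmf < tail_tol:  # past the mode and the tail is negligible
            break
    return total


def wmape_floor(lam: float) -> float:
    """Expected WMAPE of the perfect forecast `lam`: E|Y - lam| / E[Y], with E[Y] = lam."""
    return expected_abs_error(lam) / lam


if __name__ == "__main__":
    for lam in (0.05, 0.11, 0.5, 2.0):
        print(f"lambda={lam}: floor={wmape_floor(lam):.1%}, 2e^-lambda={2 * math.exp(-lam):.1%}")
```

At λ = 0.11 the formula evaluates to about 179%, and it approaches 200% as λ shrinks; per-product rates vary widely, so the catalog-level figure in the table is the measured one, not this single-rate value.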
---
## 2. Environment & operational notes
- **Files:** engine is `inventory-server/scripts/forecast/forecast_engine.py`; orchestrator `run_forecast.js` in the same dir; consumer endpoints in `inventory-server/src/routes/dashboard.js` (`/forecast/metrics` ~line 308, `/forecast/accuracy` ~line 647); overview UI in `inventory/src/components/overview/ForecastMetrics.tsx` and `ForecastAccuracy.tsx`.
- **Local `inventory-server/` is NFS-mounted to `/var/www/inventory/` on the netcup server.** Edits made locally appear on the server immediately — no copy step. Do NOT run bulk `grep`/`find`/`node --check` over `inventory-server/` locally (the mount hangs); `ssh netcup` and run them there.
- **Avoid the glob tool** for search in this repo; use bash (`grep`/`rg` via ssh for server-side trees).
- **Scheduling:** the engine runs daily at **09:30:01 server time** (runs table is conclusive), but the cron entry is NOT in matt's crontab, `/etc/cron.d`, or pm2. Likely root's crontab (`sudo crontab -l` to confirm). You do not need to touch the schedule for these fixes; just know a run fires at 09:30 daily and occasionally skips days (e.g. 2026-06-07/08).
- **Manual test runs:** `ssh netcup`, then `cd /var/www/inventory/scripts/forecast && node run_forecast.js`. Takes ~3.5-4 min. Safe to run any time: the engine TRUNCATEs and rebuilds `product_forecasts`, archives prior past-dated rows, and records a new `forecast_runs` row. Python deps live in the server venv (`venv/`); `run_forecast.js` handles env + venv automatically.
- **DB access for validation:** `ssh netcup`, then `PGPASSWORD=6D3GUkxuFgi2UghwgnUd psql -h localhost -U inventory_readonly -d inventory_db`. The engine itself connects with the write user via env vars loaded from `/var/www/inventory/.env` — schema changes should be made idempotently *inside the engine code* (the file already uses `CREATE TABLE IF NOT EXISTS` / `CREATE INDEX IF NOT EXISTS`; use `ALTER TABLE ... ADD COLUMN IF NOT EXISTS` the same way) so no manual migration is needed.
- **Python gotchas already handled in this file (don't regress):** numpy types must go through the registered psycopg2 adapters; `pd.Series.combine_first()` keeps zeros over real data — use `reindex(..., fill_value=0.0)`.
- Engine runtime budget: currently ~212-227 s. Phases 1-2 shouldn't move it meaningfully; Phase 3's extra archiving adds one INSERT…SELECT. If runtime balloons past ~6 min, investigate before shipping.
- `--backfill` mode (`backfill_accuracy_data`) is an in-sample backtest using the *old* formulas. **Do not run it anymore**; there is enough real out-of-sample history. Updating it to match the new logic is optional/low priority (F11).
---
## Phase 1 — Bias bugs in the engine (no schema changes)
### F1. Decay velocity: stop averaging over sparse snapshot rows
**Where:** `forecast_engine.py`, `batch_load_product_data()`, the decay query (~lines 697-710).
**Problem:** `AVG(COALESCE(dps.units_sold, 0))` runs over only the snapshot rows that exist — mostly sold-days. Measured inflation on the current 975 decay products: **4.07x** (1.353 vs 0.332 true units/day). This feeds `compute_scale_factor()` for the decay phase and is the single largest bias source.
**Fix:** divide the sum by calendar days in the window, clipped to the product's age (decay products are 14-60 days old, so a 20-day-old product's window is 20 days, not 30):
```sql
SELECT dps.pid,
SUM(COALESCE(dps.units_sold, 0))::float
/ GREATEST(LEAST(30, (CURRENT_DATE - pm.date_first_received::date)), 1) AS avg_daily
FROM daily_product_snapshots dps
JOIN product_metrics pm ON pm.pid = dps.pid
WHERE dps.pid = ANY(%s)
AND dps.snapshot_date >= CURRENT_DATE - INTERVAL '30 days'
AND dps.snapshot_date >= pm.date_first_received::date
GROUP BY dps.pid, pm.date_first_received
```
No Python-side changes needed; `data['decay_velocity']` keeps the same shape. Products with zero snapshot rows in the window still get no entry → existing `scale = 1.0` fallback applies (acceptable: decay classification requires `sales_velocity_daily > 0`, so truly dead products don't reach this path).
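
To see why averaging over existing rows inflates velocity, here is a minimal stdlib sketch of the same arithmetic. A plain dict of `date → units` stands in for one product's sparse snapshot rows (an assumption for illustration; the engine does this in SQL). `decay_velocity` mirrors the fixed query (sum ÷ calendar days clipped to age), and `sparse_row_average` mirrors the old `AVG` over whichever rows exist.

```python
from datetime import date, timedelta


def _window_start(first_received: date, today: date, window_days: int) -> date:
    """Same filters as the SQL: last `window_days` days, never before first receipt."""
    return max(today - timedelta(days=window_days), first_received)


def decay_velocity(snapshot_units: dict[date, float], first_received: date,
                   today: date, window_days: int = 30) -> float:
    """F1: total units / calendar days in the window, clipped to the product's age.

    snapshot_units only has days with a snapshot row; a missing day means zero sales.
    """
    start = _window_start(first_received, today, window_days)
    total = sum(u for d, u in snapshot_units.items() if d >= start)
    age_days = (today - first_received).days
    return total / max(min(window_days, age_days), 1)


def sparse_row_average(snapshot_units: dict[date, float], first_received: date,
                       today: date, window_days: int = 30) -> float:
    """Old behaviour: AVG over the rows that happen to exist (mostly sold-days)."""
    start = _window_start(first_received, today, window_days)
    rows = [u for d, u in snapshot_units.items() if d >= start]
    return sum(rows) / len(rows) if rows else 0.0
```

A product that sold 2 units on each of 4 days in the last 30 gets 8/30 ≈ 0.27 units/day from the fixed formula, versus 2.0 from the row average.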
### F2. Mature history: reindex over the full calendar window
**Where:** `forecast_engine.py`, `forecast_mature()` (~lines 833-836).
**Problem:** `hist.set_index('snapshot_date').resample('D').sum()` only spans first-snapshot → last-snapshot. Interior gaps correctly become zeros, but **leading and trailing quiet periods are absent**, so the Holt level is fitted on the product's busy span. A marginal mature product whose activity clusters in 2 of the last 8 weeks gets a level ~4x too high.
**Fix:** replace the resample with an explicit reindex over the full `EXP_SMOOTHING_WINDOW` ending yesterday:
```python
hist = history_df.copy()
hist['snapshot_date'] = pd.to_datetime(hist['snapshot_date'])
hist = hist.set_index('snapshot_date')['units_sold']
full_index = pd.date_range(
end=pd.Timestamp(date.today() - timedelta(days=1)),
periods=EXP_SMOOTHING_WINDOW, freq='D')
series = hist.reindex(full_index, fill_value=0.0).values.astype(float)
```
Notes: (pid, snapshot_date) is unique in `daily_product_snapshots`, so no duplicate-index risk. `observed_mean` and the `cap` recompute over the full window automatically (intended — the cap gets correspondingly tighter). Mature products are by definition >60 days old, so the 60-day window never predates first receipt. Do NOT use `combine_first` (see gotchas above).
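
The truncation effect is easy to reproduce without pandas. In this toy sketch a dict of `date → units` stands in for `history_df` (an assumption for illustration), `busy_span_series` imitates what `resample('D')` produces (first row to last row, interior gaps zeroed), and `full_window_series` imitates the F2 reindex over the 60-day window ending yesterday.

```python
from datetime import date, timedelta
from statistics import mean

EXP_SMOOTHING_WINDOW = 60  # per the note above: mature products are > 60 days old


def busy_span_series(history: dict[date, float]) -> list[float]:
    """Old behaviour (resample('D')): first to last snapshot, interior gaps zeroed."""
    first, last = min(history), max(history)
    return [history.get(first + timedelta(days=i), 0.0)
            for i in range((last - first).days + 1)]


def full_window_series(history: dict[date, float], today: date,
                       window: int = EXP_SMOOTHING_WINDOW) -> list[float]:
    """F2 behaviour: zero-filled series covering `window` days ending yesterday."""
    start = today - timedelta(days=window)  # start .. yesterday is exactly `window` days
    return [history.get(start + timedelta(days=i), 0.0) for i in range(window)]


if __name__ == "__main__":
    today = date(2026, 6, 10)
    # 15 consecutive selling days of 1 unit each, quiet before and after
    history = {today - timedelta(days=k): 1.0 for k in range(16, 31)}
    print(mean(busy_span_series(history)), mean(full_window_series(history, today)))
```

With activity clustered in about two of the last eight weeks, the busy-span mean is 1.0 while the full-window mean is 0.25: the ~4x inflation described above.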
### F3. Stop double-applying the monthly seasonal index
**Where:** `forecast_engine.py`, `generate_all_forecasts()`: the `seasonal_multipliers` pre-compute (~lines 959-961) and application (~line 1050).
**Problem:** every per-product calibration (decay velocity, mature Holt level, launch first-week scale, preorder rate, slow-mover velocity) is fitted on *raw recent actuals*, which already embed the current month's seasonal level. The forecast then multiplies by the **absolute** monthly index of the target date. Example from the live indices (`forecast_runs.phase_counts` for run 129): May = 1.224 (sale month), June = 0.982. Early-June forecasts were calibrated on May-sale-inflated velocities and barely discounted — a structural ~25% over-forecast at that transition, and it'll be worse around November (1.316).
**Fix:** apply the seasonal index *relative to the calibration period*. Compute a calibration index as the average monthly index over the trailing 30 calendar days (robust at month boundaries), then divide:
```python
today = date.today()
trailing = [today - timedelta(days=i) for i in range(1, 31)]
calibration_index = float(np.mean([monthly_indices.get(d.month, 1.0) for d in trailing]))
seasonal_multipliers = [
monthly_indices.get(d.month, 1.0) / max(calibration_index, 0.1)
for d in forecast_dates
]
```
Leave the DOW multipliers absolute — every calibration is a multi-week average and therefore DOW-neutral, so reshaping by absolute DOW indices is correct.
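
As a worked example of the relative index, the sketch below repackages the snippet above as a standalone function (the function name is illustrative, not the engine's). With the run-129 indices (May 1.224, June 0.982) and `today = 2026-06-10`, the trailing 30 days are May 11 to June 9 (21 May days, 9 June days), so the calibration index is (21·1.224 + 9·0.982)/30 ≈ 1.151 and a June forecast date gets a multiplier of about 0.853 instead of the absolute 0.982.

```python
from datetime import date, timedelta


def relative_seasonal_multipliers(monthly_indices: dict[int, float], today: date,
                                  forecast_dates: list[date]) -> list[float]:
    """Seasonal multipliers relative to the trailing-30-day calibration period (F3)."""
    trailing = [today - timedelta(days=i) for i in range(1, 31)]
    calibration_index = sum(monthly_indices.get(d.month, 1.0) for d in trailing) / len(trailing)
    # Months missing from the dict count as 1.0, matching .get(..., 1.0) in the engine snippet
    return [monthly_indices.get(d.month, 1.0) / max(calibration_index, 0.1)
            for d in forecast_dates]


if __name__ == "__main__":
    indices = {5: 1.224, 6: 0.982}  # run 129 values for May and June
    june = relative_seasonal_multipliers(indices, date(2026, 6, 10), [date(2026, 6, 15)])
    print(round(june[0], 3))
```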
**Optional sub-fix (same area, low priority):** the monthly indices are computed from a single trailing 365-day window, so each month appears once and YoY growth contaminates "seasonality". A cheap improvement is widening `SEASONAL_LOOKBACK_DAYS` to 730 and averaging the two observations of each month. Do this only after the main fixes are validated.
### Phase 1 validation
Deploy (edit locally; NFS propagates), run the engine manually once, wait for 3-5 daily cycles, then:
```sql
-- Portfolio ratio per day (target: drifts from ~2.0 toward 0.8-1.3)
WITH ranked AS (
SELECT pfh.pid, pfh.forecast_date, pfh.forecast_units, pfh.lifecycle_phase,
ROW_NUMBER() OVER (PARTITION BY pfh.pid, pfh.forecast_date ORDER BY fr.started_at DESC) rn
FROM product_forecasts_history pfh
JOIN forecast_runs fr ON fr.id = pfh.run_id
WHERE pfh.forecast_date >= CURRENT_DATE - 7)
SELECT r.forecast_date, round(SUM(r.forecast_units),0) AS fc,
SUM(COALESCE(dps.units_sold,0)) AS act,
round(SUM(r.forecast_units)/NULLIF(SUM(COALESCE(dps.units_sold,0)),0),2) AS ratio
FROM ranked r
LEFT JOIN daily_product_snapshots dps ON dps.pid = r.pid AND dps.snapshot_date = r.forecast_date
WHERE r.rn = 1 AND r.lifecycle_phase != 'dormant'
GROUP BY 1 ORDER BY 1;
```
Also check `forecast_accuracy` `by_phase` rows for the newest run: decay bias should fall from +0.35 toward ~0, mature from +0.17 toward ~0. (Accuracy lags ~1 day behind each fix since it evaluates yesterday's forecasts.)
---
## Phase 2 — Demand the model currently ignores or mistimes
### F4. Preorder: forecast the preorder rate until arrival, launch curve after
**Where:** `forecast_engine.py` → `batch_load_product_data()` (add arrival dates), `generate_all_forecasts()` preorder branch (~lines 1005-1009), and `forecast_from_curve()` (or a small wrapper).
**Problem:** preorder products run the launch curve from `age=0` starting **today**, i.e. full first-week launch sales while the product is still weeks from arriving. Actual preorder-period sales are a much slower trickle.
**Fix:**
1. Batch-load each preorder product's expected arrival from `purchase_orders` (line-item grain: it has `pid` and `expected_date` directly). Open statuses verified against live data: `created`, `ordered`, `electronically_sent`, `receiving_started` (~705 open line items currently have a future `expected_date`):
```sql
SELECT pid, MIN(expected_date) AS expected_arrival
FROM purchase_orders
WHERE pid = ANY(%s)
AND status IN ('created', 'ordered', 'electronically_sent', 'receiving_started')
AND expected_date IS NOT NULL
AND expected_date >= CURRENT_DATE
GROUP BY pid
```
Fallbacks, in order: (a) an open PO with a *past* `expected_date` → assume arrival in 7 days; (b) no PO at all → arrival in 14 days (and log a counter of how many hit this default).
2. In the preorder branch, build the daily array piecewise. Let `days_until_arrival = (expected_arrival - today).days`:
- Days `0 .. days_until_arrival-1`: flat observed preorder daily rate = `preorder_sales[pid] / max(preorder_days[pid], 1)` (both already batch-loaded), clamped to ≤ the curve's scaled week-0 daily value.
- Days `days_until_arrival .. horizon`: `forecast_from_curve(curve_info, scale, age_days=0, ...)` shifted so the curve's day 0 lands on the arrival date (i.e. pass `horizon_days - days_until_arrival` and offset into the output array).
- Keep the existing `compute_scale_factor('preorder', ...)` for the post-arrival curve; the pre-arrival segment doesn't use it.
This is consistent with how the reference curves were built: historical preorder units were recorded on their **order dates** (pre-arrival), so week-0 of the fitted curves reflects post-receipt orders, not the backlog.
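
The arrival fallbacks and the piecewise array can be sketched in plain Python. Assumptions, all hypothetical: `curve_daily[i]` stands in for the scaled output of `forecast_from_curve(curve_info, scale, age_days=0, ...)` at age `i`, and "the curve's scaled week-0 daily value" is taken as the mean of its first 7 days. The returned tag in `resolve_arrival` is what you would count for the "no PO" log line.

```python
from datetime import date, timedelta


def resolve_arrival(today: date, open_po_expected: list[date]) -> tuple[date, str]:
    """Expected arrival with the fallbacks above; the tag records which path was used."""
    future = [d for d in open_po_expected if d >= today]
    if future:
        return min(future), "po"
    if open_po_expected:  # (a) open PO whose expected_date has already passed
        return today + timedelta(days=7), "past_po"
    return today + timedelta(days=14), "default"  # (b) no open PO: count these


def preorder_daily_forecast(horizon_days: int, days_until_arrival: int,
                            preorder_sales: float, preorder_days: int,
                            curve_daily: list[float]) -> list[float]:
    """Flat preorder rate until arrival, then the launch curve starting at age 0.

    curve_daily is a hypothetical stand-in for the scaled launch curve by age.
    """
    days_until_arrival = max(0, min(days_until_arrival, horizon_days))
    week0 = curve_daily[:7]
    week0_daily = sum(week0) / len(week0) if week0 else 0.0  # assumption: week-0 mean
    pre_rate = min(preorder_sales / max(preorder_days, 1), week0_daily)
    post_len = horizon_days - days_until_arrival
    # Curve day 0 lands on the arrival date; pad with zeros if the curve is short
    post = list(curve_daily[:post_len]) + [0.0] * max(0, post_len - len(curve_daily))
    return [pre_rate] * days_until_arrival + post
```

For example, 6 preorder units over 12 days with arrival in 3 days yields 0.5/day for days 0-2, then the curve from its day 0.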
### F5. Dormant products: small positive rate instead of hard zero, and count them
**Where:** `forecast_engine.py` → `generate_all_forecasts()` dormant branch (~lines 1040-1042), `batch_load_product_data()`, and `compute_accuracy()`.
**Problem:** all ~28K dormant products are forecast at exactly 0, yet they sold 16,180 units in the eval window (~11% of all demand) — restocks, promos, long-tail. Worse, dormant is *excluded* from the headline accuracy filter, so this miss is invisible.
**Fix (cheap version, do this now):**
1. Batch-load a trailing-180-day order rate for dormant products (11,362 of them have ≥1 sale in 180d — verified):
```sql
SELECT o.pid, SUM(o.quantity) / 180.0 AS rate
FROM orders o
WHERE o.pid = ANY(%s)
AND o.canceled IS DISTINCT FROM TRUE
AND o.date >= CURRENT_DATE - INTERVAL '180 days'
GROUP BY o.pid
```
2. Dormant branch: if the product has a rate > 0, forecast it flat with `method = 'velocity'`; else keep zeros with `method = 'zero'`. Apply the same DOW/seasonal multipliers as everything else (automatic — they're applied after the branch).
3. In `compute_accuracy()`, add a second overall row: `metric_type='overall', dimension_value='all_incl_dormant'` with no dormant filter (keep the existing `'all'` row unchanged for trend continuity). One extra entry in the `dimensions`/`filter_clauses` dicts.
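
Steps 1-2 amount to a very small branch. This sketch uses hypothetical function names and takes the rate from the 180-day query as an optional float (absent when the product had no orders); DOW/seasonal multipliers are left out because the engine applies them after the branch.

```python
from typing import Optional

DORMANT_LOOKBACK_DAYS = 180  # same window as the SQL above


def trailing_rate(units_180d: float) -> float:
    """Average daily units over the full window, like SUM(quantity) / 180.0."""
    return units_180d / DORMANT_LOOKBACK_DAYS


def dormant_forecast(rate: Optional[float], horizon_days: int) -> tuple[list[float], str]:
    """Dormant branch: flat rate tagged 'velocity' if the product sold, else zeros tagged 'zero'."""
    if rate is not None and rate > 0:
        return [rate] * horizon_days, "velocity"
    return [0.0] * horizon_days, "zero"
```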
**Upgrade path (optional, Phase 4):** replace flat rates for `slow_mover` + dormant-with-sales with TSB (Teunter-Syntetos-Babai), the standard intermittent-demand method with obsolescence handling. Per product over a daily series `d_t` (built from snapshots the F2 way, with a full calendar reindex):
```
if d_t > 0: p_t = p_{t-1} + β·(1 − p_{t-1}); z_t = z_{t-1} + α·(d_t − z_{t-1})
else: p_t = p_{t-1}·(1 − β); z_t = z_{t-1}
forecast = p_T · z_T (flat across horizon)
```
Start with α=0.1, β=0.05, initialize p = (nonzero days / total days), z = mean of nonzero demands. Scope: slow_mover (~6K) + dormant with 180d sales (~11K); series from up to 180 days of snapshots (sparse rows → ~manageable volume). Only do this after Phase 3 measurement exists to prove it beats the flat rates.
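
A minimal pure-Python sketch of the TSB recursion with the parameters and initialization given above. It assumes the input is already a zero-filled daily series; `tsb_forecast` is an illustrative name, not an existing engine function.

```python
def tsb_forecast(demand: list[float], alpha: float = 0.1, beta: float = 0.05) -> float:
    """Teunter-Syntetos-Babai forecast; the value is flat across the horizon.

    p tracks the probability that a day has demand, z the size of nonzero demand.
    """
    nonzero = [d for d in demand if d > 0]
    if not nonzero:
        return 0.0
    p = len(nonzero) / len(demand)   # initial occurrence probability
    z = sum(nonzero) / len(nonzero)  # initial mean nonzero demand size
    for d in demand:
        if d > 0:
            p = p + beta * (1 - p)
            z = z + alpha * (d - z)
        else:
            p = p * (1 - beta)  # decay occurrence probability; size estimate unchanged
    return p * z
```

The obsolescence handling shows up when a product stops selling: the same ten sales days give a lower forecast if they are followed by 30 quiet days than if they are the most recent days, because `p` decays on every zero day.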
### Phase 2 validation
After 3-5 cycles: preorder `by_phase` bias should drop from +0.85 toward < +0.3; the new `all_incl_dormant` row should appear, and the difference between its `total_actual_units` and the `'all'` row's should be largely *covered* rather than all-miss (dormant `bias` rising from −1.36 toward ~−0.3 or better).
---
## Phase 3 — Fix the measurement (schema + engine + API + UI)
> Without this phase you cannot see whether Phases 1-2 worked except by ad-hoc SQL, the lead-time chart stays a single bucket forever, and the dashboard keeps displaying a number with a 190% floor in red.
### F7. Archive long-lead forecasts so 15/30/60/90d accuracy exists
**Where:** `forecast_engine.py` → `archive_forecasts()` (~lines 1086-1154), `compute_accuracy()` CTE (~lines 1201-1228).
**Problem:** the current design archives only *past-dated* rows of the previous run before truncation. With daily runs, that's only ever the 1-day-ahead slice: all 879,800 accuracy samples sit in the '1-7d' bucket, and the longer buckets in the UI chart can never populate. Purchasing decisions ride on 30-60d forecasts that are never validated.
**Fix:**
1. Keep the existing past-date archiving exactly as is (it provides dense short-lead coverage).
2. After `generate_all_forecasts()` completes, additionally archive a **sampled set of future leads** from the new run, non-dormant only, attributed to the *current* run id (correct attribution, unlike the past-date path which attributes to the previous run):
```sql
INSERT INTO product_forecasts_history
(run_id, pid, forecast_date, forecast_units, forecast_revenue,
lifecycle_phase, forecast_method, confidence_lower, confidence_upper, generated_at)
SELECT %(run_id)s, pid, forecast_date, forecast_units, forecast_revenue,
lifecycle_phase, forecast_method, confidence_lower, confidence_upper, generated_at
FROM product_forecasts
WHERE lifecycle_phase != 'dormant'
AND forecast_date - CURRENT_DATE IN (7, 14, 30, 60, 89)
ON CONFLICT (run_id, pid, forecast_date) DO NOTHING
```
Volume: ~10K non-dormant products × 5 leads ≈ 50K rows/day; the existing 90-day prune (`forecast_date < CURRENT_DATE - 90`) bounds steady state at a few million rows. Note future-dated rows survive until their date passes + 90 days — that's intended.
3. **CRITICAL companion change** in `compute_accuracy()`: the accuracy CTE must now exclude not-yet-realized rows, or future-dated archives get scored against actual=0:
```sql
FROM product_forecasts_history pfh
JOIN forecast_runs fr ON fr.id = pfh.run_id
WHERE pfh.forecast_date < CURRENT_DATE -- ADD THIS
```
4. **Dedup semantics change.** Today's `ROW_NUMBER() OVER (PARTITION BY pid, forecast_date ORDER BY started_at DESC)` keeps only the latest (= shortest-lead) row per pid/date, which would silently discard all the new long-lead rows. Restructure:
- Compute `lead_days = forecast_date - started_at::date` and the lead bucket *inside* `ranked_history`.
- For `by_lead_time`: dedup `PARTITION BY pid, forecast_date, lead_bucket` (one sample per pid/date/bucket, latest run wins within a bucket).
- For everything else (`overall`, `by_phase`, `by_method`, `daily`, and the new weekly metric below): restrict to `lead_days BETWEEN 0 AND 6` and keep the existing per-(pid, date) dedup. This preserves the current meaning of the headline metrics (short-lead) while the lead-time table becomes real.
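
The two dedup modes can be sketched in Python to make the semantics concrete. Assumptions: each archived row is a `(pid, forecast_date, run_start_date, forecast_units)` tuple, runs are compared by start date (the SQL orders by `started_at`), and the bucket mapping is passed in as `bucket_of` because its boundaries should match the engine's existing lead-bucket CASE expression; none of these names exist in the engine.

```python
from datetime import date
from typing import Callable, Dict, Iterable, Tuple

# One archived row: (pid, forecast_date, run_start_date, forecast_units).
Row = Tuple[int, date, date, float]


def lead_days(forecast_date: date, run_start: date) -> int:
    """lead_days = forecast_date - started_at::date."""
    return (forecast_date - run_start).days


def _keep_latest(best: dict, key: tuple, row: Row) -> None:
    """Latest run wins, like ROW_NUMBER() ... ORDER BY started_at DESC = 1."""
    if key not in best or row[2] > best[key][2]:
        best[key] = row


def dedup_by_lead_bucket(rows: Iterable[Row], today: date,
                         bucket_of: Callable[[int], str]) -> Dict[tuple, Row]:
    """by_lead_time: realized rows only, one sample per (pid, date, bucket)."""
    best: Dict[tuple, Row] = {}
    for row in rows:
        pid, fdate, start, _ = row
        if fdate < today:  # forecast_date < CURRENT_DATE
            _keep_latest(best, (pid, fdate, bucket_of(lead_days(fdate, start))), row)
    return best


def dedup_short_lead(rows: Iterable[Row], today: date) -> Dict[tuple, Row]:
    """Headline metrics: realized rows with lead 0..6, one per (pid, date)."""
    best: Dict[tuple, Row] = {}
    for row in rows:
        pid, fdate, start, _ = row
        if fdate < today and 0 <= lead_days(fdate, start) <= 6:
            _keep_latest(best, (pid, fdate), row)
    return best
```

Partitioning by bucket is what keeps a 30-day-lead row alive next to the same day's 0-day-lead row; with the old `(pid, forecast_date)` partition, only the shortest-lead row would survive.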
### F8. Track a naive baseline (forecast value-added)
**Where:** `archive_forecasts()` (both INSERT paths), `compute_accuracy()`, `forecast_accuracy` schema, `/forecast/accuracy` endpoint.
**Problem:** the engine currently *loses* to a trailing-average naive forecast (221% vs 204% daily WMAPE) and nothing on the dashboard would ever reveal that. Every accuracy improvement should be judged as value-over-naive.
**Fix:**
1. Schema (idempotent, in the ensure blocks): `ALTER TABLE product_forecasts_history ADD COLUMN IF NOT EXISTS naive_units NUMERIC(10,2);` and `ALTER TABLE forecast_accuracy ADD COLUMN IF NOT EXISTS naive_wmape NUMERIC(10,4), ADD COLUMN IF NOT EXISTS fva NUMERIC(10,4);`
2. Populate `naive_units` during both archive INSERTs via a join — naive = flat trailing-28-day average daily units as of archive time (28 days = DOW-balanced; information available at generation; same value at every lead, which is exactly what a naive baseline means):
```sql
LEFT JOIN (
SELECT o.pid, SUM(o.quantity) / 28.0 AS naive_daily
FROM orders o
WHERE o.canceled IS DISTINCT FROM TRUE
AND o.date >= CURRENT_DATE - INTERVAL '28 days' AND o.date < CURRENT_DATE
GROUP BY o.pid
) nv ON nv.pid = pf.pid
-- select COALESCE(nv.naive_daily, 0) AS naive_units
```
3. In `compute_accuracy()`, add to each dimension's aggregate: `SUM(ABS(naive_units - actual_units)) / NULLIF(SUM(actual_units),0) AS naive_wmape` and store `fva = 1 - wmape / naive_wmape` (NULL-safe). Rows archived before this change have `naive_units` NULL — treat NULL as excluded (`FILTER (WHERE naive_units IS NOT NULL)` on the naive sums) rather than as zero.
4. Endpoint: include `naiveWmape` and `fva` in the `overall` (and per-phase) payload of `/dashboard/forecast/accuracy` in `dashboard.js`.
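
The NULL handling in step 3 is the subtle part, so here is a minimal sketch of the aggregate for one dimension. Each row is assumed to be `(forecast_units, naive_units or None, actual_units)`; the function names are illustrative. One design choice to flag: the sketch filters both the numerator *and* the denominator of the naive WMAPE to rows that have `naive_units`, so pre-change rows don't dilute it.

```python
from typing import Optional, Sequence, Tuple

Row = Tuple[float, Optional[float], float]  # (forecast_units, naive_units, actual_units)


def wmape(forecast: Sequence[float], actual: Sequence[float]) -> Optional[float]:
    """SUM(|f - a|) / SUM(a), or None when there is no actual demand (NULLIF)."""
    denom = sum(actual)
    if denom == 0:
        return None
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / denom


def naive_wmape_and_fva(rows: Sequence[Row]):
    """Engine WMAPE over all rows; naive WMAPE only where naive_units is not NULL."""
    engine = wmape([r[0] for r in rows], [r[2] for r in rows])
    with_naive = [r for r in rows if r[1] is not None]
    naive = wmape([r[1] for r in with_naive], [r[2] for r in with_naive])
    fva = None if engine is None or not naive else 1 - engine / naive  # > 0: engine adds value
    return engine, naive, fva
```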
### F9. Weekly-grain headline metric + bias as a percentage
**Where:** `compute_accuracy()`, `/forecast/accuracy` endpoint, `ForecastAccuracy.tsx`.
**Problem:** daily-grain WMAPE on this catalog has a ~190% floor, so as a headline it's noise. The informative numbers are (a) weekly-per-product WMAPE (currently ~109%, target ~70-85% post-fix) and (b) aggregate bias, which the UI currently renders as `+0.108 units`, indistinguishable from zero while the reality is +70%.
**Fix:**
1. New metric in `compute_accuracy()`: `metric_type='overall_weekly', dimension_value='all'`. Definition: using the short-lead deduped rows (lead ≤ 6, non-dormant), aggregate per `(pid, date_trunc('week', forecast_date))` keeping only complete weeks (`COUNT(*) = 7`), then `WMAPE = SUM(ABS(fc_week - act_week)) / SUM(act_week)`, excluding pid-weeks where both are 0. Store sample_size = number of pid-weeks. Compute `naive_wmape`/`fva` the same way from `naive_units`.
2. Endpoint: expose as `overallWeekly`; also add a weekly variant to the `accuracyTrend` query (`metric_type='overall_weekly'`). The trend will start empty (old runs lack the row) — that's fine; don't backfill.
3. `ForecastAccuracy.tsx`:
- Headline WMAPE → `overallWeekly.wmape`, labeled "WMAPE (weekly)". Keep daily WMAPE available in a tooltip if desired.
- Color thresholds for weekly grain: green ≤ 60, yellow ≤ 90, red above (tunable; document that they're calibrated for intermittent retail demand).
- Replace the bias row: show `(totalForecast / totalActual - 1)` as a signed percentage labeled "Forecast vs actual" (both totals already arrive in `overall`). Keep MAE.
- Add a "vs naive" line: naive weekly WMAPE and FVA. FVA > 0 = engine adds value.
- The lead-time chart needs no code change — buckets will populate as F7 rows mature (7d lead evaluable after 7 days, 30d after 30, etc.).
4. `confidenceLevel` in `/forecast/metrics` (`dashboard.js` ~line 360) is "share of products forecast via lifecycle curves", not confidence. It only feeds a per-day tooltip field: rename the JSON field to `curveCoverage` and update the one consumer in `ForecastMetrics.tsx`, or leave it and add a comment; low priority.
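
The weekly metric and the bias percentage from steps 1 and 3 can be sketched in Python. Assumptions: rows are `(pid, forecast_date, forecast_units, actual_units)` that are already short-lead deduped and non-dormant, and weeks start on Monday as PostgreSQL's `date_trunc('week', ...)` does; the function names are illustrative.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Monday of d's week, matching date_trunc('week', ...) in PostgreSQL."""
    return d - timedelta(days=d.weekday())


def weekly_wmape(rows):
    """Returns (wmape or None, number of pid-weeks used as sample_size)."""
    weeks = defaultdict(lambda: [0, 0.0, 0.0])  # (pid, week) -> [days, fc_week, act_week]
    for pid, d, fc, act in rows:
        w = weeks[(pid, week_start(d))]
        w[0] += 1
        w[1] += fc
        w[2] += act
    # Complete weeks only (COUNT(*) = 7), skipping pid-weeks where both sides are 0
    complete = [(fc, act) for days, fc, act in weeks.values()
                if days == 7 and not (fc == 0 and act == 0)]
    total_act = sum(a for _, a in complete)
    if total_act == 0:
        return None, len(complete)
    return sum(abs(f - a) for f, a in complete) / total_act, len(complete)


def bias_pct(total_forecast: float, total_actual: float):
    """Signed 'Forecast vs actual' fraction: totalForecast / totalActual - 1."""
    return None if total_actual == 0 else total_forecast / total_actual - 1
```

Fed the §1 totals (227,690 forecast vs 133,861 actual), `bias_pct` returns about 0.70, i.e. the +70% the current `+0.108 units` display hides.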
### Phase 3 validation
- Next run after deploy: `forecast_accuracy` contains `overall_weekly` and `fva` values; `/dashboard/forecast/accuracy` returns them; the overview popover renders weekly WMAPE, bias %, and the naive comparison.
- After 7/14/30 days: `by_lead_time` rows appear for '8-14d', '15-30d', '31-60d' buckets respectively (61-90d after ~60 days).
- Confirm engine runtime still < ~5 min and `product_forecasts_history` growth ≈ 50-70K rows/day.
---
## Phase 4 — Optional / after the above is proven
- **F6. TSB for slow movers + dormant** (spec in F5). Gate on Phase 3 measurement: ship only if weekly FVA improves on those phases.
- **F10. Confidence-margin source:** `load_accuracy_margins()` feeds daily-grain per-phase WMAPE (clamped to 1.0) into the intervals, so every interval is ±100% — uninformative. Once `overall_weekly` exists, add per-phase weekly rows (`by_phase_weekly`) and source margins from those instead.
- **F11.** Update or delete `backfill_accuracy_data()` (it encodes the old formulas). Until then, just don't run `--backfill`.
- **F12.** `compute_dow_indices()` weights by revenue but the multipliers are applied to units — switch `SUM(o.price * o.quantity)` to `SUM(o.quantity)`. Tiny effect.
- **F13.** Longer term: for reorder decisions the right target is P(lead-time demand > stock), not a point forecast. Evaluate quantile (pinball) loss at lead-time horizons using the existing confidence-interval columns. Design separately.
---
## 4. Success criteria
1. Rolling-14-day portfolio forecast/actual ratio within **0.8-1.25** (currently 1.5-2.5).
2. Weekly-grain WMAPE ≤ **90%** and **FVA > 0** (engine beats naive) sustained for 2+ weeks.
3. Decay/preorder/mature per-phase bias within ±0.1 units/day (currently +0.35 / +0.85 / +0.17).
4. `all_incl_dormant` actuals covered: dormant bias better than −0.4 (currently −1.36, i.e. 100% miss).
5. Lead-time buckets through 31-60d populated with ≥10K samples each within ~6 weeks.
6. Launch phase stays healthy (bias within ±0.15, WMAPE not degraded) — regression guard for F3/F4 changes.
## 5. Re-measurement appendix
The naive-vs-engine comparison used in the diagnosis (rerun any time; adjust dates):
```sql
WITH ranked AS (
SELECT pfh.pid, pfh.forecast_date, pfh.forecast_units, pfh.lifecycle_phase,
ROW_NUMBER() OVER (PARTITION BY pfh.pid, pfh.forecast_date ORDER BY fr.started_at DESC) rn
FROM product_forecasts_history pfh
JOIN forecast_runs fr ON fr.id = pfh.run_id
WHERE pfh.forecast_date BETWEEN CURRENT_DATE - 9 AND CURRENT_DATE - 1),
eng AS (SELECT * FROM ranked WHERE rn = 1 AND lifecycle_phase != 'dormant'),
naive AS (
SELECT o.pid, SUM(o.quantity)/30.0 AS naive_daily FROM orders o
WHERE o.canceled IS DISTINCT FROM TRUE
AND o.date >= CURRENT_DATE - 39 AND o.date < CURRENT_DATE - 9
GROUP BY o.pid)
SELECT e.lifecycle_phase, COUNT(*) AS n, SUM(COALESCE(dps.units_sold,0)) AS actual,
round(SUM(e.forecast_units),0) AS engine_fc, round(SUM(COALESCE(nv.naive_daily,0)),0) AS naive_fc,
round(SUM(ABS(e.forecast_units - COALESCE(dps.units_sold,0)))/NULLIF(SUM(COALESCE(dps.units_sold,0)),0),2) AS engine_wmape,
round(SUM(ABS(COALESCE(nv.naive_daily,0) - COALESCE(dps.units_sold,0)))/NULLIF(SUM(COALESCE(dps.units_sold,0)),0),2) AS naive_wmape
FROM eng e
LEFT JOIN naive nv ON nv.pid = e.pid
LEFT JOIN daily_product_snapshots dps ON dps.pid = e.pid AND dps.snapshot_date = e.forecast_date
GROUP BY ROLLUP(e.lifecycle_phase) ORDER BY 1;
```
Baseline numbers to beat (June 1-9, 2026): engine 221% / naive 204% daily WMAPE; engine_fc/actual = 1.82; per-phase table in §1.
+449
@@ -0,0 +1,449 @@
# Import & Metrics Pipeline Fix Plan
Fixes for issues found in a full review (2026-06-10) of the `full-update.js` pipeline:
`inventory-server/scripts/full-update.js` → `import-from-prod.js` (6 importers in `scripts/import/`)
→ `calculate-metrics-new.js` (7 SQL modules in `scripts/metrics-new/`).
Every issue below was verified against the code, and where marked **[verified-live]**, against the
live MySQL source (`sg` on 192.168.1.5 via the acot-db tooling / `ssh workpi`) and live PostgreSQL
(`inventory_db`, via `ssh netcup`, then `psql -U inventory_readonly`, password in `/Users/matt/Dev/inventory/CLAUDE.md`).
Write credentials for migrations: see `/var/www/inventory/.env` on netcup (`inventory_user`).
## Operational context (read first)
- Local `inventory-server/` is **NFS-mounted** to `/var/www/inventory/` on the netcup server — edits
appear on the server with no copy step. Run heavy validation/grep/find **on the server via
`ssh netcup`**, not locally (NFS hangs + AppleDouble `._*` noise).
- The PG server timezone is **Europe/Berlin**. The business operates in **America/Chicago**. This
matters for Fix 2.
- MySQL server is America/Chicago; the mysql2 driver is configured `timezone: '-05:00'` and
corrected at runtime by `adjustDateForMySQL()` in `scripts/import/utils.js` (see
`memory/TIMEZONE_ISSUE.md`). Don't "fix" that part — it already works.
- Orders/PO/products imports are incremental by default (`INCREMENTAL_UPDATE !== 'false'`); a full
orders sync = run with `INCREMENTAL_UPDATE=false` (5-year window).
- Existing rebuild tooling: `scripts/metrics-new/backfill/rebuild_daily_snapshots.sql` (rebuilds
`daily_product_snapshots` from `orders`/`receivings`). The full-pipeline order after data fixes:
re-import → rebuild snapshots → `node scripts/calculate-metrics-new.js`.
- Precedent: `scripts/metrics-new/migrations/002_fix_discount_double_counting.sql` documents the
procedure used last time a discount formula changed. Follow the same pattern (migration doc +
code fix + full re-import + rebuild).
---
## P0 — Data correctness (do both, then ONE re-import + rebuild)
### Fix 1: Item-level promo discounts dropped (~$26K / 30 days ≈ 10% of product revenue) [verified-live]
**File:** `scripts/import/orders.js` → `order_totals` CTE (~lines 604-623) and the discount fetch in
`processDiscountsBatch` (~lines 379-383).
**Problem.** The discount applied to each PG `orders` row is:
prorated `summary_discount_subtotal` + item-level promo discounts. The item-level part is gated:
```sql
SUM(CASE WHEN COALESCE(md.discount_amount_subtotal, 0) > 0 THEN id.amount ELSE 0 END)
```
In the PHP source (`/Users/matt/Dev/acot/website/website/lib/neworder.class.php`):
- `order_items.prod_price` is the **pre-promo** price; `summary_subtotal = Σ prod_price·qty` (line ~3087).
- Item-level promo discounts live in `order_discount_items` with `which = 2`; they are applied to the
order total via `summary_discount += amount + products_disc_sum` (line ~6567) — i.e. they are **not**
part of `discount_amount_subtotal` and **not** baked into `prod_price`.
- Live data (90 days): of 10,010 type-10 promo discounts, **8,070 have item rows but only 8 have
`discount_amount_subtotal > 0`** — the gate zeroes essentially all item-level promo discounts.
- Live impact (30 days): **$25,989 dropped** across 2,021 orders, vs only $13,574 captured via the
prorated subtotal component. Order discount components, 30d: total $54,957 = $13,574 subtotal +
$15,395 shipping + ~$25,989 item-level. (Shipping discounts correctly excluded from product revenue.)
**Consequence.** `orders.discount` understated → `net_revenue`, `profit_30d`, `margin_30d` overstated
by ~10% of revenue; `discounts_30d` / `discount_rate_30d` ~3x understated. Flows into daily snapshots,
product/brand/vendor/category metrics, and dashboards.
**Fix.**
1. In `processDiscountsBatch`, fetch only real item discounts:
`SELECT order_id, pid, discount_id, amount FROM order_discount_items WHERE order_id IN (?) AND which = 2`.
(`which=1` rows store prices of free promo-added items; `which=3` are usage records — neither is a
discount amount.)
2. In the `order_totals` CTE, remove the gate — sum `id.amount` unconditionally:
`SUM(COALESCE(id.amount, 0)) AS promo_discount_sum` (drop the join/CASE on `temp_main_discounts`;
`temp_main_discounts` becomes unused and can be removed entirely along with its insert loop).
3. Sanity guard (optional, recommended): clamp final per-row discount to `price * quantity`.
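
The per-row arithmetic after the fix can be sketched in Python. Assumptions, labeled hypothetical: one PG `orders` row per `(order, pid)`; `summary_discount_subtotal` is prorated by each line's share of `summary_subtotal` (Σ `prod_price`·qty); and `promo_by_pid` holds the summed `which = 2` amounts per pid. The key change is that the promo amount is added unconditionally rather than only when `discount_amount_subtotal > 0`.

```python
def row_discounts(lines, summary_discount_subtotal: float, promo_by_pid: dict) -> dict:
    """Per-row discount after Fix 1.

    lines: list of (pid, prod_price, qty) with the pre-promo prod_price.
    """
    subtotal = sum(price * qty for _, price, qty in lines)  # summary_subtotal
    out = {}
    for pid, price, qty in lines:
        line_total = price * qty
        share = line_total / subtotal if subtotal else 0.0  # assumed proration basis
        discount = summary_discount_subtotal * share + promo_by_pid.get(pid, 0.0)
        out[pid] = min(discount, line_total)  # optional guard from step 3
    return out
```

For an order with lines A (2 × $10) and B (4 × $5), a $4 subtotal discount, and a $3 item promo on A, the rows get $5 (A) and $2 (B); under the old gate A would have received only $2.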
**Verification.** After a FULL orders re-import, for a recent 30-day window PG should satisfy:
`SUM(discount)` ≈ MySQL `Σ summary_discount_subtotal` + `Σ order_discount_items.amount (which=2)`
over the same orders (± rounding from proration). Spot-check an order with a type-10 promo:
discount on the affected pid ≈ the `which=2` amount. Re-run migration 002's verification query too
(pids 624756, 614513) to confirm no regression of the prior fix.
### Fix 2: Daily snapshots bucket sales by Europe/Berlin days, not business days [verified-live]
**Files:** `scripts/metrics-new/update_daily_snapshots.sql` (SalesData join `o.date::date = _target_date`
~line 138; gap-fill and stale-detection aggregates at lines ~47-83);
`scripts/metrics-new/backfill/rebuild_daily_snapshots.sql` (same pattern — check & fix);
`scripts/metrics-new/update_product_metrics.sql` (`HistoricalDates` `MIN(o.date)::date` etc., lines ~131-147).
**Problem.** `orders.date` is `timestamptz`; `::date` casts in the server TZ (**Europe/Berlin**,
verified via `SHOW timezone`). Berlin is 7h ahead of Central (6h during the few weeks each spring and
fall when the US and EU switch DST on different dates), so every order placed after ~5 PM Central
(~6 PM in those weeks) lands on the **next** snapshot day. This shifts a large evening slice of daily
sales forward one day, skews `yesterday_sales` and day-of-week patterns (the forecast engine's DOW
multipliers and daily-grain forecast accuracy; see `FORECAST_FIX_PLAN.md`), and makes the snapshots
inconsistent with `stock_snapshots`, whose dates come from a Central-time MySQL cron.
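The following Python sketch makes the shift concrete by bucketing the same order instant into a calendar day in both zones. As an assumption, it hard-codes the June offsets (CDT = UTC-5, CEST = UTC+2) so it runs without a tz database. Real code should use named zones so the DST transition weeks come out right.

```python
from datetime import date, datetime, timedelta, timezone

# Assumed fixed June offsets so the sketch needs no tz database:
# Central Daylight Time is UTC-5, Central European Summer Time is UTC+2.
CENTRAL_JUNE = timezone(timedelta(hours=-5))
BERLIN_JUNE = timezone(timedelta(hours=2))


def snapshot_day(instant: datetime, tz: timezone) -> date:
    """Calendar day the instant falls on when bucketed in `tz` (what ::date does)."""
    return instant.astimezone(tz).date()


def shifted_hours(day: date) -> list[int]:
    """Central-time hours on `day` whose orders land on a different Berlin day."""
    hours = []
    for hour in range(24):
        placed = datetime(day.year, day.month, day.day, hour, tzinfo=CENTRAL_JUNE)
        if snapshot_day(placed, BERLIN_JUNE) != snapshot_day(placed, CENTRAL_JUNE):
            hours.append(hour)
    return hours


print(shifted_hours(date(2026, 6, 10)))  # [17, 18, 19, 20, 21, 22, 23]
```

In June, every order placed from 17:00 Central onward is booked on the next Berlin day. That is the evening slice described above.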
**Fix.** Bucket all order/receiving dates in business time. Replace every `o.date::date` /
`received_date::date` used for *day bucketing* in the two snapshot SQL files with:
```sql
(o.date AT TIME ZONE 'America/Chicago')::date
```
Apply consistently in: SalesData, ReceivingData, the gap-fill date lists, the stale-detection
aggregates (they must match SalesData or every day looks permanently stale), and the rebuild script.
`HistoricalDates` in update_product_metrics (first/last sold dates) should match too.
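Add an expression index to keep the per-day loop fast, e.g.
`CREATE INDEX ON orders ( ((date AT TIME ZONE 'America/Chicago')::date) );` and the equivalent on
`receivings(received_date)`. The planner only uses it when the query repeats that exact expression
(a bare `timestamptz::date` cast is not immutable, so it cannot be indexed at all); check `EXPLAIN`
on the SalesData query afterward.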
Note: `receivings.received_date` came from MySQL DATETIME (Central literal) inserted as timestamptz —
it was interpreted in the *session* TZ at insert. Before converting, spot-check a few receivings
against MySQL to confirm which TZ the stored instants actually represent; the conversion expression
must yield the Central calendar day MySQL shows. Same check for `orders.date` (it originates from
`_order.date_placed`, a TIMESTAMP column, so it should be a correct instant — `AT TIME ZONE
'America/Chicago'` is right for it).
**Verification.** Pick 2-3 recent days; compare per-day `units_sold` totals in
`daily_product_snapshots` against MySQL
`SELECT date_placed_onlydate, SUM(qty_ordered) ... WHERE order_status >= 20 GROUP BY 1`
(MySQL stores Central days). They should now match closely (small diffs from canceled-status timing).
### P0 execution order (single pass)
1. Land Fix 1 (orders.js) and Fix 2 (both snapshot SQL files + product-metrics date CTE).
2. Full orders re-import: `INCREMENTAL_UPDATE=false node scripts/import-from-prod.js` (or at minimum
the orders step) — run on the server, it's long.
3. Rebuild snapshots: `psql -f scripts/metrics-new/backfill/rebuild_daily_snapshots.sql` (after
confirming it contains the TZ fix). The hourly job's 90-day self-heal will NOT fix history beyond
90 days by itself; the explicit rebuild is required.
4. `node scripts/calculate-metrics-new.js`.
5. Expect dashboards to show: margins down ~8-10 points (real), daily sales curves shifted, DOW
profile changed. Tell the user before/after numbers.
---
## P1 — Wrong or drifting numbers, fix soon
### Fix 3: Vendor avg lead time computed over a near-cartesian join
**File:** `scripts/metrics-new/calculate_vendor_metrics.sql`, `VendorPOAggregates` (lines ~62-83).
**Problem.** Joins each done-PO line to **every** receiving of the same (pid, supplier) after the PO
date — a product received 10 times contributes 10 ever-growing lead times → overstated,
busy-product-weighted vendor lead time. The per-product version in `update_periodic_metrics.sql` (lines 27-48)
is correct (MIN receiving per PO within 180 days, then average).
**Fix.** Reuse the periodic shape, aggregated to vendor:
```sql
WITH po_first_receiving AS (
SELECT po.vendor, po.po_id, po.pid, po.date::date AS po_date,
MIN(r.received_date::date) AS first_receive_date
FROM purchase_orders po
JOIN receivings r ON r.pid = po.pid AND r.supplier_id = po.supplier_id
AND r.received_date >= po.date
AND r.received_date <= po.date + INTERVAL '180 days'
WHERE po.status = 'done' AND po.date >= CURRENT_DATE - INTERVAL '1 year'
AND po.vendor IS NOT NULL AND po.vendor <> ''
GROUP BY po.vendor, po.po_id, po.pid, po.date
)
SELECT vendor, COUNT(DISTINCT po_id) AS po_count_365d,
ROUND(AVG(GREATEST(1, first_receive_date - po_date)))::int AS avg_lead_time_days_hist
FROM po_first_receiving GROUP BY vendor
```
**Verification.** For a few vendors compare old vs new values; new should be materially lower and
roughly match `AVG(product_metrics.avg_lead_time_days)` for that vendor's products.
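A toy Python model shows why the old join overstates lead time. It assumes one (pid, supplier) pair with POs on days 0, 30 and 60, each received 10 days later. The function names are hypothetical and only mirror the two SQL shapes.

```python
from datetime import date, timedelta
from statistics import mean


def cartesian_lead_times(po_dates, receiving_dates):
    """Old shape: every receiving on/after a PO date counts as a lead time for it."""
    return [(r - p).days for p in po_dates for r in receiving_dates if r >= p]


def first_receiving_lead_times(po_dates, receiving_dates, max_days=180):
    """Fixed shape: first receiving within max_days of each PO, floored at 1 day."""
    lead_times = []
    for p in po_dates:
        window = [r for r in receiving_dates if p <= r <= p + timedelta(days=max_days)]
        if window:
            lead_times.append(max(1, (min(window) - p).days))
    return lead_times


start = date(2026, 1, 1)
pos = [start + timedelta(days=d) for d in (0, 30, 60)]
receipts = [start + timedelta(days=d) for d in (10, 40, 70)]
```

The cartesian shape averages 30 days, because the day-0 PO also "waits" for the receivings that belong to the day-30 and day-60 POs. The first-receiving shape reports the true 10 days.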
### Fix 4: Deleted order items & combined orders never reconciled in PG [verified-live]
**File:** `scripts/import/orders.js`.
**Problem.** The orders import upserts but never deletes:
- Items removed from an order in MySQL (`DELETE FROM order_items ...` happens, e.g.
neworder.class.php ~line 6500 for unpicked promo items, plus staff edits) leave stale rows in PG
forever. May 2026 check: PG has 49,841 item rows vs MySQL 49,377 (+0.9%) — and PG should be ≤
MySQL.
- Combining orders (`combine_orders`, neworder.class.php ~11946) sets the source orders to status 16
AND **zeroes `date_placed`**, then copies all items to a NEW order. Because the import query
filters `o.date_placed >= …`, a combined source order can never be re-fetched, so its stale
'placed' rows would double-count with the new merged order. Currently latent (last combine
2024-07, predating current PG data — verified no stale rows exist today), but it will silently
corrupt the day combining is used again.
**Fix.** Two parts, both inside the orders import after the upsert phase:
1. **Item-set reconciliation** for re-imported orders: the import already knows the set of changed
`orderIds` and inserted their current items into `temp_order_items`. Mirror the PO import's
pattern (`purchase-orders.js` lines ~683-694):
```sql
DELETE FROM orders o
WHERE o.order_number = ANY($1) -- orders fetched this run
AND NOT EXISTS (SELECT 1 FROM temp_order_items t
WHERE t.order_id = o.order_number AND t.pid = o.pid);
```
2. **Combined/cancelled sweep** that does NOT depend on `date_placed`: each run, fetch from MySQL
`SELECT order_id, order_status FROM _order WHERE order_status IN (15,16) AND stamp > ?`
(no date_placed filter) and update matching PG rows' `status`/`canceled`
('combined' rows are then excluded from metrics — see Fix 5). Cheap (small result set).
**Verification.** Re-run the May-2026 row-count comparison (MySQL vs PG for one month) after one full
run; counts should converge (PG ≤ MySQL, diff explained by TZ window edges only).
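The item-set reconciliation reduces to a set difference. Below is a minimal Python sketch, assuming PG rows and freshly fetched MySQL items are both available as `(order_number, pid)` keys. The `stale_item_keys` helper is hypothetical.

```python
def stale_item_keys(pg_items, fetched_order_ids, fetched_items):
    """(order_number, pid) rows to delete from PG after re-importing some orders.

    Mirrors the DELETE ... NOT EXISTS above: only orders fetched this run are
    considered, and a row is stale when MySQL no longer returns that item.
    """
    fetched_orders = set(fetched_order_ids)
    current = set(fetched_items)
    return sorted(key for key in pg_items if key[0] in fetched_orders and key not in current)
```

The key guard is the first filter. An order should only appear in `fetched_order_ids` if its items really were loaded into `temp_order_items` this run. Otherwise every PG row for that order would look stale and be deleted.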
### Fix 5: 'combined' orders are counted as sales
**Files:** `scripts/metrics-new/update_daily_snapshots.sql` (status filters, lines ~77, 120-134),
`update_product_metrics.sql` (`HistoricalDates` line ~145, `LifetimeRevenue` line ~249),
`backfill/rebuild_daily_snapshots.sql`.
**Problem.** Sales filters exclude only `('canceled', 'returned')`. Status 16 'combined' = "merged
into another order" — the new order carries the same items, so counting both double-counts. 826
combined orders exist in MySQL; today none are in PG (see Fix 4), but once Fix 4's sweep starts
marking rows 'combined', the metrics filters must exclude them.
**Fix.** Change every `NOT IN ('canceled', 'returned')` in the metrics SQL to
`NOT IN ('canceled', 'returned', 'combined')`. Grep for the pattern in `scripts/metrics-new/` and
`src/routes/` (dashboard endpoints replicate these filters — see CLAUDE.md analytics-filters note).
### Fix 6: Incremental sync watermark race (silent permanent misses)
**Files:** `scripts/import/orders.js` (~772), `products.js` (~934), `purchase-orders.js` (~833).
**Problem.** `sync_status.last_sync_timestamp` is set to `NOW()` *after* the import finishes. Any
MySQL row modified between the source query and that write is below the new watermark but was never
fetched → permanently skipped (until a full sync or the row changes again). Long imports widen the
window; PG/MySQL clock skew adds to it.
**Fix.** Capture the watermark **before** the source query and write that value:
```js
const [[{ now: sourceNow }]] = await prodConnection.query('SELECT NOW() as now');
// ... do the import ...
await localConnection.query(
`INSERT INTO sync_status ... VALUES ('orders', $1) ON CONFLICT ... SET last_sync_timestamp = $1`,
[sourceNow]);
```
Using MySQL's own clock also eliminates cross-server skew. Note `sourceNow` comes back through the
mysql2 driver TZ conversion — verify round-tripping with `adjustDateForMySQL` produces a correct
comparison value, or store `UTC_TIMESTAMP()` and compare against `CONVERT_TZ`-normalized stamps.
Overlap (re-importing rows changed during the run) is harmless — everything is upserted.
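A toy two-run simulation of the race uses integer timestamps and hypothetical helper names. It shows which row is lost when the watermark is the import's finish time instead of the time the source query ran.

```python
def fetch_changed(rows, watermark, snapshot_time):
    """Rows the source query sees: changed after the watermark, visible at query time."""
    return {row_id for row_id, stamp in rows.items() if watermark < stamp <= snapshot_time}


def simulate_two_runs(capture_before):
    rows = {"a": 50, "b": 90}  # row id -> last-modified stamp
    synced = set()
    # Run 1: source query at t=100, import finishes (and writes its watermark) at t=110.
    synced |= fetch_changed(rows, watermark=0, snapshot_time=100)
    rows["c"] = 105  # modified mid-import, after the source query ran
    watermark = 100 if capture_before else 110
    # Run 2: source query at t=200.
    synced |= fetch_changed(rows, watermark=watermark, snapshot_time=200)
    return synced
```

With the watermark written at t=110, row `c` (stamp 105) is never fetched again. Capturing t=100 picks it up on the next run.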
### Fix 7: Stockout days / service level / fill rate / avg stock built on activity-only snapshots
**Files:** `scripts/metrics-new/update_product_metrics.sql` — `SnapshotAggregates`
(`stockout_days_30d`, `avg_stock_*_30d`, lines ~177-189), `ServiceLevels` (lines ~304-323),
plus `calculate_sales_velocity` usage.
**Problem.** `daily_product_snapshots` only has rows on days with sales/receivings. So:
- A product that is out of stock (and therefore sells nothing) gets **no row** → `stockout_days_30d`
≈ 0 exactly when stockouts matter → `calculate_sales_velocity(sales, stockout_days)`'s adjustment
is inert → velocity and replenishment understated for constrained products.
- `service_level_30d` divides stockout days by COUNT(activity days), not 30.
- `avg_stock_units_30d` / `avg_stock_cost_30d` average only activity days (biased toward in-stock
days) → GMROI / stockturn / sell-through denominators biased.
- `fill_rate_30d`'s `units_sold * 0.2` lost-sales heuristic is arbitrary — fine to keep, but document.
**Fix.** Derive stock-presence metrics from `stock_snapshots` (full daily coverage from MySQL
`snap_product_value`, imported by `stock-snapshots.js`) instead of `daily_product_snapshots`:
```sql
StockCoverage AS (
SELECT pid,
COUNT(*) FILTER (WHERE stock_quantity <= 0) AS stockout_days_30d,
AVG(stock_quantity) AS avg_stock_units_30d,
AVG(stock_value) AS avg_stock_cost_30d
FROM stock_snapshots
WHERE snapshot_date >= _current_date - INTERVAL '29 days'
GROUP BY pid
)
```
Treat products absent from `stock_snapshots` for a day as unknown (NULL), not in-stock. Keep
`daily_product_snapshots` for sales/revenue aggregates. `service_level_30d` denominator becomes the
count of covered days. Note `stock_snapshots` has no `eod_stock_retail`; keep retail/gross averages
on the old source or compute as `stock_quantity * current price` explicitly.
**Verification.** Pick products that had a known stockout period; `stockout_days_30d` should now be
> 0 and `sales_velocity_daily` should rise accordingly.
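Here is a Python sketch of the coverage-based metrics. It assumes `service_level_30d` means the share of covered days that were not stocked out; the exact formula in `ServiceLevels` may differ. Days with no `stock_snapshots` row are passed as `None` and excluded, rather than treated as in stock. The `stock_coverage` name is hypothetical.

```python
def stock_coverage(stock_by_day):
    """stock_by_day: {day: stock_quantity or None}; None = no stock_snapshots row."""
    covered = [q for q in stock_by_day.values() if q is not None]
    if not covered:
        return {"stockout_days": None, "service_level": None, "avg_stock_units": None}
    stockout_days = sum(1 for q in covered if q <= 0)
    return {
        "stockout_days": stockout_days,
        # Denominator is covered days, not activity days and not a fixed 30.
        "service_level": 1 - stockout_days / len(covered),
        "avg_stock_units": sum(covered) / len(covered),
    }
```

Consider a product in stock for 20 days and out for 9, with one day unknown. Full coverage reports 9 stockout days. An activity-only view that sees just the 8 days it sold on reports none.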
---
## P2 — Definition / robustness improvements
### Fix 8: Returns don't reduce COGS; LifetimeRevenue ignores returns
`update_daily_snapshots.sql` SalesData: COGS accrues only on `quantity > 0` rows; return rows
(negative qty — 15,875 rows live) subtract revenue but never COGS → margin understated in
return-heavy periods. Add a returns-COGS term mirroring the sales-COGS COALESCE chain
(`SUM(... WHEN quantity < 0 THEN cost * ABS(quantity))`) and subtract it in `cogs` (or store
`returns_cogs` separately and use `cogs - returns_cogs` in profit). Also `LifetimeRevenue` in
`update_product_metrics.sql` (line ~242) filters `quantity > 0` — include negative-qty rows so
lifetime revenue nets out returns (drop the quantity filter; `price*quantity` is already signed,
but check the `- discount` term sign for return rows).
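A small worked example of the margin distortion: 10 units sold and 2 returned, at price 20 and cost 12, with discounts ignored. The row tuple layout and the `margin` helper are illustrative only.

```python
def margin(rows, net_returns_cogs):
    """rows: (quantity, price, cost) with returns as negative quantity."""
    revenue = sum(qty * price for qty, price, _ in rows)  # already signed
    cogs = sum(qty * cost for qty, _, cost in rows if qty > 0)
    if net_returns_cogs:
        # Reverse the cost of returned units out of COGS.
        cogs -= sum(abs(qty) * cost for qty, _, cost in rows if qty < 0)
    return (revenue - cogs) / revenue


rows = [(10, 20.0, 12.0), (-2, 20.0, 12.0)]
```

Without the returns-COGS term, margin falls from the true 40% (the same as with no returns) to 25%.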
### Fix 9: return_rate_30d definition
`update_product_metrics.sql` line ~468: `returns / (sales + returns)` → industry standard is
`returns / sales`. Change denominator to `NULLIF(sa.sales_30d, 0)`.
### Fix 10: GMROI not annualized
Line ~466: `profit_30d / avg_stock_cost_30d` is a monthly GMROI (~1/12 of the conventional annual
figure, benchmark ≥ 2-3). Either annualize (`* 12.17`) or rename the column/label "monthly".
Decision for Matt; annualizing is recommended for comparability. Frontend displays must be checked
either way.
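Both definition changes (Fix 9 and Fix 10) are simple arithmetic. The sketch below uses 365/30 as the annualization factor behind the `12.17` figure; the function names are hypothetical.

```python
DAYS_PER_YEAR = 365
WINDOW_DAYS = 30
ANNUALIZATION = DAYS_PER_YEAR / WINDOW_DAYS  # about 12.17


def return_rate(returns_30d, sales_30d):
    """Fix 9: returns / sales, None (SQL NULL) when there were no sales."""
    return returns_30d / sales_30d if sales_30d else None


def gmroi_annual(profit_30d, avg_stock_cost_30d):
    """Fix 10: the 30-day GMROI scaled to a yearly figure."""
    if not avg_stock_cost_30d:
        return None
    return profit_30d / avg_stock_cost_30d * ANNUALIZATION
```

A monthly GMROI of 0.25 becomes about 3.04 once annualized, which lands in the 2-3+ benchmark range.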
### Fix 11: get_weighted_avg_cost is a lifetime WAC
`db/functions.sql` (~line 81, deployed identically): averages ALL receivings ≤ date — decade-old
costs weigh equally. Recommended: window to recent receivings, e.g. last 365 days falling back to
lifetime when none. Used as fallback COGS when `o.costeach` is NULL, so impact is modest but real
for long-lived SKUs. Apply with `CREATE OR REPLACE FUNCTION` in `db/functions.sql` AND on the live DB.
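A Python model of the recommended windowed average follows. It first uses receivings in `(p_date - 365 days, p_date]`, and falls back to all receivings up to `p_date` only when that window has no usable quantity. The tuple layout and the `'done'` status in the example are assumptions.

```python
from datetime import date, timedelta


def _wac(rows):
    """Quantity-weighted average cost, or None when there is no positive quantity."""
    qty = sum(r[1] for r in rows)
    return sum(r[1] * r[2] for r in rows) / qty if qty > 0 else None


def weighted_avg_cost(receivings, as_of, window_days=365):
    """receivings: (received_date, qty_each, cost_each, status) tuples."""
    usable = [r for r in receivings if r[0] <= as_of and r[3] != "canceled"]
    window_start = as_of - timedelta(days=window_days)
    recent_cost = _wac([r for r in usable if r[0] > window_start])
    # Fall back to the lifetime average only when the window yields nothing.
    return recent_cost if recent_cost is not None else _wac(usable)
```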
### Fix 12: exclude_from_forecast removes products from product_metrics entirely
`update_product_metrics.sql` line ~627 (`WHERE s.exclude_forecast IS FALSE OR ... IS NULL`): the
flag's name implies forecast-only, but excluded products get NO metrics row → vanish from brand/
vendor/category rollups and dashboards. Fix: always emit the row; instead NULL the
forecast/replenishment columns when excluded (wrap those expressions in
`CASE WHEN s.exclude_forecast THEN NULL ELSE ... END`).
### Fix 13: Incremental products import misses category-only changes
`products.js` incremental WHERE (~lines 433-440) keys on `p.stamp`, `ci.stamp`, price/b2b dates —
`product_category_index` changes don't bump any of those → PG `product_categories` goes stale. Also
the `needs_update` comparison (~lines 604-625) doesn't compare `categories`, so even refetched rows
skip the category rewrite. Fix both: add `t.categories IS NOT DISTINCT FROM p.categories` to the
needs_update comparison (note: `products.categories` is the GROUP_CONCAT string — confirm PG column
holds the same representation), and add a cheap full-sweep (e.g. weekly, or compare
`COUNT(*) GROUP BY pid` hashes) OR include `EXISTS (SELECT 1 FROM product_category_index pci WHERE
pci.pid = p.pid AND pci.stamp > ?)` in the incremental WHERE if that table has a stamp column —
verify schema first (`DESCRIBE product_category_index`).
### Fix 14: PO/receivings OFFSET pagination over a moving filter
`purchase-orders.js` (~lines 275-298, 447-470): `LIMIT/OFFSET` with a `date_updated > ?` predicate;
concurrent updates shift rows between pages → silent skips. Fix: keyset pagination —
`WHERE ... AND p.po_id > ? ORDER BY p.po_id LIMIT 500`, carrying the last seen po_id (drop OFFSET).
Same for receivings on `receiving_id`.
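A toy Python simulation of the skip follows. It assumes the page query orders by `(date_updated, po_id)`; the real ORDER BY is not stated here. PO 1 is updated after the first page is read and moves to the end of the ordering, so OFFSET then steps over PO 3. Keyset paging on `po_id` does not. Helper names are hypothetical.

```python
def offset_scan(table, watermark, page_size, between_pages):
    """OFFSET pagination, ordered by (date_updated, po_id) for this illustration."""
    seen, offset = [], 0
    while True:
        matching = sorted((stamp, po_id) for po_id, stamp in table.items() if stamp > watermark)
        page = [po_id for _, po_id in matching[offset:offset + page_size]]
        if not page:
            return seen
        seen.extend(page)
        offset += page_size
        between_pages(table)  # a concurrent write between page fetches


def keyset_scan(table, watermark, page_size, between_pages):
    """Keyset pagination: WHERE po_id > last_seen ORDER BY po_id LIMIT page_size."""
    seen, last_id = [], 0
    while True:
        page = sorted(po_id for po_id, stamp in table.items()
                      if stamp > watermark and po_id > last_id)[:page_size]
        if not page:
            return seen
        seen.extend(page)
        last_id = page[-1]
        between_pages(table)


def update_once(po_id, new_stamp):
    """Simulate a single concurrent update of `po_id` after the first page."""
    fired = []

    def update(table):
        if not fired:
            table[po_id] = new_stamp
            fired.append(True)

    return update
```

With keyset paging, PO 1's second change is left for the next run. Its newer `date_updated` guarantees it is picked up then, as long as the watermark is captured before the source query (Fix 6).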
### Fix 15: Status map gaps and unsafe defaults
- `orders.js` orderStatusMap lacks 45 (`payment_pending`) and 67 (`remote_send`) → imported as
numeric strings. Add both (mirror in `migrations/001_map_order_statuses.sql` as a follow-up update
for existing rows).
- `purchase-orders.js` `poStatusMap[po.status] || 'created'` (line ~335): an unknown *cancel-like*
code would be treated as an open PO and inflate on-order FIFO. Default to a sentinel like
`'unknown_<code>'` instead, and make the FIFO/on-order CTEs in `update_product_metrics.sql` treat
only the known-open statuses as open (they already whitelist open statuses — so the sentinel is
safe there; just ensure nothing treats unknown as 'created'). Same for receivingStatusMap.
### Fix 16: Transactions issued through the pool wrapper land on arbitrary connections
`categories.js` (lines ~17-152) and `daily-deals.js` (~27-130) call `query('BEGIN')` /
`query('COMMIT')` on the wrapper, which checks out a client per call — BEGIN/work/COMMIT are not
guaranteed to share a connection (works only by pool-LIFO accident). The categories
`DISABLE TRIGGER` rides on this too. Fix: use the wrapper's `beginTransaction()/commit()/rollback()`
(see `utils.js` lines 121-148) exactly as orders.js does. In categories.js also move the
post-COMMIT `ENABLE TRIGGER` inside the transaction (DISABLE/ENABLE both inside), or drop the
trigger toggling entirely if the trigger isn't actually problematic anymore.
### Fix 17: stock-snapshots import swallows batch errors → permanent holes
`stock-snapshots.js` (~lines 153-155): a failed batch is logged and skipped, but the next
incremental starts at `MAX(snapshot_date)` — the hole is never revisited. Fix: rethrow (fail the
step) or collect failed date ranges and retry once, then fail if still failing. Also line ~168:
`calculateRate(processedRows, startTime)` — arguments reversed (signature is
`calculateRate(startTime, current)`, see `metrics-new/utils/progress.js:70`).
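A sketch of the "retry once, then fail" option in Python. `load_batch` stands in for whatever inserts one date range; it and the function name are hypothetical.

```python
def import_snapshot_batches(batches, load_batch, retries=1):
    """Load every batch; retry failures, then fail the step instead of leaving holes."""
    failed = []
    for batch in batches:
        try:
            load_batch(batch)
        except Exception:
            failed.append(batch)
    for _ in range(retries):
        if not failed:
            break
        still_failing = []
        for batch in failed:
            try:
                load_batch(batch)
            except Exception:
                still_failing.append(batch)
        failed = still_failing
    if failed:
        # Failing the step means the next incremental run cannot skip past the hole.
        raise RuntimeError(f"snapshot batches failed after {retries} retries: {failed}")
```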
### Fix 18: Metrics cancellation targets an application_name that's never set
`calculate-metrics-new.js` line ~180 cancels backends `WHERE application_name =
'node-metrics-calculator'`, but the Pool config never sets it → cancellation no-ops (the 30-min
`statement_timeout` is the only real guard). Fix: add `application_name: 'node-metrics-calculator'`
to both dbConfig branches.
### Fix 19: Aggregate-table change-detection lists miss cost-only changes
`calculate_brand_metrics.sql` / `calculate_vendor_metrics.sql` / `calculate_category_metrics.sql`
ON CONFLICT WHERE lists don't include `profit_30d`/`cogs_30d` — a cost revision with unchanged
sales/revenue leaves stale rows (product_metrics has a 1-day staleness net; rollups don't). Add
`... OR x.profit_30d IS DISTINCT FROM EXCLUDED.profit_30d OR x.cogs_30d IS DISTINCT FROM
EXCLUDED.cogs_30d` to each, or add a `last_calculated < NOW() - INTERVAL '1 day'` net like
product_metrics line ~707.
### Fix 20: Snapshot stale-detection only compares unit counts
`update_daily_snapshots.sql` lines ~57-85: detects mismatches in `units_sold`/`units_received` only;
price/discount/costeach corrections older than the 2-day recheck are never repaired. Add a
revenue comparison to the stale check: compare `SUM(net_revenue)` per day against the equivalent
recomputed from `orders` (ROUND both to 2dp to avoid float-noise churn).
### Fix 21: Category metrics positive-only revenue asymmetry
`calculate_category_metrics.sql` (lines ~27-36, 64-73): revenue summed only when `> 0` while
cogs/profit use COALESCE-all → margin numerator/denominator from different populations, and
inconsistent with brand/vendor (plain COALESCE). Change the revenue/sales CASEs to
`COALESCE(pm.revenue_7d, 0)` etc., matching brand_metrics.
### Fix 22 (decision needed): Demand-pattern & seasonality definitions
- `classify_demand_pattern` (db/functions.sql): CV thresholds 0.2/0.5 + avg<1/day. The industry-standard
alternative is Syntetos-Boylan: cut-offs at ADI = 1.32 and CV² = 0.49 give four quadrants (smooth: both
below; intermittent: high ADI only; erratic: high CV² only; lumpy: both high).
Today everything classifies sporadic/lumpy. If adopting SB: ADI = 30 / COUNT(days with sales),
CV² computed on nonzero-demand sizes. Changes the vocabulary consumed by the forecast engine
(`scripts/forecast/forecast_engine.py` reads `demand_pattern`) — coordinate before changing.
- SeasonalityAnalysis (`update_product_metrics.sql` ~360): `month_avg = AVG(units_sold)` over rows
with sales only → intensity, not volume. Use monthly totals (SUM, with zero months counted) /
overall monthly average for the index.
- Safety stock: currently static config units; `sales_std_dev_30d` exists but is unused. Optional
upgrade: `safety = z * σ_d * sqrt(lead_time)` with z from a service-level setting.
These change user-facing semantics — confirm with Matt before implementing.
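To support the discussion with Matt, here is a Python sketch of the three proposed definitions. It assumes a window of daily unit counts with zeros included. It uses the population standard deviation for CV², which is a choice rather than a given. It takes z from the standard normal for the service-level setting. None of these names exist in the codebase.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev

ADI_CUTOFF = 1.32   # average inter-demand interval, in days
CV2_CUTOFF = 0.49   # squared coefficient of variation of nonzero demand sizes


def classify_sb(daily_units):
    """Syntetos-Boylan class for a window of daily unit counts (zeros included)."""
    sizes = [u for u in daily_units if u > 0]
    if not sizes:
        return None  # no demand in the window
    adi = len(daily_units) / len(sizes)  # e.g. 30 / COUNT(days with sales)
    cv2 = (pstdev(sizes) / mean(sizes)) ** 2
    if adi < ADI_CUTOFF:
        return "smooth" if cv2 < CV2_CUTOFF else "erratic"
    return "intermittent" if cv2 < CV2_CUTOFF else "lumpy"


def safety_stock(sigma_daily, lead_time_days, service_level):
    """safety = z * sigma_d * sqrt(lead_time), z from the standard normal."""
    z = NormalDist().inv_cdf(service_level)
    return z * sigma_daily * sqrt(lead_time_days)


def seasonal_indices(monthly_totals):
    """Monthly total / average monthly total, with zero-sales months counted."""
    avg = mean(monthly_totals)
    if avg <= 0:
        return [1.0] * len(monthly_totals)
    return [total / avg for total in monthly_totals]
```

For example, σ_d = 2 units/day and a 9-day lead time at 95% service give about 9.9 units of safety stock.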
---
## Verified non-issues (no action, or cleanup only)
- **`costeach` fallback `price * 0.5`** (orders.js line ~615): fires on **2.1%** of item rows
(729/34,833, last 30d, live-verified). Accepted by Matt — 50% margin is a fair estimate for these
products. No action needed.
- **Missing-product order skips**: zero occurrences — MySQL has no orphan order_items (1-year check),
PG products is a superset of MySQL products (687,579 vs 687,576), last 7 import runs all logged
`totalSkipped: 0`. Cleanup only: remove the unused `importMissingProducts` import line at
`orders.js:2` (the function itself stays in products.js — harmless utility).
- **Status 30 'cancelled_old'** in `total_sold >= 20` filter: zero rows live in `_order` — safe.
- **Duplicate (order_id, pid) order items**: none exist in MySQL — the upsert PK is safe.
- **base_discount** in orders.js: computed/stored in temp table but unused since migration 002 —
remove the column from temp table + queries for clarity (no behavior change).
- **`full-update.js` `runScript`**: try/catch around `console.log` is dead code; per-step
`status:'complete'` messages could confuse a UI parser. Cosmetic only — tidy if touching the file.
## Suggested implementation order
| Step | Fixes | Re-import/rebuild needed |
|---|---|---|
| 1 | Fix 1 + Fix 2 (+ Fix 5 filters, Fix 8/9 while editing the same SQL) | FULL orders re-import → snapshot rebuild → metrics (once) |
| 2 | Fix 4 + Fix 6 (orders.js reconciliation + watermarks; POs/products watermarks too) | no |
| 3 | Fix 3, Fix 7 (metrics SQL only) | metrics run |
| 4 | Fix 13-21 (robustness batch) | no |
| 5 | Fix 10-12, Fix 22 after Matt's sign-off (definition changes) | metrics run |
After step 1, expect: margin_30d down ~8-10 points, discounts_30d ~3x up, daily curves shifted to
correct business days. Communicate before/after so the change isn't mistaken for a data incident.
## Reference: verification snippets used in the review
```sql
-- MySQL: item-level discounts dropped by the gate (30d)
SELECT COUNT(DISTINCT o.order_id), ROUND(SUM(odi.amount),2)
FROM order_discount_items odi
JOIN order_discounts od ON od.order_id=odi.order_id AND od.discount_id=odi.discount_id
JOIN _order o ON o.order_id=odi.order_id
WHERE odi.which=2 AND o.date_placed >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
AND o.order_status >= 20 AND COALESCE(od.discount_amount_subtotal,0)=0;
-- → 2,021 orders / $25,989 (2026-06-10)
-- MySQL: costeach fallback frequency (30d)
SELECT COUNT(*),
SUM(CASE WHEN NOT EXISTS (SELECT 1 FROM order_costs oc WHERE oc.orderid=oi.order_id
AND oc.pid=oi.prod_pid AND oc.pending=0)
AND NOT EXISTS (SELECT 1 FROM product_inventory pi WHERE pi.pid=oi.prod_pid)
THEN 1 ELSE 0 END)
FROM order_items oi JOIN _order o ON o.order_id=oi.order_id
WHERE o.order_status >= 20 AND o.date_placed >= DATE_SUB(CURDATE(), INTERVAL 30 DAY);
-- → 729 / 34,833 = 2.1% (2026-06-10)
-- PG: timezone check
SHOW timezone; -- Europe/Berlin (2026-06-10)
-- Row drift, May 2026: MySQL 49,377 items / PG 49,841 (+0.9%)
```
+16 -1
@@ -76,7 +76,9 @@ $function$;
-- =============================================================================
-- get_weighted_avg_cost: Weighted average cost from receivings up to a given date.
-- Uses all non-canceled receivings (no row limit) weighted by quantity.
-- Prefers receivings from the 365 days before p_date so decade-old costs don't
-- weigh equally with recent ones; falls back to the lifetime average when the
-- product had no receivings in that window.
-- =============================================================================
CREATE OR REPLACE FUNCTION public.get_weighted_avg_cost(
p_pid bigint,
@@ -97,8 +99,21 @@ BEGIN
FROM receivings
WHERE pid = p_pid
AND received_date <= p_date
AND received_date > p_date - INTERVAL '365 days'
AND status != 'canceled';
IF weighted_cost IS NULL THEN
SELECT
CASE
WHEN SUM(qty_each) > 0 THEN SUM(cost_each * qty_each) / SUM(qty_each)
ELSE NULL
END INTO weighted_cost
FROM receivings
WHERE pid = p_pid
AND received_date <= p_date
AND status != 'canceled';
END IF;
RETURN weighted_cost;
END;
$function$;
@@ -76,6 +76,8 @@ if (process.env.DATABASE_URL && typeof process.env.DATABASE_URL === 'string') {
dbConfig = {
connectionString: process.env.DATABASE_URL,
ssl: process.env.DB_SSL === 'true' ? { rejectUnauthorized: false } : false,
// Required by cancelCalculation(): pg_cancel_backend targets this name
application_name: 'node-metrics-calculator',
// Add performance optimizations
max: 10, // connection pool max size
idleTimeoutMillis: 30000,
@@ -93,6 +95,8 @@ if (process.env.DATABASE_URL && typeof process.env.DATABASE_URL === 'string') {
database: process.env.DB_NAME,
port: process.env.DB_PORT || 5432,
ssl: process.env.DB_SSL === 'true',
// Required by cancelCalculation(): pg_cancel_backend targets this name
application_name: 'node-metrics-calculator',
// Add performance optimizations
max: 10, // connection pool max size
idleTimeoutMillis: 30000,
@@ -634,6 +634,52 @@ def forecast_from_curve(curve_params, scale_factor, age_days, horizon_days):
return np.array(forecasts)
def forecast_preorder(curve_params, scale_factor, days_until_arrival,
preorder_daily_rate, horizon_days):
"""
Piecewise pre-order forecast: a flat observed pre-order trickle until the
product is expected to arrive, then the scaled launch curve from age 0.
The launch curve was fit on POST-receipt order history, so running it from
today (while the product is still weeks from arriving) front-loads full
first-week launch volume that hasn't happened yet — the main driver of the
~2.15x preorder over-forecast. Instead we forecast the slow pre-order rate
up to the arrival date, then start the curve's day 0 on that date.
See FORECAST_FIX_PLAN F4.
Args:
curve_params: (amplitude, decay_rate, baseline, ...) weekly curve
scale_factor: per-product multiplier for the post-arrival curve envelope
days_until_arrival: calendar days from today until expected arrival
preorder_daily_rate: observed pre-order units/day (trickle)
horizon_days: forecast horizon length
Returns:
array of daily forecast values of length horizon_days
"""
amplitude, decay_rate, baseline = curve_params[:3]
forecasts = np.zeros(horizon_days)
# Clamp the arrival offset into the horizon
dua = int(max(0, min(days_until_arrival, horizon_days)))
# Pre-arrival segment: flat pre-order trickle, capped at the curve's scaled
# week-0 daily value (a pre-order day shouldn't out-sell the launch peak).
if dua > 0:
week0_daily = (amplitude / 7.0) * scale_factor + (baseline / 7.0)
pre_rate = preorder_daily_rate
if week0_daily > 0:
pre_rate = min(pre_rate, week0_daily)
forecasts[:dua] = max(0.0, pre_rate)
# Post-arrival segment: scaled launch curve, curve day 0 = arrival date.
if dua < horizon_days:
curve_part = forecast_from_curve(curve_params, scale_factor, 0, horizon_days - dua)
forecasts[dua:] = curve_part
return forecasts
# ---------------------------------------------------------------------------
# Batch data loading (eliminates N+1 per-product queries)
# ---------------------------------------------------------------------------
@@ -651,9 +697,11 @@ def batch_load_product_data(conn, products):
data = {
'preorder_sales': {},
'preorder_days': {},
'preorder_arrival_days': {},
'launch_sales': {},
'decay_velocity': {},
'mature_history': {},
'dormant_rate': {},
}
# Pre-order sales: orders placed BEFORE first received date
@@ -677,6 +725,39 @@ def batch_load_product_data(conn, products):
data['preorder_days'][int(row['pid'])] = float(row['preorder_days'])
log.info(f"Batch loaded pre-order sales for {len(data['preorder_sales'])}/{len(preorder_pids)} preorder products")
# Expected arrival per pre-order product, to time the launch curve.
# Prefer the soonest FUTURE expected_date on an open PO; if open POs exist but
# none has a future expected_date (all past or NULL), assume 7 days; if there's
# no open PO at all, assume 14 days. See FORECAST_FIX_PLAN F4.
arrival_sql = """
SELECT pid,
MIN(expected_date) FILTER (
WHERE expected_date IS NOT NULL AND expected_date >= CURRENT_DATE
) AS future_arrival
FROM purchase_orders
WHERE pid = ANY(%s)
AND status IN ('created', 'ordered', 'electronically_sent', 'receiving_started')
GROUP BY pid
"""
adf = execute_query(conn, arrival_sql, [preorder_pids])
today = date.today()
for _, row in adf.iterrows():
pid = int(row['pid'])
fa = row['future_arrival']
if pd.notna(fa):
fa_date = pd.Timestamp(fa).date()
data['preorder_arrival_days'][pid] = max(0, (fa_date - today).days)
else:
data['preorder_arrival_days'][pid] = 7 # open PO, expected_date already past
no_po = 0
for pid in preorder_pids:
if int(pid) not in data['preorder_arrival_days']:
data['preorder_arrival_days'][int(pid)] = 14 # no open PO at all
no_po += 1
log.info(f"Batch loaded preorder arrival for "
f"{len(data['preorder_arrival_days']) - no_po}/{len(preorder_pids)} via open POs, "
f"{no_po} defaulted to 14d")
# Launch sales: first 14 days after first received
launch_pids = products[products['phase'] == 'launch']['pid'].tolist()
if launch_pids:
@@ -694,15 +775,23 @@ def batch_load_product_data(conn, products):
data['launch_sales'][int(row['pid'])] = float(row['total_sold'])
log.info(f"Batch loaded launch sales for {len(data['launch_sales'])}/{len(launch_pids)} launch products")
-# Decay recent velocity: average daily sales over last 30 days
# Decay recent velocity: TRUE calendar-daily average over the last 30 days.
# We divide the summed units by calendar days (clipped to the product's age),
# NOT by the number of snapshot rows. Snapshots are sparse and mostly land on
# sold-days, so AVG(units_sold) averages over sold-days only and inflated the
# decay rate ~4x (measured 1.353 vs true 0.332 units/day). See FORECAST_FIX_PLAN F1.
decay_pids = products[products['phase'] == 'decay']['pid'].tolist()
if decay_pids:
sql = """
-SELECT dps.pid, AVG(COALESCE(dps.units_sold, 0)) AS avg_daily
SELECT dps.pid,
SUM(COALESCE(dps.units_sold, 0))::float
/ GREATEST(LEAST(30, (CURRENT_DATE - pm.date_first_received::date)), 1) AS avg_daily
FROM daily_product_snapshots dps
JOIN product_metrics pm ON pm.pid = dps.pid
WHERE dps.pid = ANY(%s)
AND dps.snapshot_date >= CURRENT_DATE - INTERVAL '30 days'
-GROUP BY dps.pid
AND dps.snapshot_date >= pm.date_first_received::date
GROUP BY dps.pid, pm.date_first_received
"""
df = execute_query(conn, sql, [decay_pids])
for _, row in df.iterrows():
@@ -724,6 +813,25 @@ def batch_load_product_data(conn, products):
data['mature_history'][int(pid)] = group.copy()
log.info(f"Batch loaded history for {len(data['mature_history'])}/{len(mature_pids)} mature products")
# Dormant trailing order rate: dormant products forecast 0 by default, but
# ~11K of them still sell (restocks, promos, long-tail) — ~11% of all demand
# currently forecast as a hard zero. Load a trailing-180-day daily order rate
# so the dormant branch can carry a small positive rate. See FORECAST_FIX_PLAN F5.
dormant_pids = products[products['phase'] == 'dormant']['pid'].tolist()
if dormant_pids:
sql = """
SELECT o.pid, SUM(o.quantity) / 180.0 AS rate
FROM orders o
WHERE o.pid = ANY(%s)
AND o.canceled IS DISTINCT FROM TRUE
AND o.date >= CURRENT_DATE - INTERVAL '180 days'
GROUP BY o.pid
"""
df = execute_query(conn, sql, [dormant_pids])
for _, row in df.iterrows():
data['dormant_rate'][int(row['pid'])] = float(row['rate'])
log.info(f"Batch loaded dormant order rate for {len(data['dormant_rate'])}/{len(dormant_pids)} dormant products")
return data
@@ -829,11 +937,20 @@ def forecast_mature(product, history_df):
# Not enough data — flat velocity
return np.full(FORECAST_HORIZON_DAYS, velocity)
-# Fill date gaps with 0 sales (days where product had no snapshot = no sales)
# Reindex over the FULL calendar window ending yesterday, not just the span
# between the first and last snapshot. resample() only covers first→last
# snapshot, so leading/trailing quiet periods are absent and the Holt level
# is fitted only on the product's busy span (can run ~4x too high). An
# explicit reindex fills every quiet calendar day with 0. (pid, snapshot_date)
# is unique so there is no duplicate-index risk; do NOT use combine_first
# (it keeps zeros over real data). See FORECAST_FIX_PLAN F2.
hist = history_df.copy()
hist['snapshot_date'] = pd.to_datetime(hist['snapshot_date'])
-hist = hist.set_index('snapshot_date').resample('D').sum().fillna(0)
-series = hist['units_sold'].values.astype(float)
hist = hist.set_index('snapshot_date')['units_sold']
full_index = pd.date_range(
end=pd.Timestamp(date.today() - timedelta(days=1)),
periods=EXP_SMOOTHING_WINDOW, freq='D')
series = hist.reindex(full_index, fill_value=0.0).values.astype(float)
# Need at least 2 non-zero values for smoothing
if np.count_nonzero(series) < 2:
@@ -956,9 +1073,24 @@ def generate_all_forecasts(conn, curves_df, dow_indices, monthly_indices=None,
today = date.today()
forecast_dates = [today + timedelta(days=i) for i in range(FORECAST_HORIZON_DAYS)]
-# Pre-compute DOW and seasonal multipliers for each forecast date
# Pre-compute DOW and seasonal multipliers for each forecast date.
# DOW multipliers stay ABSOLUTE — every calibration is a multi-week average
# and therefore DOW-neutral, so reshaping by absolute DOW indices is correct.
# Seasonal indices must be applied RELATIVE to the calibration period:
# each per-product calibration (decay velocity, mature Holt level, launch /
# preorder scale) is fitted on raw recent actuals that already embed the
# current month's seasonal level. Multiplying by the absolute target-month
# index double-counts seasonality (~25% over-forecast at the May→June sale
# transition, worse near November). Divide by the trailing-30-day average
# index so only the seasonal *change* from calibration to target applies.
# See FORECAST_FIX_PLAN F3.
dow_multipliers = [dow_indices.get(d.isoweekday(), 1.0) for d in forecast_dates]
seasonal_multipliers = [monthly_indices.get(d.month, 1.0) for d in forecast_dates]
trailing = [today - timedelta(days=i) for i in range(1, 31)]
calibration_index = float(np.mean([monthly_indices.get(d.month, 1.0) for d in trailing]))
seasonal_multipliers = [
monthly_indices.get(d.month, 1.0) / max(calibration_index, 0.1)
for d in forecast_dates
]
# TRUNCATE before streaming writes
with conn.cursor() as cur:
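The F3 comment describes dividing each target-month index by the trailing-30-day average index. The sketch below restates that computation as a standalone function. The index values are hypothetical:

```python
from datetime import date, timedelta

import numpy as np

# Hypothetical monthly seasonal indices (1.0 = an average month).
monthly_indices = {5: 1.0, 6: 1.25, 7: 1.1}


def seasonal_multipliers(today, horizon, monthly_indices):
    """Relative multipliers: target-month index / trailing-30-day mean index."""
    forecast_dates = [today + timedelta(days=i) for i in range(horizon)]
    trailing = [today - timedelta(days=i) for i in range(1, 31)]
    calibration_index = float(np.mean(
        [monthly_indices.get(d.month, 1.0) for d in trailing]))
    # Guard against a degenerate calibration index, as the engine does.
    return [monthly_indices.get(d.month, 1.0) / max(calibration_index, 0.1)
            for d in forecast_dates]


# Calibrated on May actuals: only the May -> June change is applied.
mults = seasonal_multipliers(date(2026, 5, 31), 3, monthly_indices)
# Calibrated on June actuals: July's absolute 1.1 becomes a relative drop.
july = seasonal_multipliers(date(2026, 7, 1), 1, monthly_indices)
print(mults, july)
```

Calibrating on May 1 to 30 gives `[1.0, 1.25, 1.25]` for May 31 to June 2. A calibration on June 1 to 30 turns July's absolute 1.1 into a relative 0.88, because the fitted level already carries June's uplift.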
@@ -1002,9 +1134,33 @@ def generate_all_forecasts(conn, curves_df, dow_indices, monthly_indices=None,
try:
curve_info = get_curve_for_product(product, curves_df)
if phase in ('preorder', 'launch'):
if phase == 'preorder':
if curve_info:
scale = compute_scale_factor(phase, product, curve_info, batch_data)
scale = compute_scale_factor('preorder', product, curve_info, batch_data)
# Time the launch curve to expected arrival instead of
# running it from today (F4). Pre-arrival days carry the
# observed pre-order trickle rate.
days_until_arrival = batch_data['preorder_arrival_days'].get(pid, 14)
preorder_units = batch_data['preorder_sales'].get(pid, 0)
preorder_days = batch_data['preorder_days'].get(pid, 1)
preorder_daily_rate = preorder_units / max(preorder_days, 1)
forecasts = forecast_preorder(
curve_info, scale, days_until_arrival,
preorder_daily_rate, FORECAST_HORIZON_DAYS)
method = 'lifecycle_curve'
else:
# No reliable curve — fall back to velocity if available
velocity = product.get('sales_velocity_daily') or 0
if velocity > 0:
forecasts = np.full(FORECAST_HORIZON_DAYS, velocity)
method = 'velocity'
else:
forecasts = forecast_dormant()
method = 'zero'
elif phase == 'launch':
if curve_info:
scale = compute_scale_factor('launch', product, curve_info, batch_data)
forecasts = forecast_from_curve(curve_info, scale, age, FORECAST_HORIZON_DAYS)
method = 'lifecycle_curve'
else:
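`forecast_preorder` itself is not part of this hunk. The following is a hypothetical sketch of the timing rule the F4 comment describes: the trickle rate applies until arrival, then the scaled launch curve starts at age 0 on the arrival day. The curve shape, `scale`, and rate values are invented for illustration:

```python
import numpy as np


def forecast_preorder_sketch(curve, scale, days_until_arrival,
                             preorder_daily_rate, horizon):
    """Hypothetical pre-order forecast: trickle rate, then launch curve.

    curve(age) returns unscaled expected daily units `age` days after arrival.
    If arrival falls beyond the horizon, every day carries the trickle rate.
    """
    forecasts = np.empty(horizon)
    for day in range(horizon):
        if day < days_until_arrival:
            forecasts[day] = preorder_daily_rate          # not in stock yet
        else:
            forecasts[day] = scale * curve(day - days_until_arrival)
    return forecasts


def launch_curve(age):
    # Hypothetical exponential-decay launch curve.
    return 10.0 * np.exp(-age / 20.0)


fc = forecast_preorder_sketch(launch_curve, scale=0.5, days_until_arrival=14,
                              preorder_daily_rate=0.2, horizon=90)
print(fc[:3], fc[14])
```

When no arrival estimate is available, the caller above falls back to 14 days.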
@@ -1038,8 +1194,16 @@ def generate_all_forecasts(conn, curves_df, dow_indices, monthly_indices=None,
method = 'velocity'
else: # dormant
forecasts = forecast_dormant()
method = 'zero'
# Carry a small positive rate for dormant products that still
# trickle sales (restocks/promos/long-tail); only truly dead
# products stay at zero. See FORECAST_FIX_PLAN F5.
rate = batch_data['dormant_rate'].get(pid, 0)
if rate > 0:
forecasts = np.full(FORECAST_HORIZON_DAYS, rate)
method = 'velocity'
else:
forecasts = forecast_dormant()
method = 'zero'
# Confidence interval: use accuracy-calibrated margins per phase
base_margin = accuracy_margins.get(phase, 0.5)
@@ -1108,6 +1272,8 @@ def archive_forecasts(conn, run_id):
""")
cur.execute("CREATE INDEX IF NOT EXISTS idx_pfh_date ON product_forecasts_history(forecast_date)")
cur.execute("CREATE INDEX IF NOT EXISTS idx_pfh_pid_date ON product_forecasts_history(pid, forecast_date)")
# Naive-baseline column for forecast value-added (FVA). See FORECAST_FIX_PLAN F8.
cur.execute("ALTER TABLE product_forecasts_history ADD COLUMN IF NOT EXISTS naive_units NUMERIC(10,2)")
# Find the previous completed run (whose forecasts are still in product_forecasts)
cur.execute("""
@@ -1124,15 +1290,27 @@ def archive_forecasts(conn, run_id):
prev_run_id = prev_run[0]
# Archive only past-date forecasts (where actuals now exist)
# Archive only past-date forecasts (where actuals now exist). Attach the
# naive baseline (flat trailing-28-day daily average) at the same time so
# forecast value-added can be measured. See FORECAST_FIX_PLAN F8.
cur.execute("""
INSERT INTO product_forecasts_history
(run_id, pid, forecast_date, forecast_units, forecast_revenue,
lifecycle_phase, forecast_method, confidence_lower, confidence_upper, generated_at)
SELECT %s, pid, forecast_date, forecast_units, forecast_revenue,
lifecycle_phase, forecast_method, confidence_lower, confidence_upper, generated_at
FROM product_forecasts
WHERE forecast_date < CURRENT_DATE
lifecycle_phase, forecast_method, confidence_lower, confidence_upper,
generated_at, naive_units)
SELECT %s, pf.pid, pf.forecast_date, pf.forecast_units, pf.forecast_revenue,
pf.lifecycle_phase, pf.forecast_method, pf.confidence_lower, pf.confidence_upper,
pf.generated_at, COALESCE(nv.naive_daily, 0)
FROM product_forecasts pf
LEFT JOIN (
SELECT o.pid, SUM(o.quantity) / 28.0 AS naive_daily
FROM orders o
WHERE o.canceled IS DISTINCT FROM TRUE
AND o.date >= CURRENT_DATE - INTERVAL '28 days'
AND o.date < CURRENT_DATE
GROUP BY o.pid
) nv ON nv.pid = pf.pid
WHERE pf.forecast_date < CURRENT_DATE
ON CONFLICT (run_id, pid, forecast_date) DO NOTHING
""", (prev_run_id,))
@@ -1154,6 +1332,48 @@ def archive_forecasts(conn, run_id):
return archived
def archive_future_leads(conn, run_id):
"""
Archive a sampled set of FUTURE-lead forecasts from the just-generated
product_forecasts, attributed to the current run.
The past-date archive in archive_forecasts() only ever captures the 1-day
slice that just elapsed, so every accuracy sample lands in the '1-7d' lead
bucket and the 15/30/60/90-day forecasts that purchasing actually rides on
are never validated. Here we snapshot the 7/14/30/60/89-day-ahead leads
(non-dormant) so that, once each date passes, compute_accuracy() can score
them in their lead bucket. The naive baseline is attached the same way as in
the past-date path. Future-dated rows survive the 90-day prune until their
own date passes. See FORECAST_FIX_PLAN F7.
"""
with conn.cursor() as cur:
cur.execute("""
INSERT INTO product_forecasts_history
(run_id, pid, forecast_date, forecast_units, forecast_revenue,
lifecycle_phase, forecast_method, confidence_lower, confidence_upper,
generated_at, naive_units)
SELECT %s, pf.pid, pf.forecast_date, pf.forecast_units, pf.forecast_revenue,
pf.lifecycle_phase, pf.forecast_method, pf.confidence_lower, pf.confidence_upper,
pf.generated_at, COALESCE(nv.naive_daily, 0)
FROM product_forecasts pf
LEFT JOIN (
SELECT o.pid, SUM(o.quantity) / 28.0 AS naive_daily
FROM orders o
WHERE o.canceled IS DISTINCT FROM TRUE
AND o.date >= CURRENT_DATE - INTERVAL '28 days'
AND o.date < CURRENT_DATE
GROUP BY o.pid
) nv ON nv.pid = pf.pid
WHERE pf.lifecycle_phase != 'dormant'
AND pf.forecast_date - CURRENT_DATE IN (7, 14, 30, 60, 89)
ON CONFLICT (run_id, pid, forecast_date) DO NOTHING
""", (run_id,))
archived = cur.rowcount
conn.commit()
log.info(f"Archived {archived} future-lead forecast rows (7/14/30/60/89d) for run {run_id}")
return archived
def compute_accuracy(conn, run_id):
"""
Compute forecast accuracy metrics from archived history vs. actual sales.
@@ -1162,11 +1382,18 @@ def compute_accuracy(conn, run_id):
(pid, forecast_date = snapshot_date) to compare forecasted vs. actual units.
Stores results in forecast_accuracy table, broken down by:
- overall: single aggregate row
- overall: two rows — 'all' (non-dormant) and 'all_incl_dormant' (F5)
- overall_weekly: per-product weekly-grain WMAPE — the informative headline
for intermittent demand (daily grain has a ~190% floor) (F9)
- by_phase: per lifecycle phase
- by_lead_time: bucketed by how far ahead the forecast was
- by_lead_time: bucketed by how far ahead the forecast was — long-lead
buckets populate as the future-lead archives mature (F7)
- by_method: per forecast method
- daily: per forecast_date (for trend charts)
Every dimension also stores naive_wmape (flat trailing-28d baseline) and
fva = 1 - wmape/naive_wmape, so the engine can be judged as value-over-naive
(F8). Only realized dates (forecast_date < CURRENT_DATE) are scored.
"""
with conn.cursor() as cur:
# Ensure accuracy table exists
@@ -1186,6 +1413,10 @@ def compute_accuracy(conn, run_id):
PRIMARY KEY (run_id, metric_type, dimension_value)
)
""")
# Naive-baseline WMAPE and forecast value-added (FVA = 1 - wmape/naive_wmape).
# See FORECAST_FIX_PLAN F8.
cur.execute("ALTER TABLE forecast_accuracy ADD COLUMN IF NOT EXISTS naive_wmape NUMERIC(10,4)")
cur.execute("ALTER TABLE forecast_accuracy ADD COLUMN IF NOT EXISTS fva NUMERIC(10,4)")
conn.commit()
# Check if we have any history to analyze
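The new `naive_wmape` and `fva` columns follow FVA = 1 - wmape / naive_wmape. The small worked sketch below uses hypothetical daily series. Positive FVA means the engine beats the naive baseline; negative means it does worse:

```python
def wmape(forecast, actual):
    """Weighted MAPE: sum of absolute errors / sum of actuals (None if no sales)."""
    total = sum(actual)
    if total <= 0:
        return None
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / total


def fva(forecast, naive, actual):
    """Forecast value-added: 1 - wmape / naive_wmape (None when undefined)."""
    w, nw = wmape(forecast, actual), wmape(naive, actual)
    if w is None or nw is None or nw <= 0:
        return None
    return 1.0 - w / nw


actual = [0, 2, 0, 1, 1]
engine = [0.5, 1.5, 0.5, 0.5, 1.0]
naive = [1.0] * 5   # flat trailing average

print(wmape(engine, actual), wmape(naive, actual), fva(engine, naive, actual))
```

Here the engine's WMAPE is 0.50 against a naive 0.75, so FVA is 1/3. Like the SQL, the sketch returns None when either WMAPE is undefined or the naive WMAPE is zero.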
@@ -1195,124 +1426,199 @@ def compute_accuracy(conn, run_id):
log.info("No forecast history available for accuracy computation")
return
# For each (pid, forecast_date) pair, keep only the most recent run's
# forecast row. This prevents double-counting when multiple runs have
# archived forecasts for the same product×date combination.
accuracy_cte = """
WITH ranked_history AS (
# Base CTEs (FORECAST_FIX_PLAN F7):
# - Only score realized dates (forecast_date < CURRENT_DATE); future-lead
# archives are excluded until their date passes.
# - short_lead*: lead 0-6 deduped per (pid, forecast_date) — preserves the
# meaning of the existing headline metrics. short_lead_eval keeps the
# raw snapshot grid (incl. zero-zero days) for complete-week detection;
# `accuracy` drops zero-zero days for daily-grain metrics.
# - lead_dedup/lead_accuracy: deduped per (pid, forecast_date, lead_bucket)
# so each long-lead bucket gets its own sample (the by_lead_time table).
base_cte = """
WITH ranked_all AS (
SELECT
pfh.*,
pfh.pid, pfh.forecast_date, pfh.forecast_units, pfh.naive_units,
pfh.lifecycle_phase, pfh.forecast_method,
fr.started_at,
ROW_NUMBER() OVER (
PARTITION BY pfh.pid, pfh.forecast_date
ORDER BY fr.started_at DESC
) AS rn
(pfh.forecast_date - fr.started_at::date) AS lead_days,
CASE
WHEN (pfh.forecast_date - fr.started_at::date) BETWEEN 0 AND 6 THEN '1-7d'
WHEN (pfh.forecast_date - fr.started_at::date) BETWEEN 7 AND 13 THEN '8-14d'
WHEN (pfh.forecast_date - fr.started_at::date) BETWEEN 14 AND 29 THEN '15-30d'
WHEN (pfh.forecast_date - fr.started_at::date) BETWEEN 30 AND 59 THEN '31-60d'
ELSE '61-90d'
END AS lead_bucket
FROM product_forecasts_history pfh
JOIN forecast_runs fr ON fr.id = pfh.run_id
WHERE pfh.forecast_date < CURRENT_DATE
),
short_lead AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY pid, forecast_date ORDER BY started_at DESC
) AS rn
FROM ranked_all
WHERE lead_days BETWEEN 0 AND 6
),
short_lead_eval AS (
SELECT sl.pid, sl.lifecycle_phase, sl.forecast_method, sl.forecast_date,
sl.forecast_units, sl.naive_units,
COALESCE(dps.units_sold, 0) AS actual_units,
(sl.forecast_units - COALESCE(dps.units_sold, 0)) AS error,
ABS(sl.forecast_units - COALESCE(dps.units_sold, 0)) AS abs_error
FROM short_lead sl
LEFT JOIN daily_product_snapshots dps
ON dps.pid = sl.pid AND dps.snapshot_date = sl.forecast_date
WHERE sl.rn = 1
),
accuracy AS (
SELECT
rh.lifecycle_phase,
rh.forecast_method,
rh.forecast_date,
(rh.forecast_date - rh.started_at::date) AS lead_days,
rh.forecast_units,
SELECT * FROM short_lead_eval
WHERE NOT (forecast_units = 0 AND actual_units = 0)
),
lead_dedup AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY pid, forecast_date, lead_bucket ORDER BY started_at DESC
) AS rn
FROM ranked_all
),
lead_accuracy AS (
SELECT ld.lead_bucket, ld.forecast_units, ld.naive_units,
COALESCE(dps.units_sold, 0) AS actual_units,
(rh.forecast_units - COALESCE(dps.units_sold, 0)) AS error,
ABS(rh.forecast_units - COALESCE(dps.units_sold, 0)) AS abs_error
FROM ranked_history rh
(ld.forecast_units - COALESCE(dps.units_sold, 0)) AS error,
ABS(ld.forecast_units - COALESCE(dps.units_sold, 0)) AS abs_error
FROM lead_dedup ld
LEFT JOIN daily_product_snapshots dps
ON dps.pid = rh.pid AND dps.snapshot_date = rh.forecast_date
WHERE rh.rn = 1
AND NOT (rh.forecast_units = 0 AND COALESCE(dps.units_sold, 0) = 0)
ON dps.pid = ld.pid AND dps.snapshot_date = ld.forecast_date
WHERE ld.rn = 1
AND ld.lifecycle_phase != 'dormant'
AND NOT (ld.forecast_units = 0 AND COALESCE(dps.units_sold, 0) = 0)
)
"""
# Compute and insert metrics for each dimension
dimensions = {
'overall': "SELECT 'all' AS dim",
'by_phase': "SELECT DISTINCT lifecycle_phase AS dim FROM accuracy",
'by_lead_time': """
SELECT DISTINCT
CASE
WHEN lead_days BETWEEN 0 AND 6 THEN '1-7d'
WHEN lead_days BETWEEN 7 AND 13 THEN '8-14d'
WHEN lead_days BETWEEN 14 AND 29 THEN '15-30d'
WHEN lead_days BETWEEN 30 AND 59 THEN '31-60d'
ELSE '61-90d'
END AS dim
FROM accuracy
""",
'by_method': "SELECT DISTINCT forecast_method AS dim FROM accuracy",
'daily': "SELECT DISTINCT forecast_date::text AS dim FROM accuracy",
}
filter_clauses = {
'overall': "lifecycle_phase != 'dormant'",
'by_phase': "lifecycle_phase = dims.dim",
'by_lead_time': """
CASE
WHEN lead_days BETWEEN 0 AND 6 THEN '1-7d'
WHEN lead_days BETWEEN 7 AND 13 THEN '8-14d'
WHEN lead_days BETWEEN 14 AND 29 THEN '15-30d'
WHEN lead_days BETWEEN 30 AND 59 THEN '31-60d'
ELSE '61-90d'
END = dims.dim
""",
'by_method': "forecast_method = dims.dim",
'daily': "forecast_date::text = dims.dim",
}
total_inserted = 0
for metric_type, dim_query in dimensions.items():
filter_clause = filter_clauses[metric_type]
sql = f"""
{accuracy_cte},
dims AS ({dim_query})
# Daily-grain aggregate over a source CTE aliased `a`, computing the
# engine WMAPE plus the naive-baseline WMAPE (NULL-safe: rows archived
# before F8 have naive_units NULL and are excluded from the naive sums).
def daily_agg(dim_expr, source, where=None, group_by=None):
where_sql = f"WHERE {where}" if where else ""
group_sql = f"GROUP BY {group_by}" if group_by else ""
return f"""
SELECT
dims.dim,
{dim_expr} AS dim,
COUNT(*) AS sample_size,
COALESCE(SUM(a.actual_units), 0) AS total_actual,
COALESCE(SUM(a.forecast_units), 0) AS total_forecast,
AVG(a.abs_error) AS mae,
CASE WHEN SUM(a.actual_units) > 0
THEN SUM(a.abs_error) / SUM(a.actual_units)
ELSE NULL END AS wmape,
THEN SUM(a.abs_error) / SUM(a.actual_units) ELSE NULL END AS wmape,
AVG(a.error) AS bias,
SQRT(AVG(POWER(a.error, 2))) AS rmse
FROM dims
CROSS JOIN accuracy a
WHERE {filter_clause}
GROUP BY dims.dim
SQRT(AVG(POWER(a.error, 2))) AS rmse,
CASE WHEN SUM(a.actual_units) FILTER (WHERE a.naive_units IS NOT NULL) > 0
THEN SUM(ABS(a.naive_units - a.actual_units)) FILTER (WHERE a.naive_units IS NOT NULL)
/ SUM(a.actual_units) FILTER (WHERE a.naive_units IS NOT NULL)
ELSE NULL END AS naive_wmape
FROM {source} a
{where_sql}
{group_sql}
"""
cur.execute(sql)
rows = cur.fetchall()
insert_sql = """
INSERT INTO forecast_accuracy
(run_id, metric_type, dimension_value, sample_size,
total_actual_units, total_forecast_units, mae, wmape, bias, rmse,
naive_wmape, fva)
VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
ON CONFLICT (run_id, metric_type, dimension_value)
DO UPDATE SET
sample_size = EXCLUDED.sample_size,
total_actual_units = EXCLUDED.total_actual_units,
total_forecast_units = EXCLUDED.total_forecast_units,
mae = EXCLUDED.mae, wmape = EXCLUDED.wmape,
bias = EXCLUDED.bias, rmse = EXCLUDED.rmse,
naive_wmape = EXCLUDED.naive_wmape, fva = EXCLUDED.fva,
computed_at = NOW()
"""
for row in rows:
dim_val, sample_size, total_actual, total_forecast, mae, wmape, bias, rmse = row
cur.execute("""
INSERT INTO forecast_accuracy
(run_id, metric_type, dimension_value, sample_size,
total_actual_units, total_forecast_units, mae, wmape, bias, rmse)
VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
ON CONFLICT (run_id, metric_type, dimension_value)
DO UPDATE SET
sample_size = EXCLUDED.sample_size,
total_actual_units = EXCLUDED.total_actual_units,
total_forecast_units = EXCLUDED.total_forecast_units,
mae = EXCLUDED.mae, wmape = EXCLUDED.wmape,
bias = EXCLUDED.bias, rmse = EXCLUDED.rmse,
computed_at = NOW()
""", (run_id, metric_type, dim_val, sample_size,
float(total_actual), float(total_forecast),
float(mae) if mae is not None else None,
float(wmape) if wmape is not None else None,
float(bias) if bias is not None else None,
float(rmse) if rmse is not None else None))
total_inserted += 1
def _f(x):
return float(x) if x is not None else None
def run_and_insert(metric_type, sql):
cur.execute(base_cte + sql)
n = 0
for row in cur.fetchall():
(dim_val, sample_size, total_actual, total_forecast,
mae, wmape, bias, rmse, naive_wmape) = row
fva = None
if wmape is not None and naive_wmape is not None and float(naive_wmape) > 0:
fva = 1.0 - float(wmape) / float(naive_wmape)
cur.execute(insert_sql, (
run_id, metric_type, dim_val, sample_size,
_f(total_actual), _f(total_forecast), _f(mae), _f(wmape),
_f(bias), _f(rmse), _f(naive_wmape), _f(fva)))
n += 1
return n
total_inserted = 0
# overall: two rows — 'all' (non-dormant, the headline) and
# 'all_incl_dormant' (everything, so the ~11% dormant demand stops being
# invisible). Both are short-lead (lead 0-6). F5.
overall_source = """(
SELECT a.*, 'all'::text AS dim FROM accuracy a WHERE a.lifecycle_phase != 'dormant'
UNION ALL
SELECT a.*, 'all_incl_dormant'::text AS dim FROM accuracy a
)"""
total_inserted += run_and_insert('overall',
daily_agg('a.dim', overall_source, group_by='a.dim'))
# by_phase / by_method / daily — short-lead daily-grain over `accuracy`.
total_inserted += run_and_insert('by_phase',
daily_agg('a.lifecycle_phase', 'accuracy', group_by='a.lifecycle_phase'))
total_inserted += run_and_insert('by_method',
daily_agg('a.forecast_method', 'accuracy', group_by='a.forecast_method'))
total_inserted += run_and_insert('daily',
daily_agg('a.forecast_date::text', 'accuracy',
where="a.lifecycle_phase != 'dormant'", group_by='a.forecast_date'))
# by_lead_time — one sample per (pid, date, lead bucket) over `lead_accuracy`.
# Buckets beyond '1-7d' populate as the future-lead archives (F7) mature.
total_inserted += run_and_insert('by_lead_time',
daily_agg('a.lead_bucket', 'lead_accuracy', group_by='a.lead_bucket'))
# overall_weekly — the informative headline for intermittent retail demand.
# Aggregate the short-lead rows to (pid, complete week), then WMAPE over
# pid-weeks. Daily-grain WMAPE has a ~190% floor on this catalog; weekly
# grain is ~109% and responds to real improvement. F9.
weekly_sql = """,
weekly AS (
SELECT pid, date_trunc('week', forecast_date) AS wk,
SUM(forecast_units) AS fc_week,
SUM(actual_units) AS act_week,
SUM(naive_units) AS naive_week,
bool_and(naive_units IS NOT NULL) AS naive_complete
FROM short_lead_eval
WHERE lifecycle_phase != 'dormant'
GROUP BY pid, date_trunc('week', forecast_date)
HAVING COUNT(*) = 7
)
SELECT 'all'::text AS dim,
COUNT(*) AS sample_size,
COALESCE(SUM(act_week), 0) AS total_actual,
COALESCE(SUM(fc_week), 0) AS total_forecast,
AVG(ABS(fc_week - act_week)) AS mae,
CASE WHEN SUM(act_week) > 0
THEN SUM(ABS(fc_week - act_week)) / SUM(act_week) ELSE NULL END AS wmape,
AVG(fc_week - act_week) AS bias,
SQRT(AVG(POWER(fc_week - act_week, 2))) AS rmse,
CASE WHEN SUM(act_week) FILTER (WHERE naive_complete) > 0
THEN SUM(ABS(naive_week - act_week)) FILTER (WHERE naive_complete)
/ SUM(act_week) FILTER (WHERE naive_complete)
ELSE NULL END AS naive_wmape
FROM weekly
WHERE NOT (fc_week = 0 AND act_week = 0)
"""
total_inserted += run_and_insert('overall_weekly', weekly_sql)
conn.commit()
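This example shows why weekly grain responds where daily grain does not. With intermittent demand, a forecast that gets the weekly rate exactly right still scores badly day by day, because it spreads units across days that sell nothing. The sketch computes both metrics on a contrived, hypothetical series (one unit sold every Wednesday, forecast 1/7 per day). It aggregates to Monday-start weeks like `date_trunc('week', ...)` and keeps only complete 7-day weeks, as the `HAVING COUNT(*) = 7` clause does:

```python
from collections import defaultdict
from datetime import date, timedelta


def daily_wmape(rows):
    actual = sum(r["actual"] for r in rows)
    if actual <= 0:
        return None
    return sum(abs(r["forecast"] - r["actual"]) for r in rows) / actual


def weekly_wmape(rows):
    """WMAPE over (pid, Monday-start week), complete 7-day weeks only."""
    weeks = defaultdict(lambda: [0.0, 0.0, 0])   # fc_week, act_week, day count
    for r in rows:
        wk = r["date"] - timedelta(days=r["date"].weekday())
        acc = weeks[(r["pid"], wk)]
        acc[0] += r["forecast"]
        acc[1] += r["actual"]
        acc[2] += 1
    complete = [(fc, act) for fc, act, n in weeks.values()
                if n == 7 and not (fc == 0 and act == 0)]
    actual = sum(act for _, act in complete)
    if actual <= 0:
        return None
    return sum(abs(fc - act) for fc, act in complete) / actual


start = date(2026, 6, 1)   # a Monday; 28 days = 4 complete weeks
rows = [{"pid": 1, "date": start + timedelta(days=i), "forecast": 1 / 7,
         "actual": 1.0 if i % 7 == 2 else 0.0} for i in range(28)]
print(daily_wmape(rows), weekly_wmape(rows))
```

The daily-grain score is 12/7 (about 171%) even though every weekly total is exact, so weekly WMAPE is effectively zero. Real demand is noisier than this, but the same mechanism drives the ~190% daily floor.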
@@ -1562,6 +1868,10 @@ def main():
conn, curves_df, dow_indices, monthly_indices, accuracy_margins
)
# Phase 4b: Snapshot sampled future-lead forecasts (7/14/30/60/89d) from
# the fresh run so long-lead accuracy populates once those dates pass (F7).
archive_future_leads(conn, run_id)
duration = time.time() - start_time
# Record run completion (include DOW indices in metadata)
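The `lead_bucket` CASE in `ranked_all` determines which bucket each archived sample is scored in. Restating it in Python shows where the 7/14/30/60/89-day samples from `archive_future_leads()` land. This assumes the run's `started_at` date equals the `CURRENT_DATE` used when archiving; a run that crosses midnight would shift each lead by one day:

```python
def lead_bucket(lead_days):
    """Same bucketing as the CASE expression in the ranked_all CTE."""
    if 0 <= lead_days <= 6:
        return "1-7d"
    if 7 <= lead_days <= 13:
        return "8-14d"
    if 14 <= lead_days <= 29:
        return "15-30d"
    if 30 <= lead_days <= 59:
        return "31-60d"
    return "61-90d"   # ELSE branch: everything not matched above


# Leads sampled by archive_future_leads().
sampled = {lead: lead_bucket(lead) for lead in (7, 14, 30, 60, 89)}
print(sampled)
```

A 7-day lead scores in '8-14d', not '1-7d', and both 60 and 89 land in '61-90d'. The '1-7d' bucket is fed by the past-date archive path.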
+16 -6
@@ -1,6 +1,12 @@
const path = require('path');
const fs = require('fs');
const { spawn } = require('child_process');
// Maintenance switch: `touch .pause-auto-update` in inventory-server/ to make the
// recurring full-update a no-op (e.g. during a long manual full re-import or a
// snapshot rebuild). Remove the file to resume.
const PAUSE_FILE = path.join(__dirname, '..', '.pause-auto-update');
function outputProgress(data) {
if (!data.status) {
data = {
@@ -22,12 +28,8 @@ function runScript(scriptPath) {
child.stdout.on('data', (data) => {
const lines = data.toString().split('\n');
lines.filter(line => line.trim()).forEach(line => {
try {
console.log(line); // Pass through the JSON output
output += line + '\n';
} catch (e) {
console.log(line); // If not JSON, just log it directly
}
console.log(line); // Pass through the (usually JSON) output
output += line + '\n';
});
});
@@ -50,6 +52,14 @@ function runScript(scriptPath) {
}
async function fullUpdate() {
if (fs.existsSync(PAUSE_FILE)) {
outputProgress({
status: 'complete',
operation: 'Full update skipped',
message: `Auto-update is paused (${PAUSE_FILE} exists) — remove the file to resume`
});
return;
}
try {
// Step 1: Import from Production
outputProgress({
+15 -13
@@ -13,10 +13,14 @@ async function importCategories(prodConnection, localConnection) {
let skippedCategories = [];
try {
// Start a single transaction for the entire import
await localConnection.query('BEGIN');
// Temporarily disable the trigger that's causing problems
// Start a single transaction for the entire import.
// Must use the wrapper's beginTransaction() (dedicated client) — query('BEGIN')
// checks out a client per call, so BEGIN/work/COMMIT would not be guaranteed
// to share a connection.
await localConnection.beginTransaction();
// Temporarily disable the trigger that's causing problems.
// ALTER TABLE ... DISABLE TRIGGER is transactional: a rollback restores it.
await localConnection.query('ALTER TABLE categories DISABLE TRIGGER update_categories_updated_at');
// Process each type in order with its own savepoint
@@ -148,8 +152,11 @@ async function importCategories(prodConnection, localConnection) {
}
}
// Re-enable the trigger INSIDE the transaction so disable/enable are atomic
await localConnection.query('ALTER TABLE categories ENABLE TRIGGER update_categories_updated_at');
// Commit the entire transaction - we'll do this even if we have skipped categories
await localConnection.query('COMMIT');
await localConnection.commit();
// Update sync status
await localConnection.query(`
@@ -158,9 +165,6 @@ async function importCategories(prodConnection, localConnection) {
ON CONFLICT (table_name) DO UPDATE SET
last_sync_timestamp = NOW()
`);
// Re-enable the trigger
await localConnection.query('ALTER TABLE categories ENABLE TRIGGER update_categories_updated_at');
outputProgress({
status: "complete",
@@ -187,12 +191,10 @@ async function importCategories(prodConnection, localConnection) {
} catch (error) {
console.error("Error importing categories:", error);
// Only rollback if we haven't committed yet
// Only rollback if we haven't committed yet. The rollback also restores the
// trigger state (DISABLE TRIGGER was inside the transaction).
try {
await localConnection.query('ROLLBACK');
// Make sure we re-enable the trigger even if there was an error
await localConnection.query('ALTER TABLE categories ENABLE TRIGGER update_categories_updated_at');
await localConnection.rollback();
} catch (rollbackError) {
console.error("Error during rollback:", rollbackError);
}
@@ -24,7 +24,8 @@ async function importDailyDeals(prodConnection, localConnection) {
const startTime = Date.now();
try {
await localConnection.query('BEGIN');
// Wrapper's beginTransaction() pins a dedicated client; query('BEGIN') would not.
await localConnection.beginTransaction();
// Fetch recent daily deals from production (MySQL 5.7, no CTEs)
// Join product_current_prices to get the actual deal price
@@ -127,7 +128,7 @@ async function importDailyDeals(prodConnection, localConnection) {
last_sync_timestamp = NOW()
`);
await localConnection.query('COMMIT');
await localConnection.commit();
outputProgress({
status: "complete",
@@ -149,7 +150,7 @@ async function importDailyDeals(prodConnection, localConnection) {
console.error("Error importing daily deals:", error);
try {
await localConnection.query('ROLLBACK');
await localConnection.rollback();
} catch (rollbackError) {
console.error("Error during rollback:", rollbackError);
}
+108 -103
@@ -1,5 +1,4 @@
const { outputProgress, formatElapsedTime, estimateRemaining, calculateRate } = require('../metrics-new/utils/progress');
const { importMissingProducts, setupTemporaryTables, cleanupTemporaryTables, materializeCalculations } = require('./products');
/**
* Imports orders from a production MySQL database to a local PostgreSQL database.
@@ -28,6 +27,7 @@ async function importOrders(prodConnection, localConnection, incrementalUpdate =
22: 'placed_incomplete',
30: 'canceled',
40: 'awaiting_payment',
45: 'payment_pending',
50: 'awaiting_products',
55: 'shipping_later',
56: 'shipping_together',
@@ -35,6 +35,7 @@ async function importOrders(prodConnection, localConnection, incrementalUpdate =
61: 'flagged',
62: 'fix_before_pick',
65: 'manual_picking',
67: 'remote_send',
70: 'in_pt',
80: 'picked',
90: 'awaiting_shipment',
@@ -65,6 +66,12 @@ async function importOrders(prodConnection, localConnection, incrementalUpdate =
console.log('Orders: Using last sync time:', lastSyncTime, '(adjusted:', mysqlSyncTime, ')');
// Capture the next watermark from MySQL's own clock BEFORE querying any data.
// Rows modified while the import runs stay above this watermark for the next
// incremental run (overlap re-imports are harmless upserts); writing NOW()
// after the import finishes would permanently skip them.
const [[{ source_now: sourceNow }]] = await prodConnection.query('SELECT NOW() as source_now');
// First get count of order items - Keep MySQL compatible for production
const [[{ total }]] = await prodConnection.query(`
SELECT COUNT(*) as total
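The comment above explains why the watermark is read from MySQL before the data query. The following toy simulation uses integer timestamps (all values hypothetical) to show the row that would be lost if the post-import time were stored instead:

```python
def simulate(capture_watermark_before):
    """Two incremental runs over a toy source; integers stand in for NOW()."""
    source = {"A": 5}          # row id -> last-modified time
    last_sync = 0

    # Run 1 starts at t=10, reads changed rows, and finishes at t=15.
    run_start, run_end = 10, 15
    run1 = sorted(rid for rid, ts in source.items() if ts > last_sync)
    source["B"] = 12           # modified while run 1 is still writing
    last_sync = run_start if capture_watermark_before else run_end

    # Run 2 picks up everything modified after the stored watermark.
    run2 = sorted(rid for rid, ts in source.items() if ts > last_sync)
    return run1, run2


print(simulate(True))    # watermark taken before the query
print(simulate(False))   # watermark taken after the import: B is skipped
```

With the watermark captured at run start, rows modified during the import are picked up by the next run. Any rows re-read in that overlap are simply upserted again.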
@@ -100,7 +107,6 @@ async function importOrders(prodConnection, localConnection, incrementalUpdate =
COALESCE(NULLIF(TRIM(oi.prod_itemnumber), ''), 'NO-SKU') as SKU,
oi.prod_price as price,
oi.qty_ordered as quantity,
COALESCE(oi.prod_price_reg - oi.prod_price, 0) as base_discount,
oi.stamp as last_modified
FROM order_items oi
JOIN _order o ON oi.order_id = o.order_id
@@ -131,10 +137,8 @@ async function importOrders(prodConnection, localConnection, incrementalUpdate =
await localConnection.query(`
DROP TABLE IF EXISTS temp_order_items;
DROP TABLE IF EXISTS temp_order_meta;
DROP TABLE IF EXISTS temp_order_discounts;
DROP TABLE IF EXISTS temp_order_taxes;
DROP TABLE IF EXISTS temp_order_costs;
DROP TABLE IF EXISTS temp_main_discounts;
DROP TABLE IF EXISTS temp_item_discounts;
CREATE TEMP TABLE temp_order_items (
@@ -143,7 +147,6 @@ async function importOrders(prodConnection, localConnection, incrementalUpdate =
sku TEXT NOT NULL,
price NUMERIC(14, 4) NOT NULL,
quantity INTEGER NOT NULL,
base_discount NUMERIC(14, 4) DEFAULT 0,
PRIMARY KEY (order_id, pid)
);
@@ -160,20 +163,6 @@ async function importOrders(prodConnection, localConnection, incrementalUpdate =
PRIMARY KEY (order_id)
);
CREATE TEMP TABLE temp_order_discounts (
order_id INTEGER NOT NULL,
pid INTEGER NOT NULL,
discount NUMERIC(14, 4) NOT NULL,
PRIMARY KEY (order_id, pid)
);
CREATE TEMP TABLE temp_main_discounts (
order_id INTEGER NOT NULL,
discount_id INTEGER NOT NULL,
discount_amount_subtotal NUMERIC(14, 4) DEFAULT 0.0000,
PRIMARY KEY (order_id, discount_id)
);
CREATE TEMP TABLE temp_item_discounts (
order_id INTEGER NOT NULL,
pid INTEGER NOT NULL,
@@ -198,10 +187,8 @@ async function importOrders(prodConnection, localConnection, incrementalUpdate =
CREATE INDEX idx_temp_order_items_pid ON temp_order_items(pid);
CREATE INDEX idx_temp_order_meta_order_id ON temp_order_meta(order_id);
CREATE INDEX idx_temp_order_discounts_order_pid ON temp_order_discounts(order_id, pid);
CREATE INDEX idx_temp_order_taxes_order_pid ON temp_order_taxes(order_id, pid);
CREATE INDEX idx_temp_order_costs_order_pid ON temp_order_costs(order_id, pid);
CREATE INDEX idx_temp_main_discounts_discount_id ON temp_main_discounts(discount_id);
CREATE INDEX idx_temp_item_discounts_order_pid ON temp_item_discounts(order_id, pid);
CREATE INDEX idx_temp_item_discounts_discount_id ON temp_item_discounts(discount_id);
`);
@@ -216,21 +203,20 @@ async function importOrders(prodConnection, localConnection, incrementalUpdate =
await localConnection.beginTransaction();
try {
const batch = orderItems.slice(i, Math.min(i + 5000, orderItems.length));
const placeholders = batch.map((_, idx) =>
`($${idx * 6 + 1}, $${idx * 6 + 2}, $${idx * 6 + 3}, $${idx * 6 + 4}, $${idx * 6 + 5}, $${idx * 6 + 6})`
const placeholders = batch.map((_, idx) =>
`($${idx * 5 + 1}, $${idx * 5 + 2}, $${idx * 5 + 3}, $${idx * 5 + 4}, $${idx * 5 + 5})`
).join(",");
const values = batch.flatMap(item => [
item.order_id, item.prod_pid, item.SKU, item.price, item.quantity, item.base_discount
item.order_id, item.prod_pid, item.SKU, item.price, item.quantity
]);
await localConnection.query(`
INSERT INTO temp_order_items (order_id, pid, sku, price, quantity, base_discount)
INSERT INTO temp_order_items (order_id, pid, sku, price, quantity)
VALUES ${placeholders}
ON CONFLICT (order_id, pid) DO UPDATE SET
sku = EXCLUDED.sku,
price = EXCLUDED.price,
quantity = EXCLUDED.quantity,
base_discount = EXCLUDED.base_discount
quantity = EXCLUDED.quantity
`, values);
await localConnection.commit();
@@ -337,49 +323,15 @@ async function importOrders(prodConnection, localConnection, incrementalUpdate =
};
const processDiscountsBatch = async (batchIds) => {
// First, load main discount records
const [mainDiscounts] = await prodConnection.query(`
SELECT order_id, discount_id, discount_amount_subtotal
FROM order_discounts
WHERE order_id IN (?)
`, [batchIds]);
if (mainDiscounts.length > 0) {
await localConnection.beginTransaction();
try {
for (let j = 0; j < mainDiscounts.length; j += PG_BATCH_SIZE) {
const subBatch = mainDiscounts.slice(j, j + PG_BATCH_SIZE);
if (subBatch.length === 0) continue;
const placeholders = subBatch.map((_, idx) =>
`($${idx * 3 + 1}, $${idx * 3 + 2}, $${idx * 3 + 3})`
).join(",");
const values = subBatch.flatMap(d => [
d.order_id,
d.discount_id,
d.discount_amount_subtotal || 0
]);
await localConnection.query(`
INSERT INTO temp_main_discounts (order_id, discount_id, discount_amount_subtotal)
VALUES ${placeholders}
ON CONFLICT (order_id, discount_id) DO UPDATE SET
discount_amount_subtotal = EXCLUDED.discount_amount_subtotal
`, values);
}
await localConnection.commit();
} catch (error) {
await localConnection.rollback();
throw error;
}
}
// Then, load item discount records
// Load item-level discount records. Only which = 2 rows are real per-item
// discount amounts; which = 1 rows store the price of free promo-added
// items and which = 3 rows are usage records (neither is a discount).
// These amounts are NOT included in summary_discount_subtotal, so they
// must be added on top of the prorated subtotal discount unconditionally.
const [discounts] = await prodConnection.query(`
SELECT order_id, pid, discount_id, amount
FROM order_discount_items
WHERE order_id IN (?)
WHERE order_id IN (?) AND which = 2
`, [batchIds]);
if (discounts.length === 0) return;
@@ -418,16 +370,6 @@ async function importOrders(prodConnection, localConnection, incrementalUpdate =
`, values);
}
// Create aggregated view with a simpler, safer query that avoids duplicates
await localConnection.query(`
TRUNCATE temp_order_discounts;
INSERT INTO temp_order_discounts (order_id, pid, discount)
SELECT order_id, pid, SUM(amount) as discount
FROM temp_item_discounts
GROUP BY order_id, pid
`);
await localConnection.commit();
} catch (error) {
await localConnection.rollback();
@@ -603,42 +545,54 @@ async function importOrders(prodConnection, localConnection, incrementalUpdate =
try {
const [orders] = await localConnection.query(`
WITH order_totals AS (
SELECT
SELECT
oi.order_id,
oi.pid,
-- Instead of using ARRAY_AGG which can cause duplicate issues, use SUM with a CASE
SUM(CASE
WHEN COALESCE(md.discount_amount_subtotal, 0) > 0 THEN id.amount
ELSE 0
END) as promo_discount_sum,
-- Item-level promo discounts (which = 2 rows). These live outside
-- summary_discount_subtotal, so they are summed unconditionally.
SUM(COALESCE(id.amount, 0)) as promo_discount_sum,
COALESCE(ot.tax, 0) as total_tax,
COALESCE(oc.costeach, pc.cost_price, oi.price * 0.5) as costeach
FROM temp_order_items oi
LEFT JOIN temp_item_discounts id ON oi.order_id = id.order_id AND oi.pid = id.pid
LEFT JOIN temp_main_discounts md ON id.order_id = md.order_id AND id.discount_id = md.discount_id
LEFT JOIN temp_order_taxes ot ON oi.order_id = ot.order_id AND oi.pid = ot.pid
LEFT JOIN temp_order_costs oc ON oi.order_id = oc.order_id AND oi.pid = oc.pid
LEFT JOIN temp_product_costs pc ON oi.pid = pc.pid
WHERE oi.order_id = ANY($1)
GROUP BY oi.order_id, oi.pid, ot.tax, oc.costeach, pc.cost_price
)
SELECT
SELECT
oi.order_id as order_number,
oi.pid::bigint as pid,
oi.sku,
om.date,
oi.price,
oi.quantity,
-- Discount = prorated order-level subtotal discount + item-level promo
-- discounts, clamped so a sale line can never be discounted below free.
(
-- Prorated Points Discount (e.g. loyalty points applied at order level)
CASE
WHEN om.summary_discount_subtotal > 0 AND om.summary_subtotal > 0 THEN
COALESCE(ROUND((om.summary_discount_subtotal * (oi.price * oi.quantity)) / NULLIF(om.summary_subtotal, 0), 4), 0)
ELSE 0
CASE WHEN oi.quantity > 0 THEN
LEAST(
(
CASE
WHEN om.summary_discount_subtotal > 0 AND om.summary_subtotal > 0 THEN
COALESCE(ROUND((om.summary_discount_subtotal * (oi.price * oi.quantity)) / NULLIF(om.summary_subtotal, 0), 4), 0)
ELSE 0
END
+ COALESCE(ot.promo_discount_sum, 0)
),
oi.price * oi.quantity
)
ELSE
(
CASE
WHEN om.summary_discount_subtotal > 0 AND om.summary_subtotal > 0 THEN
COALESCE(ROUND((om.summary_discount_subtotal * (oi.price * oi.quantity)) / NULLIF(om.summary_subtotal, 0), 4), 0)
ELSE 0
END
+ COALESCE(ot.promo_discount_sum, 0)
)
END
+
-- Specific Item-Level Promo Discount (coupon codes, etc.)
COALESCE(ot.promo_discount_sum, 0)
)::NUMERIC(14, 4) as discount,
COALESCE(ot.total_tax, 0)::NUMERIC(14, 4) as tax,
false as tax_included,
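A minimal JavaScript sketch of the per-line discount expression above, for spot-checking individual rows by hand. It follows the SQL: the order-level `summary_discount_subtotal` is prorated by the line's share of `summary_subtotal` (rounded to 4 places), the item-level promo sum (`which = 2`) is added unconditionally, and the total is clamped to `price * quantity` only when `quantity > 0`. The function name and input shape are hypothetical, and JavaScript floating point only approximates PostgreSQL `NUMERIC` rounding.

```javascript
// Hypothetical helper mirroring the orders.js discount expression.
// Floating point can differ from PostgreSQL NUMERIC in the last decimal
// place, so treat this as a spot-check aid, not the source of truth.
function lineDiscount({ price, quantity, summaryDiscountSubtotal, summarySubtotal, promoDiscountSum }) {
  const lineTotal = price * quantity;

  // Prorated order-level subtotal discount (e.g. loyalty points).
  let prorated = 0;
  if (summaryDiscountSubtotal > 0 && summarySubtotal > 0) {
    prorated = Math.round(((summaryDiscountSubtotal * lineTotal) / summarySubtotal) * 1e4) / 1e4;
  }

  // Item-level promo discounts (which = 2), added unconditionally.
  const total = prorated + (promoDiscountSum ?? 0);

  // Sale lines are clamped so they can never be discounted below free;
  // zero/negative-quantity lines are left unclamped, as in the SQL ELSE branch.
  return quantity > 0 ? Math.min(total, lineTotal) : total;
}

// A $10 x 2 line in a $100 order carrying a $15 subtotal discount, plus a
// $5 item-level promo: 15 * 20 / 100 = 3, plus 5 = 8.
console.log(lineDiscount({ price: 10, quantity: 2, summaryDiscountSubtotal: 15, summarySubtotal: 100, promoDiscountSum: 5 }));
```

The clamp matters for free-with-promo lines: a $10 promo on a $4 item now yields a $4 discount (net revenue 0) instead of driving the line negative.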
@@ -765,34 +719,83 @@ async function importOrders(prodConnection, localConnection, incrementalUpdate =
}
}
// Start a transaction for updating sync status and dropping temp tables
// Reconciliation 2 prep: fetch canceled (15) / combined (16) orders from MySQL
// WITHOUT a date_placed filter — combine_orders zeroes date_placed on the source
// orders, so the main item query can never re-fetch them. Done before opening
// the PG transaction so we don't hold it across a MySQL round-trip.
const [statusSweepRows] = await prodConnection.query(`
SELECT order_id, order_status
FROM _order
WHERE order_status IN (15, 16)
${incrementalUpdate ? 'AND stamp > ?' : ''}
`, incrementalUpdate ? [mysqlSyncTime] : []);
let staleItemsDeleted = 0;
let sweepUpdated = 0;
// Final transaction: reconcile deletions, sweep statuses, update sync status, drop temps
await localConnection.beginTransaction();
try {
// Update sync status
// Reconciliation 1: delete PG item rows that no longer exist in MySQL for the
// orders fetched this run. temp_order_items holds the complete current item
// set of every fetched order (staff edits and unpicked promo items DELETE
// order_items rows in MySQL, which an upsert-only import never removes).
const [reconcileResult] = await localConnection.query(`
DELETE FROM orders o
USING (SELECT DISTINCT order_id FROM temp_order_items) fetched
WHERE o.order_number = fetched.order_id::text -- orders.order_number is TEXT
AND NOT EXISTS (
SELECT 1 FROM temp_order_items t
WHERE t.order_id = fetched.order_id AND t.pid = o.pid
)
`);
staleItemsDeleted = reconcileResult.rowCount || 0;
// Reconciliation 2: mark canceled/combined orders. 'combined' source orders were
// merged into a new order that carries the same items — counting both would
// double-count, so they also get canceled = true (routes filter on canceled).
for (const [code, statusText] of [[15, 'canceled'], [16, 'combined']]) {
const ids = statusSweepRows.filter(r => r.order_status === code).map(r => r.order_id);
for (let i = 0; i < ids.length; i += 5000) {
const chunk = ids.slice(i, i + 5000);
const [sweepResult] = await localConnection.query(`
UPDATE orders
SET status = $1, canceled = true
WHERE order_number = ANY($2::text[])
AND (status IS DISTINCT FROM $1 OR canceled IS DISTINCT FROM true)
`, [statusText, chunk.map(String)]);
sweepUpdated += sweepResult.rowCount || 0;
}
}
// Update sync status with the watermark captured from MySQL BEFORE the
// source queries ran (see sourceNow above).
await localConnection.query(`
INSERT INTO sync_status (table_name, last_sync_timestamp)
VALUES ('orders', NOW())
VALUES ('orders', $1)
ON CONFLICT (table_name) DO UPDATE SET
last_sync_timestamp = NOW()
`);
last_sync_timestamp = $1
`, [sourceNow]);
// Cleanup temporary tables
await localConnection.query(`
DROP TABLE IF EXISTS temp_order_items;
DROP TABLE IF EXISTS temp_order_meta;
DROP TABLE IF EXISTS temp_order_discounts;
DROP TABLE IF EXISTS temp_order_taxes;
DROP TABLE IF EXISTS temp_order_costs;
DROP TABLE IF EXISTS temp_main_discounts;
DROP TABLE IF EXISTS temp_item_discounts;
DROP TABLE IF EXISTS temp_product_costs;
`);
// Commit final transaction
await localConnection.commit();
} catch (error) {
await localConnection.rollback();
throw error;
throw error;
}
if (staleItemsDeleted > 0 || sweepUpdated > 0) {
console.log(`Orders: reconciliation removed ${staleItemsDeleted} stale item rows, swept ${sweepUpdated} canceled/combined rows`);
}
return {
@@ -800,6 +803,8 @@ async function importOrders(prodConnection, localConnection, incrementalUpdate =
totalImported: Math.floor(importedCount) || 0,
recordsAdded: parseInt(recordsAdded) || 0,
recordsUpdated: parseInt(recordsUpdated) || 0,
recordsDeleted: staleItemsDeleted,
statusSweepUpdated: sweepUpdated,
totalSkipped: skippedOrders.size || 0,
missingProducts: missingProducts.size || 0,
totalProcessed: orderItems.length, // Total order items in source
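The "Reconciliation 1" delete above is a set difference scoped to the orders fetched in this run. A small in-memory sketch of that rule, plus the fixed-size chunking used by the status sweep, is shown below; `findStaleItems`, `chunk`, and the row shapes are hypothetical names for illustration, not part of orders.js.

```javascript
// Hypothetical in-memory model of "Reconciliation 1": for every order fetched
// this run, a local (order, pid) row absent from the freshly fetched item set
// is stale. Orders NOT fetched this run are never touched, which is what keeps
// the delete safe on incremental runs.
function findStaleItems(localRows, fetchedItems) {
  const fetchedOrders = new Set(fetchedItems.map(r => String(r.orderId)));
  const fetchedKeys = new Set(fetchedItems.map(r => `${r.orderId}:${r.pid}`));
  return localRows.filter(
    r => fetchedOrders.has(String(r.orderNumber)) && !fetchedKeys.has(`${r.orderNumber}:${r.pid}`)
  );
}

// Splits ids into fixed-size chunks, like the 5000-id batches of the status sweep.
function chunk(ids, size) {
  const out = [];
  for (let i = 0; i < ids.length; i += size) out.push(ids.slice(i, i + size));
  return out;
}

const local = [
  { orderNumber: "100", pid: 1 },
  { orderNumber: "100", pid: 2 }, // deleted in MySQL by a staff edit
  { orderNumber: "200", pid: 3 }, // order not fetched this run: left alone
];
const fetched = [{ orderId: 100, pid: 1 }];
console.log(findStaleItems(local, fetched)); // only the (100, 2) row
```

Keying on the string form of the order id mirrors the `fetched.order_id::text` cast, since `orders.order_number` is TEXT.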
+132 -5
@@ -622,6 +622,7 @@ async function materializeCalculations(prodConnection, localConnection, incremen
AND t.total_sold IS NOT DISTINCT FROM p.total_sold
AND t.date_online IS NOT DISTINCT FROM p.date_online
AND t.shop_score IS NOT DISTINCT FROM p.shop_score
AND t.categories IS NOT DISTINCT FROM p.categories
`);
// Get count of products that need updating
@@ -662,6 +663,11 @@ async function importProducts(prodConnection, localConnection, incrementalUpdate
}
}
// Capture the next watermark from MySQL's own clock BEFORE querying any data.
// Rows modified while the import runs stay above this watermark for the next
// incremental run (overlap re-imports are harmless upserts).
const [[{ source_now: sourceNow }]] = await prodConnection.query('SELECT NOW() as source_now');
// Start a transaction to ensure temporary tables persist
await localConnection.beginTransaction();
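Why the watermark is captured from MySQL before any source query runs can be shown with a toy timeline. The sketch assumes rows carry a source-side `stamp` and that the next incremental run selects `stamp > watermark`; `nextRunPicksUp` and the time units are hypothetical.

```javascript
// Returns the ids of rows the NEXT incremental run will select.
function nextRunPicksUp(rows, watermark) {
  return rows.filter(r => r.stamp > watermark).map(r => r.id);
}

// Timeline on the source clock (arbitrary units):
//   t=10  import starts; sourceNow is captured here (new behaviour)
//   t=12  row B is modified after this run's source query already read it
//   t=15  import finishes; the old code wrote NOW() at this point
const rows = [
  { id: "A", stamp: 5 },  // imported by this run
  { id: "B", stamp: 12 }, // missed by this run's query
];
console.log(nextRunPicksUp(rows, 15)); // old watermark: B is never re-read
console.log(nextRunPicksUp(rows, 10)); // new watermark: B is re-read next run
```

The overlap window (rows stamped between 10 and 15) is simply re-imported; because the writes are upserts, that costs some extra work but no correctness. Taking the value from MySQL's clock also removes any skew between the MySQL and PostgreSQL servers, which the old `NOW()` on the PG side silently mixed in.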
@@ -927,16 +933,22 @@ async function importProducts(prodConnection, localConnection, incrementalUpdate
// legacy PHP backend will stamp onto the PO line item.
await syncSupplierCosts(prodConnection, localConnection);
// Sync category assignments for ALL products. product_category_index has no
// stamp column, so category-only changes never bump any of the incremental
// WHERE timestamps — without this pass PG categories go permanently stale.
await syncProductCategories(prodConnection, localConnection);
// Commit the transaction
await localConnection.commit();
// Update sync status
// Update sync status with the watermark captured from MySQL BEFORE the
// source queries ran (see sourceNow above).
await localConnection.query(`
INSERT INTO sync_status (table_name, last_sync_timestamp)
VALUES ('products', NOW())
VALUES ('products', $1)
ON CONFLICT (table_name) DO UPDATE SET
last_sync_timestamp = NOW()
`);
last_sync_timestamp = $1
`, [sourceNow]);
return {
status: 'complete',
@@ -1028,11 +1040,126 @@ async function syncSupplierCosts(prodConnection, localConnection) {
return { updated };
}
// Full category-assignment sweep. The incremental product import keys on
// p.stamp / ci.stamp / price / b2b dates — none of which change when a product
// is recategorized in product_category_index (the table has no stamp column).
// This pass compares the canonical GROUP_CONCAT representation against
// products.categories and rewrites product_categories only for changed pids.
// Must run inside the caller's transaction (uses ON COMMIT DROP temp table).
async function syncProductCategories(prodConnection, localConnection) {
outputProgress({
status: "running",
operation: "Products import",
message: "Syncing category assignments"
});
// Same expression as the main import query so representations compare equal
// (GROUP_CONCAT(DISTINCT int) returns values numerically sorted).
const [rows] = await prodConnection.query(`
SELECT
p.pid,
GROUP_CONCAT(DISTINCT CASE
WHEN pc.cat_id IS NOT NULL
AND pc.type IN (10, 20, 11, 21, 12, 13)
AND pci.cat_id NOT IN (16, 17)
THEN pci.cat_id
END) as category_ids
FROM products p
LEFT JOIN product_category_index pci ON p.pid = pci.pid
LEFT JOIN product_categories pc ON pci.cat_id = pc.cat_id
GROUP BY p.pid
`);
if (!rows || rows.length === 0) {
return { updated: 0 };
}
await localConnection.query(`
CREATE TEMP TABLE temp_category_sync (
pid BIGINT PRIMARY KEY,
categories TEXT
) ON COMMIT DROP
`);
const CHUNK = 5000;
for (let i = 0; i < rows.length; i += CHUNK) {
const batch = rows.slice(i, i + CHUNK);
const pids = batch.map(r => r.pid);
const cats = batch.map(r => r.category_ids);
await localConnection.query(
`INSERT INTO temp_category_sync (pid, categories)
SELECT * FROM UNNEST($1::bigint[], $2::text[])
ON CONFLICT (pid) DO NOTHING`,
[pids, cats]
);
}
// Which existing products actually changed?
const [changed] = await localConnection.query(`
SELECT t.pid, t.categories
FROM temp_category_sync t
JOIN products p ON p.pid = t.pid
WHERE t.categories IS DISTINCT FROM p.categories
`);
if (changed.rows.length === 0) {
return { updated: 0 };
}
await localConnection.query(`
UPDATE products p
SET categories = t.categories
FROM temp_category_sync t
WHERE p.pid = t.pid
AND t.categories IS DISTINCT FROM p.categories
`);
// Rewrite the relationship rows for changed products only
const REL_CHUNK = 1000;
for (let i = 0; i < changed.rows.length; i += REL_CHUNK) {
const batch = changed.rows.slice(i, i + REL_CHUNK);
const pids = batch.map(r => r.pid);
await localConnection.query(
'DELETE FROM product_categories WHERE pid = ANY($1)',
[pids]
);
const relPids = [];
const relCats = [];
for (const row of batch) {
if (!row.categories) continue;
for (const catId of row.categories.split(',')) {
if (catId && catId.trim()) {
relPids.push(row.pid);
relCats.push(parseInt(catId.trim(), 10));
}
}
}
if (relPids.length > 0) {
await localConnection.query(`
INSERT INTO product_categories (pid, cat_id)
SELECT * FROM UNNEST($1::bigint[], $2::int[])
ON CONFLICT (pid, cat_id) DO NOTHING
`, [relPids, relCats]);
}
}
outputProgress({
status: "running",
operation: "Products import",
message: `Category assignments updated for ${changed.rows.length} products`
});
return { updated: changed.rows.length };
}
module.exports = {
importProducts,
importMissingProducts,
setupTemporaryTables,
cleanupTemporaryTables,
materializeCalculations,
syncSupplierCosts
syncSupplierCosts,
syncProductCategories
};
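A pure-function sketch of the decision `syncProductCategories` makes: compare the canonical `GROUP_CONCAT` string with `products.categories` using null-safe equality, then expand only the changed products into `(pid, cat_id)` rows. `planCategorySync` and the `Map` input are hypothetical; the real function does the comparison in PostgreSQL via the temp table.

```javascript
// Hypothetical model of syncProductCategories' planning step.
// sourceRows: [{ pid, categories }] from MySQL (categories may be null).
// localCategories: Map pid -> products.categories in PG (may be null).
function planCategorySync(sourceRows, localCategories) {
  const changed = sourceRows.filter(r => {
    // Only products that already exist locally (the JOIN products p).
    if (!localCategories.has(r.pid)) return false;
    // Null-safe comparison, like IS DISTINCT FROM.
    return (r.categories ?? null) !== (localCategories.get(r.pid) ?? null);
  });

  const relations = [];
  for (const row of changed) {
    if (!row.categories) continue; // NULL: product ends up with no category rows
    for (const catId of row.categories.split(",")) {
      if (catId.trim()) relations.push([row.pid, parseInt(catId.trim(), 10)]);
    }
  }
  return { changedPids: changed.map(r => r.pid), relations };
}

const localCats = new Map([[1, "10,20"], [2, "30"], [3, null]]);
const sourceRows = [
  { pid: 1, categories: "10,20" }, // unchanged: skipped
  { pid: 2, categories: "30,40" }, // recategorized
  { pid: 3, categories: "50" },    // newly categorized
  { pid: 4, categories: "60" },    // not in PG yet: skipped
];
console.log(planCategorySync(sourceRows, localCats));
```

The string comparison only works because both sides use the same expression; if the MySQL query ever changed its ordering or filtering, every product would look changed and the sweep would rewrite all relationship rows.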
@@ -72,6 +72,11 @@ async function importPurchaseOrders(prodConnection, localConnection, incremental
console.log('Purchase Orders: Using last sync time:', lastSyncTime, '(adjusted:', mysqlSyncTime, ')');
// Capture the next watermark from MySQL's own clock BEFORE querying any data.
// Rows modified while the import runs stay above this watermark for the next
// incremental run (overlap re-imports are harmless upserts).
const [[{ source_now: sourceNow }]] = await prodConnection.query('SELECT NOW() as source_now');
// Create temp tables for processing
await localConnection.query(`
DROP TABLE IF EXISTS temp_purchase_orders;
@@ -267,13 +272,16 @@ async function importPurchaseOrders(prodConnection, localConnection, incremental
if (totalPOs === 0) {
console.log('No purchase orders to process, skipping PO import step');
} else {
// Fetch and process POs in batches
let offset = 0;
// Fetch and process POs in batches using keyset pagination on po_id.
// LIMIT/OFFSET over a date_updated predicate silently skips rows when
// concurrent updates shift rows between pages.
let processedPOCount = 0;
let lastPoId = 0;
let allPOsProcessed = false;
while (!allPOsProcessed) {
const [poList] = await prodConnection.query(`
SELECT
SELECT
p.po_id,
p.supplier_id,
s.companyname AS vendor,
@@ -286,21 +294,23 @@ async function importPurchaseOrders(prodConnection, localConnection, incremental
FROM po p
LEFT JOIN suppliers s ON p.supplier_id = s.supplierid
WHERE p.date_created >= DATE_SUB(CURRENT_DATE, INTERVAL ${yearInterval} YEAR)
AND p.po_id > ?
${incrementalUpdate ? `
AND (
p.date_updated > ?
OR p.date_ordered > ?
p.date_updated > ?
OR p.date_ordered > ?
OR p.date_estin > ?
)
` : ''}
ORDER BY p.po_id
LIMIT ${PO_BATCH_SIZE} OFFSET ${offset}
`, incrementalUpdate ? [mysqlSyncTime, mysqlSyncTime, mysqlSyncTime] : []);
LIMIT ${PO_BATCH_SIZE}
`, incrementalUpdate ? [lastPoId, mysqlSyncTime, mysqlSyncTime, mysqlSyncTime] : [lastPoId]);
if (poList.length === 0) {
allPOsProcessed = true;
break;
}
lastPoId = poList[poList.length - 1].po_id;
// Get products for these POs
const poIds = poList.map(po => po.po_id);
@@ -332,7 +342,11 @@ async function importPurchaseOrders(prodConnection, localConnection, incremental
vendor: po.vendor || 'Unknown Vendor',
date: validateDate(po.date_ordered) || validateDate(po.date_created),
expected_date: validateDate(po.date_estin),
status: poStatusMap[po.status] || 'created',
// Unknown codes get a sentinel rather than 'created': defaulting an
// unknown cancel-like code to an OPEN status would inflate on-order
// FIFO (the metrics CTEs whitelist known-open statuses, so a sentinel
// is simply ignored there).
status: poStatusMap[po.status] || `unknown_${po.status}`,
notes: po.notes || '',
long_note: po.long_note || '',
ordered: product.qty_each,
@@ -393,20 +407,20 @@ async function importPurchaseOrders(prodConnection, localConnection, incremental
`, values);
}
offset += poList.length;
processedPOCount += poList.length;
totalProcessed += completePOs.length;
outputProgress({
status: "running",
operation: "Purchase orders import",
message: `Processed ${offset} of ${totalPOs} purchase orders (${totalProcessed} line items)`,
current: offset,
message: `Processed ${processedPOCount} of ${totalPOs} purchase orders (${totalProcessed} line items)`,
current: processedPOCount,
total: totalPOs,
elapsed: formatElapsedTime(startTime),
remaining: estimateRemaining(startTime, offset, totalPOs),
rate: calculateRate(startTime, offset)
remaining: estimateRemaining(startTime, processedPOCount, totalPOs),
rate: calculateRate(startTime, processedPOCount)
});
if (poList.length < PO_BATCH_SIZE) {
allPOsProcessed = true;
}
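An in-memory comparison of OFFSET and keyset pagination shows how OFFSET can skip rows. It assumes a row earlier in `po_id` order leaves the filtered set between two page fetches (for example, it is deleted or ages out of the `date_created` window); the `drain` helper and paging objects are hypothetical.

```javascript
// Each fetch sees the table as it is "now", like a fresh SQL query per page.
const offsetPaging = {
  fetch: (table, offset, limit) => [...table].sort((a, b) => a - b).slice(offset, offset + limit),
  advance: (offset, rows) => offset + rows.length,
};
const keysetPaging = {
  fetch: (table, lastId, limit) => table.filter(id => id > lastId).sort((a, b) => a - b).slice(0, limit),
  advance: (_lastId, rows) => rows[rows.length - 1],
};

// Reads every page; mutateAfterFirstPage runs once, between page 1 and page 2,
// to simulate a concurrent change on the source.
function drain(paging, initialTable, mutateAfterFirstPage, limit) {
  let table = [...initialTable];
  const seen = [];
  let cursor = 0; // OFFSET for offsetPaging, last po_id for keysetPaging
  for (let page = 0; ; page++) {
    const rows = paging.fetch(table, cursor, limit);
    if (rows.length === 0) break;
    seen.push(...rows);
    cursor = paging.advance(cursor, rows);
    if (page === 0) table = mutateAfterFirstPage(table);
    if (rows.length < limit) break;
  }
  return seen;
}

// po_ids 1..6 in pages of 2; after page 1, po_id 2 drops out of the set.
const pos = [1, 2, 3, 4, 5, 6];
const dropTwo = table => table.filter(id => id !== 2);
console.log(drain(offsetPaging, pos, dropTwo, 2)); // po_id 3 is never returned
console.log(drain(keysetPaging, pos, dropTwo, 2)); // all six ids returned
```

Keyset paging only depends on the last id seen, so rows entering or leaving the set earlier in the order cannot shift later pages. It requires a unique, ordered key (`po_id`) and an index on it to stay cheap.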
@@ -439,13 +453,14 @@ async function importPurchaseOrders(prodConnection, localConnection, incremental
if (totalReceivings === 0) {
console.log('No receivings to process, skipping receivings import step');
} else {
// Fetch and process receivings in batches
offset = 0; // Reset offset for receivings
// Fetch and process receivings in batches (keyset pagination, see POs above)
let processedReceivingCount = 0;
let lastReceivingId = 0;
let allReceivingsProcessed = false;
while (!allReceivingsProcessed) {
const [receivingList] = await prodConnection.query(`
SELECT
SELECT
r.receiving_id,
r.supplier_id,
r.status,
@@ -459,6 +474,7 @@ async function importPurchaseOrders(prodConnection, localConnection, incremental
r.date_checked
FROM receivings r
WHERE r.date_created >= DATE_SUB(CURRENT_DATE, INTERVAL ${yearInterval} YEAR)
AND r.receiving_id > ?
${incrementalUpdate ? `
AND (
r.date_updated > ?
@@ -466,13 +482,14 @@ async function importPurchaseOrders(prodConnection, localConnection, incremental
)
` : ''}
ORDER BY r.receiving_id
LIMIT ${PO_BATCH_SIZE} OFFSET ${offset}
`, incrementalUpdate ? [mysqlSyncTime, mysqlSyncTime] : []);
LIMIT ${PO_BATCH_SIZE}
`, incrementalUpdate ? [lastReceivingId, mysqlSyncTime, mysqlSyncTime] : [lastReceivingId]);
if (receivingList.length === 0) {
allReceivingsProcessed = true;
break;
}
lastReceivingId = receivingList[receivingList.length - 1].receiving_id;
// Get products for these receivings
const receivingIds = receivingList.map(r => r.receiving_id);
@@ -545,7 +562,8 @@ async function importPurchaseOrders(prodConnection, localConnection, incremental
received_date: validateDate(product.received_date) || validateDate(product.receiving_created_date),
receiving_created_date: validateDate(product.receiving_created_date),
supplier_id: receiving.supplier_id,
status: receivingStatusMap[receiving.status] || 'created'
// Sentinel for unknown codes — see PO status mapping note above
status: receivingStatusMap[receiving.status] || `unknown_${receiving.status}`
});
}
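A sketch of why an unmapped status code should become a sentinel rather than default to `'created'`. The status codes, the `poStatusMap` contents, and the open-status whitelist below are hypothetical stand-ins; the real map and the metrics CTE whitelist are not shown in this diff.

```javascript
// Hypothetical code -> status map and open-status whitelist, for illustration only.
const poStatusMap = { 1: "created", 10: "ordered", 50: "done" };
const OPEN_STATUSES = new Set(["created", "ordered"]);

const mapStatusOld = code => poStatusMap[code] || "created";        // previous default
const mapStatusNew = code => poStatusMap[code] || `unknown_${code}`; // sentinel

// Units still "on order": only lines whose mapped status is whitelisted as open.
function onOrderUnits(lines, mapStatus) {
  return lines
    .filter(line => OPEN_STATUSES.has(mapStatus(line.statusCode)))
    .reduce((sum, line) => sum + line.ordered, 0);
}

const lines = [
  { statusCode: 10, ordered: 5 },   // a genuinely open PO line
  { statusCode: 99, ordered: 100 }, // an unmapped, possibly cancel-like code
];
console.log(onOrderUnits(lines, mapStatusOld)); // 105: the unknown line counts as open
console.log(onOrderUnits(lines, mapStatusNew)); // 5: the sentinel is ignored
```

The sentinel also keeps the raw code visible in the data (`unknown_99`), so a new source status can be spotted and added to the map instead of being silently absorbed.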
@@ -600,18 +618,18 @@ async function importPurchaseOrders(prodConnection, localConnection, incremental
`, values);
}
offset += receivingList.length;
processedReceivingCount += receivingList.length;
totalProcessed += completeReceivings.length;
outputProgress({
status: "running",
operation: "Purchase orders import",
message: `Processed ${offset} of ${totalReceivings} receivings (${totalProcessed} line items total)`,
current: offset,
message: `Processed ${processedReceivingCount} of ${totalReceivings} receivings (${totalProcessed} line items total)`,
current: processedReceivingCount,
total: totalReceivings,
elapsed: formatElapsedTime(startTime),
remaining: estimateRemaining(startTime, offset, totalReceivings),
rate: calculateRate(startTime, offset)
remaining: estimateRemaining(startTime, processedReceivingCount, totalReceivings),
rate: calculateRate(startTime, processedReceivingCount)
});
if (receivingList.length < PO_BATCH_SIZE) {
@@ -829,13 +847,14 @@ async function importPurchaseOrders(prodConnection, localConnection, incremental
receivingRecordsAdded = receivingsResult.rows.filter(r => r.inserted).length;
receivingRecordsUpdated = receivingsResult.rows.filter(r => !r.inserted).length;
// Update sync status
// Update sync status with the watermark captured from MySQL BEFORE the
// source queries ran (see sourceNow above).
await localConnection.query(`
INSERT INTO sync_status (table_name, last_sync_timestamp)
VALUES ('purchase_orders', NOW())
VALUES ('purchase_orders', $1)
ON CONFLICT (table_name) DO UPDATE SET
last_sync_timestamp = NOW()
`);
last_sync_timestamp = $1
`, [sourceNow]);
// Clean up temporary tables
await localConnection.query(`
@@ -151,7 +151,10 @@ async function importStockSnapshots(prodConnection, localConnection, incremental
recordsAdded += batch.length;
} catch (err) {
// Fail the step: the next incremental starts at MAX(snapshot_date), so a
// swallowed batch error would leave a permanent hole that is never revisited.
console.error(`Error inserting batch at offset ${i} (date range ending ${currentDate}):`, err.message);
throw err;
}
}
@@ -165,7 +168,7 @@ async function importStockSnapshots(prodConnection, localConnection, incremental
current: processedRows,
total: totalRows,
elapsed: formatElapsedTime(startTime),
rate: calculateRate(processedRows, startTime)
rate: calculateRate(startTime, processedRows)
});
}
@@ -10,7 +10,7 @@ DECLARE
_date DATE;
_count INT;
_total_records INT := 0;
_begin_date DATE := (SELECT MIN(date)::date FROM orders WHERE date >= '2020-01-01'); -- Starting point: captures all historical order data
_begin_date DATE := (SELECT MIN((date AT TIME ZONE 'America/Chicago'))::date FROM orders WHERE date >= '2020-01-01'); -- Starting point: captures all historical order data (business days, Central time)
_end_date DATE := CURRENT_DATE;
BEGIN
RAISE NOTICE 'Beginning daily snapshots rebuild from % to %. Starting at %', _begin_date, _end_date, _start_time;
@@ -32,26 +32,34 @@ BEGIN
p.sku,
-- Count orders to ensure we only include products with real activity
COUNT(o.id) as order_count,
-- Aggregate Sales (Quantity > 0, Status not Canceled/Returned)
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned') THEN o.quantity ELSE 0 END), 0) AS units_sold,
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned') THEN o.price * o.quantity ELSE 0 END), 0.00) AS gross_revenue_unadjusted,
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned') THEN o.discount ELSE 0 END), 0.00) AS discounts,
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned') THEN
-- Aggregate Sales (Quantity > 0, Status not Canceled/Returned/Combined)
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned', 'combined') THEN o.quantity ELSE 0 END), 0) AS units_sold,
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned', 'combined') THEN o.price * o.quantity ELSE 0 END), 0.00) AS gross_revenue_unadjusted,
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned', 'combined') THEN o.discount ELSE 0 END), 0.00) AS discounts,
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned', 'combined') THEN
COALESCE(
o.costeach,
get_weighted_avg_cost(p.pid, o.date::date),
get_weighted_avg_cost(p.pid, (o.date AT TIME ZONE 'America/Chicago')::date),
p.cost_price
) * o.quantity
ELSE 0 END), 0.00) AS cogs,
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned') THEN p.regular_price * o.quantity ELSE 0 END), 0.00) AS gross_regular_revenue,
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned', 'combined') THEN p.regular_price * o.quantity ELSE 0 END), 0.00) AS gross_regular_revenue,
-- Aggregate Returns (Quantity < 0 or Status = Returned)
COALESCE(SUM(CASE WHEN o.quantity < 0 OR COALESCE(o.status, 'pending') = 'returned' THEN ABS(o.quantity) ELSE 0 END), 0) AS units_returned,
COALESCE(SUM(CASE WHEN o.quantity < 0 OR COALESCE(o.status, 'pending') = 'returned' THEN o.price * ABS(o.quantity) ELSE 0 END), 0.00) AS returns_revenue
COALESCE(SUM(CASE WHEN o.quantity < 0 OR COALESCE(o.status, 'pending') = 'returned' THEN o.price * ABS(o.quantity) ELSE 0 END), 0.00) AS returns_revenue,
-- Returns COGS: cost of returned goods offsets sales COGS
COALESCE(SUM(CASE WHEN o.quantity < 0 OR COALESCE(o.status, 'pending') = 'returned' THEN
COALESCE(
o.costeach,
get_weighted_avg_cost(p.pid, (o.date AT TIME ZONE 'America/Chicago')::date),
p.cost_price
) * ABS(o.quantity)
ELSE 0 END), 0.00) AS returns_cogs
FROM public.products p
LEFT JOIN public.orders o
ON p.pid = o.pid
AND o.date::date = _date
AND (o.date AT TIME ZONE 'America/Chicago')::date = _date -- business day (Central)
GROUP BY p.pid, p.sku
HAVING COUNT(o.id) > 0 -- Only include products with actual orders for this date
),
@@ -65,7 +73,7 @@ BEGIN
-- Calculate received cost for this day
SUM(r.qty_each * r.cost_each) AS cost_received
FROM public.receivings r
WHERE r.received_date::date = _date
WHERE (r.received_date AT TIME ZONE 'America/Chicago')::date = _date
GROUP BY r.pid
HAVING COUNT(DISTINCT r.receiving_id) > 0 OR SUM(r.qty_each) > 0
),
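The switch from `::date` to `(ts AT TIME ZONE 'America/Chicago')::date` can be checked outside the database. The hypothetical `businessDay` helper below uses the standard `Intl.DateTimeFormat#formatToParts` API to get the calendar day of an instant in a given IANA time zone; it assumes a Node build with full ICU time-zone data (the default in current releases).

```javascript
// Calendar day (YYYY-MM-DD) of an instant in an IANA time zone.
function businessDay(date, timeZone) {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(date);
  const get = type => parts.find(p => p.type === type).value;
  return `${get("year")}-${get("month")}-${get("day")}`;
}

// 6:30 PM Central on June 10 is already 1:30 AM June 11 in Berlin,
// so a Berlin-session ::date cast books it on the wrong business day.
const lateOrder = new Date("2026-06-10T23:30:00Z");
console.log(businessDay(lateOrder, "America/Chicago")); // "2026-06-10"
console.log(businessDay(lateOrder, "Europe/Berlin"));   // "2026-06-11"
```

Berlin runs 7 hours ahead of Chicago for most of the year, which is where the "orders after ~5 PM Central" cutoff in migration 003 comes from.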
@@ -120,9 +128,9 @@ BEGIN
COALESCE(sd.discounts, 0.00),
COALESCE(sd.returns_revenue, 0.00),
COALESCE(sd.gross_revenue_unadjusted, 0.00) - COALESCE(sd.discounts, 0.00) - COALESCE(sd.returns_revenue, 0.00) AS net_revenue,
COALESCE(sd.cogs, 0.00),
COALESCE(sd.cogs, 0.00) - COALESCE(sd.returns_cogs, 0.00) AS cogs, -- net of returned goods' cost
COALESCE(sd.gross_regular_revenue, 0.00),
(COALESCE(sd.gross_revenue_unadjusted, 0.00) - COALESCE(sd.discounts, 0.00) - COALESCE(sd.returns_revenue, 0.00)) - COALESCE(sd.cogs, 0.00) AS profit,
(COALESCE(sd.gross_revenue_unadjusted, 0.00) - COALESCE(sd.discounts, 0.00) - COALESCE(sd.returns_revenue, 0.00)) - (COALESCE(sd.cogs, 0.00) - COALESCE(sd.returns_cogs, 0.00)) AS profit,
-- Receiving metrics
COALESCE(rd.units_received, 0),
COALESCE(rd.cost_received, 0.00),
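The snapshot arithmetic after this change, as a small worked example. `dailyFinancials` is a hypothetical helper; the formulas are the ones in the INSERT above, with the `COALESCE(..., 0)` defaults expressed as destructuring defaults.

```javascript
// net_revenue = gross - discounts - returns_revenue
// cogs        = sales cogs - returns_cogs
// profit      = net_revenue - cogs
function dailyFinancials({ grossRevenue = 0, discounts = 0, returnsRevenue = 0, salesCogs = 0, returnsCogs = 0 }) {
  const netRevenue = grossRevenue - discounts - returnsRevenue;
  const cogs = salesCogs - returnsCogs;
  return { netRevenue, cogs, profit: netRevenue - cogs };
}

// 10 units sold at $20 (unit cost $8) with $20 of discounts; 2 units returned.
console.log(dailyFinancials({ grossRevenue: 200, discounts: 20, returnsRevenue: 40, salesCogs: 80, returnsCogs: 16 }));
```

Here profit comes out at 76. The old formula reversed the returned units' revenue but kept their cost, giving 140 - 80 = 60 and understating profit by exactly the returned cost.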
@@ -123,7 +123,10 @@ BEGIN
brand_metrics.current_stock_units IS DISTINCT FROM EXCLUDED.current_stock_units OR
brand_metrics.sales_30d IS DISTINCT FROM EXCLUDED.sales_30d OR
brand_metrics.revenue_30d IS DISTINCT FROM EXCLUDED.revenue_30d OR
brand_metrics.lifetime_sales IS DISTINCT FROM EXCLUDED.lifetime_sales;
brand_metrics.lifetime_sales IS DISTINCT FROM EXCLUDED.lifetime_sales OR
-- Cost revisions can change profit/cogs with unchanged sales/revenue
brand_metrics.profit_30d IS DISTINCT FROM EXCLUDED.profit_30d OR
brand_metrics.cogs_30d IS DISTINCT FROM EXCLUDED.cogs_30d;
-- Update calculate_status
INSERT INTO public.calculate_status (module_name, last_calculation_timestamp)
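The upsert guard only rewrites a row when some tracked column `IS DISTINCT FROM` its new value, so any column left off the list can go stale. A JavaScript analogue, with hypothetical helper names and a subset of the brand columns as sample data:

```javascript
// IS DISTINCT FROM treats two NULLs as equal and NULL vs a value as different,
// unlike `<>`, which yields NULL (so no update) whenever either side is NULL.
function isDistinctFrom(a, b) {
  const aNull = a === null || a === undefined;
  const bNull = b === null || b === undefined;
  if (aNull || bNull) return aNull !== bNull;
  return a !== b;
}

function needsUpdate(existing, incoming, trackedColumns) {
  return trackedColumns.some(col => isDistinctFrom(existing[col], incoming[col]));
}

const tracked = ["sales_30d", "revenue_30d", "lifetime_sales", "profit_30d", "cogs_30d"];
const existing = { sales_30d: 12, revenue_30d: 240, lifetime_sales: 900, profit_30d: 90, cogs_30d: 150 };
// A cost revision: same sales and revenue, different cogs/profit.
const incoming = { ...existing, profit_30d: 80, cogs_30d: 160 };

console.log(needsUpdate(existing, incoming, tracked.slice(0, 3))); // false: old list misses it
console.log(needsUpdate(existing, incoming, tracked));             // true
```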
@@ -23,17 +23,19 @@ BEGIN
SUM(pm.current_stock) AS current_stock_units,
SUM(pm.current_stock_cost) AS current_stock_cost,
SUM(pm.current_stock_retail) AS current_stock_retail,
-- Sales metrics with proper filtering
-- Sales metrics — revenue uses plain COALESCE (matching brand/vendor);
-- a positive-only revenue filter while cogs/profit sum everything put
-- the margin numerator and denominator on different row populations.
SUM(CASE WHEN pm.sales_7d > 0 THEN pm.sales_7d ELSE 0 END) AS sales_7d,
SUM(CASE WHEN pm.revenue_7d > 0 THEN pm.revenue_7d ELSE 0 END) AS revenue_7d,
SUM(COALESCE(pm.revenue_7d, 0)) AS revenue_7d,
SUM(CASE WHEN pm.sales_30d > 0 THEN pm.sales_30d ELSE 0 END) AS sales_30d,
SUM(CASE WHEN pm.revenue_30d > 0 THEN pm.revenue_30d ELSE 0 END) AS revenue_30d,
SUM(COALESCE(pm.revenue_30d, 0)) AS revenue_30d,
SUM(COALESCE(pm.cogs_30d, 0)) AS cogs_30d,
SUM(COALESCE(pm.profit_30d, 0)) AS profit_30d,
SUM(CASE WHEN pm.sales_365d > 0 THEN pm.sales_365d ELSE 0 END) AS sales_365d,
SUM(CASE WHEN pm.revenue_365d > 0 THEN pm.revenue_365d ELSE 0 END) AS revenue_365d,
SUM(COALESCE(pm.revenue_365d, 0)) AS revenue_365d,
SUM(CASE WHEN pm.lifetime_sales > 0 THEN pm.lifetime_sales ELSE 0 END) AS lifetime_sales,
SUM(CASE WHEN pm.lifetime_revenue > 0 THEN pm.lifetime_revenue ELSE 0 END) AS lifetime_revenue
SUM(COALESCE(pm.lifetime_revenue, 0)) AS lifetime_revenue
FROM public.product_categories pc
JOIN public.product_metrics pm ON pc.pid = pm.pid
GROUP BY pc.cat_id
@@ -62,15 +64,15 @@ BEGIN
SUM(pm.current_stock_cost) AS current_stock_cost,
SUM(pm.current_stock_retail) AS current_stock_retail,
SUM(CASE WHEN pm.sales_7d > 0 THEN pm.sales_7d ELSE 0 END) AS sales_7d,
SUM(CASE WHEN pm.revenue_7d > 0 THEN pm.revenue_7d ELSE 0 END) AS revenue_7d,
SUM(COALESCE(pm.revenue_7d, 0)) AS revenue_7d,
SUM(CASE WHEN pm.sales_30d > 0 THEN pm.sales_30d ELSE 0 END) AS sales_30d,
SUM(CASE WHEN pm.revenue_30d > 0 THEN pm.revenue_30d ELSE 0 END) AS revenue_30d,
SUM(COALESCE(pm.revenue_30d, 0)) AS revenue_30d,
SUM(COALESCE(pm.cogs_30d, 0)) AS cogs_30d,
SUM(COALESCE(pm.profit_30d, 0)) AS profit_30d,
SUM(CASE WHEN pm.sales_365d > 0 THEN pm.sales_365d ELSE 0 END) AS sales_365d,
SUM(CASE WHEN pm.revenue_365d > 0 THEN pm.revenue_365d ELSE 0 END) AS revenue_365d,
SUM(COALESCE(pm.revenue_365d, 0)) AS revenue_365d,
SUM(CASE WHEN pm.lifetime_sales > 0 THEN pm.lifetime_sales ELSE 0 END) AS lifetime_sales,
SUM(CASE WHEN pm.lifetime_revenue > 0 THEN pm.lifetime_revenue ELSE 0 END) AS lifetime_revenue
SUM(COALESCE(pm.lifetime_revenue, 0)) AS lifetime_revenue
FROM CategoryProducts cp
JOIN public.product_metrics pm ON cp.pid = pm.pid
GROUP BY cp.ancestor_cat_id
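A worked example of the numerator/denominator mismatch described in the category comment. It assumes margin is computed downstream as profit divided by revenue (the actual `margin_30d` expression is not shown here), and the product rows are made up.

```javascript
// Hypothetical rows for one category; revenue/profit can be negative when
// returns exceed sales within the 30-day window.
const products = [
  { revenue30d: 1000, profit30d: 300 },
  { revenue30d: -200, profit30d: -80 },
];

const sum = (rows, f) => rows.reduce((s, r) => s + f(r), 0);

// Old: revenue filtered to positive rows, profit summed over every row.
const oldRevenue = sum(products, p => (p.revenue30d > 0 ? p.revenue30d : 0));
// New: both sums cover the same rows (COALESCE(x, 0) in the SQL).
const newRevenue = sum(products, p => p.revenue30d ?? 0);
const profit = sum(products, p => p.profit30d ?? 0);

// Assumed margin definition, in percent; null when revenue is not positive.
const marginPct = (prof, rev) => (rev > 0 ? (100 * prof) / rev : null);

console.log(marginPct(profit, oldRevenue)); // 22: the loss row hits only the numerator
console.log(marginPct(profit, newRevenue)); // 27.5: both sides use the same rows
```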
@@ -200,7 +202,10 @@ BEGIN
category_metrics.revenue_30d IS DISTINCT FROM EXCLUDED.revenue_30d OR
category_metrics.lifetime_sales IS DISTINCT FROM EXCLUDED.lifetime_sales OR
category_metrics.direct_product_count IS DISTINCT FROM EXCLUDED.direct_product_count OR
category_metrics.direct_sales_30d IS DISTINCT FROM EXCLUDED.direct_sales_30d;
category_metrics.direct_sales_30d IS DISTINCT FROM EXCLUDED.direct_sales_30d OR
-- Cost revisions can change profit/cogs with unchanged sales/revenue
category_metrics.profit_30d IS DISTINCT FROM EXCLUDED.profit_30d OR
category_metrics.cogs_30d IS DISTINCT FROM EXCLUDED.cogs_30d;
-- Update calculate_status
INSERT INTO public.calculate_status (module_name, last_calculation_timestamp)
@@ -60,26 +60,31 @@ BEGIN
GROUP BY p.vendor
),
VendorPOAggregates AS (
-- Aggregate PO related stats including lead time calculated from POs to receivings
-- Lead time per PO line = days to its FIRST receiving from the same supplier
-- (within 180 days), then averaged per vendor. Joining each PO line to EVERY
-- later receiving overstated lead time and weighted it toward busy products.
-- Same shape as the per-product calc in update_periodic_metrics.sql.
SELECT
po.vendor,
COUNT(DISTINCT po.po_id) AS po_count_365d,
-- Calculate lead time by averaging the days between PO date and receiving date
AVG(GREATEST(1, CASE
WHEN r.received_date IS NOT NULL AND po.date IS NOT NULL
THEN (r.received_date::date - po.date::date)
ELSE NULL
END))::int AS avg_lead_time_days_hist -- Avg lead time from HISTORICAL received POs
FROM public.purchase_orders po
-- Join to receivings table to find when items were received
LEFT JOIN public.receivings r ON r.pid = po.pid AND r.supplier_id = po.supplier_id
WHERE po.vendor IS NOT NULL AND po.vendor <> ''
AND po.date >= CURRENT_DATE - INTERVAL '1 year' -- Look at POs created in the last year
AND po.status = 'done' -- Only calculate lead time on completed POs
AND r.received_date IS NOT NULL
AND po.date IS NOT NULL
AND r.received_date >= po.date
GROUP BY po.vendor
vendor,
COUNT(DISTINCT po_id) AS po_count_365d,
ROUND(AVG(GREATEST(1, first_receive_date - po_date)))::int AS avg_lead_time_days_hist
FROM (
SELECT
po.vendor,
po.po_id,
po.pid,
po.date::date AS po_date,
MIN(r.received_date::date) AS first_receive_date
FROM public.purchase_orders po
JOIN public.receivings r ON r.pid = po.pid AND r.supplier_id = po.supplier_id
AND r.received_date >= po.date
AND r.received_date <= po.date + INTERVAL '180 days'
WHERE po.status = 'done'
AND po.date >= CURRENT_DATE - INTERVAL '1 year'
AND po.vendor IS NOT NULL AND po.vendor <> ''
GROUP BY po.vendor, po.po_id, po.pid, po.date
) po_first_receiving
GROUP BY vendor
),
AllVendors AS (
-- Ensure all vendors from products table are included
@@ -154,7 +159,11 @@ BEGIN
vendor_metrics.on_order_units IS DISTINCT FROM EXCLUDED.on_order_units OR
vendor_metrics.sales_30d IS DISTINCT FROM EXCLUDED.sales_30d OR
vendor_metrics.revenue_30d IS DISTINCT FROM EXCLUDED.revenue_30d OR
vendor_metrics.lifetime_sales IS DISTINCT FROM EXCLUDED.lifetime_sales;
vendor_metrics.lifetime_sales IS DISTINCT FROM EXCLUDED.lifetime_sales OR
-- Cost revisions can change profit/cogs with unchanged sales/revenue
vendor_metrics.profit_30d IS DISTINCT FROM EXCLUDED.profit_30d OR
vendor_metrics.cogs_30d IS DISTINCT FROM EXCLUDED.cogs_30d OR
vendor_metrics.avg_lead_time_days IS DISTINCT FROM EXCLUDED.avg_lead_time_days;
-- Update calculate_status
INSERT INTO public.calculate_status (module_name, last_calculation_timestamp)
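A side-by-side sketch of the old and new vendor lead-time calculations. The helper names and sample data are hypothetical, and the `status = 'done'` and one-year filters are omitted for brevity. Dates are `YYYY-MM-DD` strings, which `Date.parse` reads as UTC midnight, so day differences are whole numbers.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;
const toDay = s => Date.parse(s) / DAY_MS; // whole days since the epoch

// Day offsets from the PO date to receivings of the same pid and supplier,
// on or after the PO date and at most maxDays later.
function offsets(po, receivings, maxDays = Infinity) {
  return receivings
    .filter(r => r.pid === po.pid && r.supplierId === po.supplierId)
    .map(r => toDay(r.receivedDate) - toDay(po.date))
    .filter(d => d >= 0 && d <= maxDays);
}

const mean = xs => xs.reduce((s, x) => s + x, 0) / xs.length;

// Old: every (PO line, later receiving) pair is one sample.
function leadTimeAllPairs(poLines, receivings) {
  const samples = poLines.flatMap(po => offsets(po, receivings).map(d => Math.max(1, d)));
  return samples.length ? Math.round(mean(samples)) : null;
}

// New: one sample per PO line, its FIRST receiving within 180 days.
function leadTimeFirstReceiving(poLines, receivings) {
  const samples = poLines
    .map(po => offsets(po, receivings, 180))
    .filter(ds => ds.length > 0) // inner JOIN: never-received lines drop out
    .map(ds => Math.max(1, Math.min(...ds)));
  return samples.length ? Math.round(mean(samples)) : null;
}

const poLines = [
  { pid: 1, supplierId: 7, date: "2026-01-01" },
  { pid: 2, supplierId: 7, date: "2026-02-01" },
];
const receivings = [
  { pid: 1, supplierId: 7, receivedDate: "2026-01-11" }, // this PO's delivery: 10 days
  { pid: 1, supplierId: 7, receivedDate: "2026-03-01" }, // later POs' deliveries
  { pid: 1, supplierId: 7, receivedDate: "2026-05-01" },
  { pid: 2, supplierId: 7, receivedDate: "2026-02-21" }, // 20 days
];
console.log(leadTimeAllPairs(poLines, receivings));       // 52
console.log(leadTimeFirstReceiving(poLines, receivings)); // 15
```

The busy product (pid 1) contributes three samples to the old average, two of them deliveries for other POs, which is the overstatement and weighting the new comment describes.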
@@ -0,0 +1,69 @@
-- Migration 003: Item-level promo discounts + business-day (America/Chicago) bucketing
-- (applied 2026-06-11, together with the IMPORT_METRICS_FIX_PLAN.md batch)
--
-- PROBLEM 1 — dropped item-level promo discounts (~$26K / 30 days):
-- orders.js applied item-level discounts from order_discount_items only when the
-- parent order_discounts row had discount_amount_subtotal > 0:
-- SUM(CASE WHEN COALESCE(md.discount_amount_subtotal, 0) > 0 THEN id.amount ELSE 0 END)
-- In the PHP source, item-level promo discounts (which = 2) are applied to the order
-- total SEPARATELY from summary_discount_subtotal, so the gate zeroed essentially all
-- of them (90d live check: of 10,010 type-10 promos, 8,070 had item rows but only 8 had
-- discount_amount_subtotal > 0). Net effect: orders.discount understated, net_revenue /
-- profit_30d / margin_30d overstated by ~10% of revenue, discounts_30d ~3x understated.
--
-- FIX (orders.js): fetch only order_discount_items rows with which = 2 (which = 1 rows
-- are prices of free promo-added items, which = 3 are usage records), sum them
-- unconditionally, and clamp each sale line's total discount to price * quantity.
-- temp_main_discounts / temp_order_discounts staging removed (unused after the fix).
--
-- PROBLEM 2 — Europe/Berlin day bucketing:
-- orders.date is timestamptz and the PG server timezone is Europe/Berlin, so ::date
-- casts shifted every order placed after ~5 PM Central onto the NEXT calendar day in
-- daily_product_snapshots (and skewed yesterday_sales, DOW patterns, forecast accuracy).
--
-- FIX (update_daily_snapshots.sql, backfill/rebuild_daily_snapshots.sql,
-- update_product_metrics.sql): every day-bucketing cast is now
-- (ts AT TIME ZONE 'America/Chicago')::date
-- Supporting expression indexes:
-- CREATE INDEX idx_orders_date_chicago ON orders (((date AT TIME ZONE 'America/Chicago')::date));
-- CREATE INDEX idx_receivings_received_chicago ON receivings (((received_date AT TIME ZONE 'America/Chicago')::date));
--
-- ALSO IN THIS BATCH (same re-import/rebuild):
-- * 'combined' order status (code 16) excluded from all sales aggregates, and a sweep
-- in orders.js marks canceled/combined source orders (canceled = true) even though
-- combine_orders zeroes date_placed (Fixes 4/5).
-- * Returns now subtract COGS (returns_cogs) in daily snapshots (Fix 8).
-- * return_rate_30d = returns / sales (Fix 9); gmroi_30d annualized ×12.17 (Fix 10).
-- * stockout/avg-stock/service-level derived from stock_snapshots presence (Fix 7).
--
-- REQUIRED ACTION (cannot be fixed by SQL alone — discount values are baked into rows):
-- 1. Deploy updated orders.js + snapshot SQL files.
-- 2. Pause the recurring import: touch inventory-server/.pause-auto-update
-- 3. FULL orders re-import: INCREMENTAL_UPDATE=false node scripts/import-from-prod.js
-- 4. Rebuild snapshots: psql -f scripts/metrics-new/backfill/rebuild_daily_snapshots.sql
-- 5. Recalculate metrics: node scripts/calculate-metrics-new.js
-- 6. Resume: rm inventory-server/.pause-auto-update
--
-- EXPECTED AFTER RE-IMPORT: margin_30d down ~8-10 points (real, not a data incident),
-- discounts_30d ~3x up, daily sales curves shifted onto correct business days.
--
-- VERIFICATION:
-- (a) PG SUM(discount) over a 30-day window should approximate MySQL
-- Σ summary_discount_subtotal (prorated) + Σ order_discount_items.amount (which=2)
-- over the same orders.
-- (b) Per-day units in daily_product_snapshots should match MySQL
-- SELECT date_placed_onlydate, SUM(qty_ordered) FROM order_items JOIN _order ...
-- WHERE order_status >= 20 GROUP BY 1 (MySQL stores Central days).
-- (c) Migration 002 regression check (discount double-counting) still holds:
SELECT
o.pid,
o.order_number,
o.price,
o.quantity,
o.discount,
(o.price * o.quantity - o.discount) as net_revenue
FROM orders o
WHERE o.pid IN (624756, 614513)
ORDER BY o.date DESC
LIMIT 10;
-- Expected: discount 0 (or genuine promo amount) for regular sales; net close to gross.
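The orders.js change described in the migration 003 header can be sketched in JavaScript. This is a minimal illustration, not the actual orders.js code. The row shapes, the `baseDiscount` field (standing in for the prorated order-level discount), the `lineDiscounts` name, and the assumption of at most one sale line per `pid` within an order are all hypothetical.

```javascript
// Hypothetical row shapes (not the real orders.js types):
//   line:         { pid, price, quantity, baseDiscount } // baseDiscount = prorated order-level discount
//   itemDiscount: { pid, which, amount }                 // from order_discount_items
function lineDiscounts(lines, itemDiscounts) {
  // Sum only which = 2 rows (item-level promo discounts). which = 1 rows are
  // prices of free promo-added items; which = 3 rows are usage records.
  const promoByPid = new Map();
  for (const d of itemDiscounts) {
    if (d.which !== 2) continue;
    promoByPid.set(d.pid, (promoByPid.get(d.pid) || 0) + Number(d.amount));
  }
  // No gate on the parent discount_amount_subtotal: every which = 2 amount counts.
  return lines.map((line) => {
    const gross = line.price * line.quantity;
    const total = (line.baseDiscount || 0) + (promoByPid.get(line.pid) || 0);
    // Clamp so a line's discount never exceeds its price * quantity
    // and never goes negative.
    return { ...line, discount: Math.max(0, Math.min(total, gross)) };
  });
}
```

Under the old gate, a `which = 2` amount was dropped whenever its parent discount had `discount_amount_subtotal = 0`; here it always counts, and an 8.00 promo on a 5.00 line is clamped to 5.00.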
@@ -0,0 +1,9 @@
-- Migration 004: Map order status codes 45 and 67 to text
--
-- Follow-up to 001_map_order_statuses.sql: the orders.js orderStatusMap lacked
-- codes 45 (payment_pending) and 67 (remote_send), so any such orders imported
-- as numeric strings '45' / '67'. orders.js now maps them; this updates any
-- existing rows (a full re-import also fixes them — safe to run either way).
UPDATE orders SET status = 'payment_pending' WHERE status = '45';
UPDATE orders SET status = 'remote_send' WHERE status = '67';
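Migration 004 only rewrites rows that were already imported as numeric strings. The following is a hypothetical JavaScript excerpt of the status mapping, plus the sales predicate that the snapshot SQL applies to the mapped text. Only codes 16, 45 and 67 are named in these migrations, so the map is deliberately partial, and `ORDER_STATUS_MAP`, `mapOrderStatus` and `isSaleRow` are illustrative names, not the real orders.js identifiers.

```javascript
// Hypothetical, partial excerpt: the real orderStatusMap in orders.js has more codes.
const ORDER_STATUS_MAP = {
  16: 'combined',
  45: 'payment_pending',
  67: 'remote_send',
};

function mapOrderStatus(code) {
  // Unmapped codes fall through as numeric strings ('45', '67'), which is
  // the state migration 004 repairs for rows imported before the fix.
  return ORDER_STATUS_MAP[code] ?? String(code);
}

// Mirrors the SQL sales filter:
//   quantity > 0 AND COALESCE(status, 'pending') NOT IN ('canceled', 'returned', 'combined')
const NON_SALE_STATUSES = new Set(['canceled', 'returned', 'combined']);

function isSaleRow(row) {
  return row.quantity > 0 && !NON_SALE_STATUSES.has(row.status ?? 'pending');
}
```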
@@ -39,50 +39,68 @@ BEGIN
-- 2. Stale detection: existing snapshots where aggregates don't match source data
-- (catches backfilled imports that arrived after snapshot was calculated)
-- 3. Recent recheck: last N days always reprocessed (picks up new orders, corrections)
-- NOTE: all order/receiving timestamps are bucketed into business days using
-- America/Chicago. The PG server timezone is Europe/Berlin, so a bare ::date
-- cast would shift every evening order onto the next day.
FOR _target_date IN
SELECT d FROM (
-- Gap fill: find dates with activity but missing snapshots
SELECT activity_dates.d
FROM (
SELECT DISTINCT date::date AS d FROM public.orders
WHERE date::date >= _backfill_start AND date::date < CURRENT_DATE - _recent_recheck_days
SELECT DISTINCT (date AT TIME ZONE 'America/Chicago')::date AS d FROM public.orders
WHERE (date AT TIME ZONE 'America/Chicago')::date >= _backfill_start
AND (date AT TIME ZONE 'America/Chicago')::date < CURRENT_DATE - _recent_recheck_days
UNION
SELECT DISTINCT received_date::date AS d FROM public.receivings
WHERE received_date::date >= _backfill_start AND received_date::date < CURRENT_DATE - _recent_recheck_days
SELECT DISTINCT (received_date AT TIME ZONE 'America/Chicago')::date AS d FROM public.receivings
WHERE (received_date AT TIME ZONE 'America/Chicago')::date >= _backfill_start
AND (received_date AT TIME ZONE 'America/Chicago')::date < CURRENT_DATE - _recent_recheck_days
) activity_dates
WHERE NOT EXISTS (
SELECT 1 FROM public.daily_product_snapshots dps WHERE dps.snapshot_date = activity_dates.d
)
UNION
-- Stale detection: compare snapshot aggregates against source tables
-- (must bucket identically to SalesData/ReceivingData or every day
-- looks permanently stale)
SELECT snap_agg.snapshot_date AS d
FROM (
SELECT snapshot_date,
COALESCE(SUM(units_received), 0)::bigint AS snap_received,
COALESCE(SUM(units_sold), 0)::bigint AS snap_sold
COALESCE(SUM(units_sold), 0)::bigint AS snap_sold,
ROUND(COALESCE(SUM(net_revenue), 0), 2) AS snap_net_revenue
FROM public.daily_product_snapshots
WHERE snapshot_date >= _backfill_start
AND snapshot_date < CURRENT_DATE - _recent_recheck_days
GROUP BY snapshot_date
) snap_agg
LEFT JOIN (
SELECT received_date::date AS d, SUM(qty_each)::bigint AS actual_received
SELECT (received_date AT TIME ZONE 'America/Chicago')::date AS d, SUM(qty_each)::bigint AS actual_received
FROM public.receivings
WHERE received_date::date >= _backfill_start
AND received_date::date < CURRENT_DATE - _recent_recheck_days
GROUP BY received_date::date
WHERE (received_date AT TIME ZONE 'America/Chicago')::date >= _backfill_start
AND (received_date AT TIME ZONE 'America/Chicago')::date < CURRENT_DATE - _recent_recheck_days
GROUP BY 1
) recv_agg ON snap_agg.snapshot_date = recv_agg.d
LEFT JOIN (
SELECT date::date AS d,
SUM(CASE WHEN quantity > 0 AND COALESCE(status, 'pending') NOT IN ('canceled', 'returned')
THEN quantity ELSE 0 END)::bigint AS actual_sold
SELECT (date AT TIME ZONE 'America/Chicago')::date AS d,
SUM(CASE WHEN quantity > 0 AND COALESCE(status, 'pending') NOT IN ('canceled', 'returned', 'combined')
THEN quantity ELSE 0 END)::bigint AS actual_sold,
-- Mirrors SalesData's net_revenue (gross - discounts - returns)
-- so price/discount corrections older than the recheck window
-- get repaired, not just unit-count changes.
ROUND(
SUM(CASE WHEN quantity > 0 AND COALESCE(status, 'pending') NOT IN ('canceled', 'returned', 'combined')
THEN price * quantity - discount ELSE 0 END)
- SUM(CASE WHEN quantity < 0 OR COALESCE(status, 'pending') = 'returned'
THEN price * ABS(quantity) ELSE 0 END)
, 2) AS actual_net_revenue
FROM public.orders
WHERE date::date >= _backfill_start
AND date::date < CURRENT_DATE - _recent_recheck_days
GROUP BY date::date
WHERE (date AT TIME ZONE 'America/Chicago')::date >= _backfill_start
AND (date AT TIME ZONE 'America/Chicago')::date < CURRENT_DATE - _recent_recheck_days
GROUP BY 1
) orders_agg ON snap_agg.snapshot_date = orders_agg.d
WHERE snap_agg.snap_received != COALESCE(recv_agg.actual_received, 0)
OR snap_agg.snap_sold != COALESCE(orders_agg.actual_sold, 0)
OR snap_agg.snap_net_revenue != ROUND(COALESCE(orders_agg.actual_net_revenue, 0), 2)
UNION
-- Recent days: always reprocess
SELECT d::date
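The business-day bucketing can be checked outside the database. This sketch uses a hypothetical helper and assumes a Node.js build with full ICU time-zone data (the default since Node 13). It returns the calendar day of an instant in a given IANA zone, analogous to `(ts AT TIME ZONE 'America/Chicago')::date`. Central and Berlin are normally 7 hours apart, so the Berlin day rolls over at 5 PM Central. During the few weeks each spring and autumn when US and EU daylight-saving dates differ, the gap is 6 hours and the rollover is at 6 PM.

```javascript
// Hypothetical helper: calendar day (YYYY-MM-DD) of an instant in an IANA
// time zone, analogous to PostgreSQL's (ts AT TIME ZONE zone)::date.
function businessDay(instant, timeZone = 'America/Chicago') {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    year: 'numeric',
    month: '2-digit',
    day: '2-digit',
  }).formatToParts(instant);
  const part = (type) => parts.find((p) => p.type === type).value;
  return `${part('year')}-${part('month')}-${part('day')}`;
}

// 23:30 UTC on 2026-06-10 is 6:30 PM CDT but 1:30 AM CEST the next day.
const eveningOrder = new Date('2026-06-10T23:30:00Z');
console.log(businessDay(eveningOrder));                  // prints 2026-06-10
console.log(businessDay(eveningOrder, 'Europe/Berlin')); // prints 2026-06-11
```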
@@ -116,26 +134,36 @@ BEGIN
p.sku,
-- Track number of orders to ensure we have real data
COUNT(o.id) as order_count,
-- Aggregate Sales (Quantity > 0, Status not Canceled/Returned)
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned') THEN o.quantity ELSE 0 END), 0) AS units_sold,
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned') THEN o.price * o.quantity ELSE 0 END), 0.00) AS gross_revenue_unadjusted, -- Before discount
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned') THEN o.discount ELSE 0 END), 0.00) AS discounts,
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned') THEN
-- Aggregate Sales (Quantity > 0, Status not Canceled/Returned/Combined)
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned', 'combined') THEN o.quantity ELSE 0 END), 0) AS units_sold,
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned', 'combined') THEN o.price * o.quantity ELSE 0 END), 0.00) AS gross_revenue_unadjusted, -- Before discount
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned', 'combined') THEN o.discount ELSE 0 END), 0.00) AS discounts,
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned', 'combined') THEN
COALESCE(
o.costeach, -- First use order-specific cost if available
get_weighted_avg_cost(p.pid, o.date::date), -- Then use weighted average cost
get_weighted_avg_cost(p.pid, (o.date AT TIME ZONE 'America/Chicago')::date), -- Then use weighted average cost
p.cost_price -- Final fallback to current cost
) * o.quantity
) * o.quantity
ELSE 0 END), 0.00) AS cogs,
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned') THEN p.regular_price * o.quantity ELSE 0 END), 0.00) AS gross_regular_revenue, -- Use current regular price for simplicity here
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned', 'combined') THEN p.regular_price * o.quantity ELSE 0 END), 0.00) AS gross_regular_revenue, -- Use current regular price for simplicity here
-- Aggregate Returns (Quantity < 0 or Status = Returned)
COALESCE(SUM(CASE WHEN o.quantity < 0 OR COALESCE(o.status, 'pending') = 'returned' THEN ABS(o.quantity) ELSE 0 END), 0) AS units_returned,
COALESCE(SUM(CASE WHEN o.quantity < 0 OR COALESCE(o.status, 'pending') = 'returned' THEN o.price * ABS(o.quantity) ELSE 0 END), 0.00) AS returns_revenue
COALESCE(SUM(CASE WHEN o.quantity < 0 OR COALESCE(o.status, 'pending') = 'returned' THEN o.price * ABS(o.quantity) ELSE 0 END), 0.00) AS returns_revenue,
-- Returns COGS: returned goods come back into stock, so their cost
-- offsets the sales COGS for the day (margin would otherwise be
-- understated in return-heavy periods).
COALESCE(SUM(CASE WHEN o.quantity < 0 OR COALESCE(o.status, 'pending') = 'returned' THEN
COALESCE(
o.costeach,
get_weighted_avg_cost(p.pid, (o.date AT TIME ZONE 'America/Chicago')::date),
p.cost_price
) * ABS(o.quantity)
ELSE 0 END), 0.00) AS returns_cogs
FROM public.products p -- Inner-joined to orders below, so only products with orders on _target_date are included
JOIN public.orders o -- Changed to INNER JOIN to only process products with orders
ON p.pid = o.pid
AND o.date::date = _target_date -- Cast to date to ensure compatibility regardless of original type
AND (o.date AT TIME ZONE 'America/Chicago')::date = _target_date -- Bucket by business day (Central)
GROUP BY p.pid, p.sku
-- No HAVING clause here - we always want to include all orders
),
@@ -149,7 +177,7 @@ BEGIN
-- Calculate the cost received (qty * cost)
SUM(r.qty_each * r.cost_each) AS cost_received
FROM public.receivings r
WHERE r.received_date::date = _target_date
WHERE (r.received_date AT TIME ZONE 'America/Chicago')::date = _target_date
-- Optional: Filter out canceled receivings if needed
-- AND r.status <> 'canceled'
GROUP BY r.pid
@@ -217,9 +245,9 @@ BEGIN
COALESCE(sd.discounts, 0.00),
COALESCE(sd.returns_revenue, 0.00),
COALESCE(sd.gross_revenue_unadjusted, 0.00) - COALESCE(sd.discounts, 0.00) - COALESCE(sd.returns_revenue, 0.00) AS net_revenue,
COALESCE(sd.cogs, 0.00),
COALESCE(sd.cogs, 0.00) - COALESCE(sd.returns_cogs, 0.00) AS cogs, -- net of returned goods' cost
COALESCE(sd.gross_regular_revenue, 0.00),
(COALESCE(sd.gross_revenue_unadjusted, 0.00) - COALESCE(sd.discounts, 0.00) - COALESCE(sd.returns_revenue, 0.00)) - COALESCE(sd.cogs, 0.00) AS profit,
(COALESCE(sd.gross_revenue_unadjusted, 0.00) - COALESCE(sd.discounts, 0.00) - COALESCE(sd.returns_revenue, 0.00)) - (COALESCE(sd.cogs, 0.00) - COALESCE(sd.returns_cogs, 0.00)) AS profit,
-- Receiving Metrics (From ReceivingData)
COALESCE(rd.units_received, 0),
COALESCE(rd.cost_received, 0.00),
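The returns-COGS change alters daily profit as shown in this small arithmetic sketch. The numbers are made up and `dailyProfit` is a hypothetical name. The formulas follow the net_revenue, cogs and profit expressions in the INSERT above.

```javascript
// Hypothetical: one product-day of SalesData aggregates.
function dailyProfit({ grossRevenue, discounts, returnsRevenue, cogs, returnsCogs }) {
  const netRevenue = grossRevenue - discounts - returnsRevenue;
  // Returned units go back into stock, so their cost offsets the day's COGS.
  const netCogs = cogs - returnsCogs;
  return { netRevenue, netCogs, profit: netRevenue - netCogs };
}

const sampleDay = { grossRevenue: 100, discounts: 10, returnsRevenue: 20, cogs: 60, returnsCogs: 12 };
console.log(dailyProfit(sampleDay)); // { netRevenue: 70, netCogs: 48, profit: 22 }
```

With these inputs, the previous expression (which subtracted the full `cogs` without `returns_cogs`) would report a profit of 10 instead of 22.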
@@ -131,18 +131,19 @@ BEGIN
HistoricalDates AS (
-- Note: Calculating these MIN/MAX values hourly can be slow on large tables.
-- Consider calculating periodically or storing on products if import can populate them.
-- Dates are bucketed in business time (America/Chicago) to match daily snapshots.
SELECT
p.pid,
MIN(o.date)::date AS date_first_sold,
MAX(o.date)::date AS max_order_date, -- Use MAX for potential recalc of date_last_sold
MIN((o.date AT TIME ZONE 'America/Chicago'))::date AS date_first_sold,
MAX((o.date AT TIME ZONE 'America/Chicago'))::date AS max_order_date, -- Use MAX for potential recalc of date_last_sold
-- For first received, use the new receivings table
MIN(r.received_date)::date AS date_first_received_calc,
MIN((r.received_date AT TIME ZONE 'America/Chicago'))::date AS date_first_received_calc,
-- For last received, use the new receivings table
MAX(r.received_date)::date AS date_last_received_calc
MAX((r.received_date AT TIME ZONE 'America/Chicago'))::date AS date_last_received_calc
FROM public.products p
LEFT JOIN public.orders o ON p.pid = o.pid AND o.quantity > 0 AND o.status NOT IN ('canceled', 'returned')
LEFT JOIN public.orders o ON p.pid = o.pid AND o.quantity > 0 AND o.status NOT IN ('canceled', 'returned', 'combined')
LEFT JOIN public.receivings r ON p.pid = r.pid
GROUP BY p.pid
),
@@ -174,17 +175,19 @@ BEGIN
SUM(CASE WHEN snapshot_date >= _current_date - INTERVAL '29 days' AND snapshot_date <= _current_date THEN discounts ELSE 0 END) AS discounts_30d,
SUM(CASE WHEN snapshot_date >= _current_date - INTERVAL '29 days' AND snapshot_date <= _current_date THEN gross_revenue ELSE 0 END) AS gross_revenue_30d,
SUM(CASE WHEN snapshot_date >= _current_date - INTERVAL '29 days' AND snapshot_date <= _current_date THEN gross_regular_revenue ELSE 0 END) AS gross_regular_revenue_30d,
SUM(CASE WHEN snapshot_date >= _current_date - INTERVAL '29 days' AND snapshot_date <= _current_date AND stockout_flag THEN 1 ELSE 0 END) AS stockout_days_30d,
-- NOTE: stockout days and avg stock units/cost now come from StockCoverage
-- (stock_snapshots has full daily coverage; these activity-only snapshots
-- only exist on days with sales/receivings, which made stockout_days ~0
-- exactly when stockouts mattered and biased stock averages upward).
SUM(CASE WHEN snapshot_date >= _current_date - INTERVAL '364 days' AND snapshot_date <= _current_date THEN units_sold ELSE 0 END) AS sales_365d,
SUM(CASE WHEN snapshot_date >= _current_date - INTERVAL '364 days' AND snapshot_date <= _current_date THEN net_revenue ELSE 0 END) AS revenue_365d,
SUM(CASE WHEN snapshot_date >= _current_date - INTERVAL '29 days' AND snapshot_date <= _current_date THEN units_received ELSE 0 END) AS received_qty_30d,
SUM(CASE WHEN snapshot_date >= _current_date - INTERVAL '29 days' AND snapshot_date <= _current_date THEN cost_received ELSE 0 END) AS received_cost_30d,
-- Averages for stock levels - only include dates within the specified period
AVG(CASE WHEN snapshot_date >= _current_date - INTERVAL '29 days' AND snapshot_date <= _current_date THEN eod_stock_quantity END) AS avg_stock_units_30d,
AVG(CASE WHEN snapshot_date >= _current_date - INTERVAL '29 days' AND snapshot_date <= _current_date THEN eod_stock_cost END) AS avg_stock_cost_30d,
-- Retail/gross stock averages stay on activity snapshots: stock_snapshots
-- has no eod_stock_retail equivalent (cost-only source table).
AVG(CASE WHEN snapshot_date >= _current_date - INTERVAL '29 days' AND snapshot_date <= _current_date THEN eod_stock_retail END) AS avg_stock_retail_30d,
AVG(CASE WHEN snapshot_date >= _current_date - INTERVAL '29 days' AND snapshot_date <= _current_date THEN eod_stock_gross END) AS avg_stock_gross_30d,
@@ -240,16 +243,89 @@ BEGIN
LEFT JOIN public.settings_vendor sv ON p.vendor = sv.vendor
),
LifetimeRevenue AS (
-- Calculate actual revenue from orders table
-- Calculate actual revenue from orders table. Negative-quantity rows
-- (returns) are included so lifetime revenue nets out returns;
-- price * quantity is already signed.
SELECT
o.pid,
SUM(o.price * o.quantity - COALESCE(o.discount, 0)) AS lifetime_revenue_from_orders,
SUM(o.quantity) AS lifetime_units_from_orders
FROM public.orders o
WHERE o.status NOT IN ('canceled', 'returned')
AND o.quantity > 0
WHERE o.status NOT IN ('canceled', 'returned', 'combined')
GROUP BY o.pid
),
-- Full-coverage stock presence from stock_snapshots (MySQL snap_product_value).
-- That source only writes rows for products WITH stock on hand, so a product
-- missing from a day the cron ran was out of stock that day. Days before the
-- product was created are not counted against it.
StockCoverage AS (
SELECT
pid,
eligible_days_30d,
days_in_stock_30d,
CASE WHEN eligible_days_30d > 0
THEN GREATEST(0, eligible_days_30d - days_in_stock_30d)
END AS stockout_days_30d,
-- Absent days count as zero stock (the old activity-only average was
-- biased toward in-stock days)
CASE WHEN eligible_days_30d > 0
THEN sum_qty::numeric / eligible_days_30d
END AS avg_stock_units_30d,
CASE WHEN eligible_days_30d > 0
THEN sum_value::numeric / eligible_days_30d
END AS avg_stock_cost_30d
FROM (
SELECT
p.pid,
LEAST(
cal.covered_days,
CASE WHEN p.created_at IS NULL THEN cal.covered_days
ELSE GREATEST(0, (_current_date - GREATEST(p.created_at::date, _current_date - 29) + 1))
END
) AS eligible_days_30d,
COALESCE(pres.days_in_stock, 0) AS days_in_stock_30d,
COALESCE(pres.sum_qty, 0) AS sum_qty,
COALESCE(pres.sum_value, 0) AS sum_value
FROM public.products p
CROSS JOIN (
SELECT COUNT(DISTINCT snapshot_date) AS covered_days
FROM public.stock_snapshots
WHERE snapshot_date >= _current_date - INTERVAL '29 days'
AND snapshot_date <= _current_date
) cal
LEFT JOIN (
SELECT pid,
COUNT(*) AS days_in_stock,
SUM(stock_quantity) AS sum_qty,
SUM(stock_value) AS sum_value
FROM public.stock_snapshots
WHERE snapshot_date >= _current_date - INTERVAL '29 days'
AND snapshot_date <= _current_date
GROUP BY pid
) pres ON pres.pid = p.pid
) base
),
-- Sales that happened on out-of-stock days (per the stock snapshot), for
-- lost-sales incidents and the fill-rate heuristic. Restricted to days the
-- stock cron actually ran so e.g. today's sales aren't misread as stockouts.
SalesDayStock AS (
SELECT
dps.pid,
SUM(dps.units_sold) AS units_sold_covered,
COUNT(*) FILTER (WHERE dps.units_sold > 0 AND ss.pid IS NULL) AS lost_sales_incidents_30d,
SUM(CASE WHEN ss.pid IS NULL THEN dps.units_sold ELSE 0 END) AS units_sold_on_stockout_days
FROM public.daily_product_snapshots dps
JOIN (
SELECT DISTINCT snapshot_date FROM public.stock_snapshots
WHERE snapshot_date >= _current_date - INTERVAL '29 days'
AND snapshot_date <= _current_date
) cal ON cal.snapshot_date = dps.snapshot_date
LEFT JOIN public.stock_snapshots ss
ON ss.pid = dps.pid AND ss.snapshot_date = dps.snapshot_date
WHERE dps.snapshot_date >= _current_date - INTERVAL '29 days'
AND dps.snapshot_date <= _current_date
GROUP BY dps.pid
),
PreviousPeriodMetrics AS (
-- Calculate metrics for previous 30-day period for growth comparison
SELECT
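The StockCoverage arithmetic for a single product can be sketched in JavaScript. `stockCoverage` and its argument names are hypothetical. `coveredDays` corresponds to `cal.covered_days` (the days the stock cron ran in the window), `daysSinceCreated` corresponds to `_current_date - p.created_at::date` (null when `created_at` is NULL), and each element of `inStockRows` stands for one stock_snapshots row for the product.

```javascript
// Hypothetical sketch of the StockCoverage CTE for one product.
function stockCoverage({ coveredDays, daysSinceCreated, inStockRows }) {
  // Window days on or after creation, capped at 30; a null creation date
  // lets the whole covered window count.
  const ageDays = daysSinceCreated == null
    ? coveredDays
    : Math.max(0, Math.min(daysSinceCreated, 29) + 1);
  const eligibleDays = Math.min(coveredDays, ageDays);
  if (eligibleDays <= 0) {
    return { eligibleDays, stockoutDays: null, avgStockUnits: null, avgStockCost: null };
  }
  // stock_snapshots only has rows for days with stock on hand, so absent
  // days are stockout days and contribute zero to the averages.
  const daysInStock = inStockRows.length;
  const sumQty = inStockRows.reduce((s, r) => s + r.qty, 0);
  const sumValue = inStockRows.reduce((s, r) => s + r.value, 0);
  return {
    eligibleDays,
    stockoutDays: Math.max(0, eligibleDays - daysInStock),
    avgStockUnits: sumQty / eligibleDays,
    avgStockCost: sumValue / eligibleDays,
  };
}
```

A product present on 20 of 30 covered days with 3 units each gets 10 stockout days and an average of 2 units. Averaging only over the days it appears would give 3.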
@@ -302,24 +378,43 @@ BEGIN
GROUP BY pid
),
ServiceLevels AS (
-- Calculate service level and fill rate metrics
-- Service level and fill rate built on full-coverage stock data
-- (StockCoverage / SalesDayStock) instead of activity-only snapshots.
SELECT
pid,
COUNT(*) FILTER (WHERE stockout_flag = true) AS stockout_incidents_30d,
COUNT(*) FILTER (WHERE stockout_flag = true AND units_sold > 0) AS lost_sales_incidents_30d,
-- Service level: percentage of days without stockouts
(1.0 - (COUNT(*) FILTER (WHERE stockout_flag = true)::NUMERIC / NULLIF(COUNT(*), 0))) * 100 AS service_level_30d,
-- Fill rate: units sold / (units sold + potential lost sales)
CASE
WHEN SUM(units_sold) > 0 THEN
(SUM(units_sold)::NUMERIC /
(SUM(units_sold) + SUM(CASE WHEN stockout_flag THEN units_sold * 0.2 ELSE 0 END))) * 100
sc.pid,
sc.stockout_days_30d AS stockout_incidents_30d,
sds.lost_sales_incidents_30d,
-- Service level: percentage of covered days the product was in stock
CASE WHEN sc.eligible_days_30d > 0 THEN
(1.0 - (sc.stockout_days_30d::NUMERIC / sc.eligible_days_30d)) * 100
END AS service_level_30d,
-- Fill rate: units sold / (units sold + potential lost sales).
-- The 0.2 lost-sales factor is an arbitrary heuristic: each unit sold on
-- an out-of-stock day is assumed to represent 20% additional missed demand.
CASE
WHEN COALESCE(sds.units_sold_covered, 0) > 0 THEN
(sds.units_sold_covered::NUMERIC /
(sds.units_sold_covered + COALESCE(sds.units_sold_on_stockout_days, 0) * 0.2)) * 100
ELSE NULL
END AS fill_rate_30d
FROM public.daily_product_snapshots
WHERE snapshot_date >= _current_date - INTERVAL '29 days'
AND snapshot_date <= _current_date
GROUP BY pid
FROM StockCoverage sc
LEFT JOIN SalesDayStock sds ON sds.pid = sc.pid
),
ProductVelocity AS (
-- Single source for sales velocity so every replenishment/cover column stays
-- consistent. NULL when the product is excluded from forecasting: excluded
-- products now still get a product_metrics row (they used to be filtered out
-- entirely and vanished from brand/vendor/category rollups), but their
-- forecast-derived columns go NULL / zero.
SELECT
ci.pid,
CASE WHEN COALESCE(s.exclude_forecast, FALSE) THEN NULL
ELSE calculate_sales_velocity(sa.sales_30d::int, COALESCE(sc.stockout_days_30d, 0)::int)
END AS daily
FROM CurrentInfo ci
LEFT JOIN SnapshotAggregates sa ON ci.pid = sa.pid
LEFT JOIN StockCoverage sc ON ci.pid = sc.pid
LEFT JOIN Settings s ON ci.pid = s.pid
),
SeasonalityAnalysis AS (
-- Set-based seasonality detection (replaces per-product function calls)
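The rewritten ServiceLevels CTE reduces to two ratios per product. Here is a sketch with hypothetical names. `LOST_SALES_FACTOR` is the 0.2 heuristic from the SQL, and the null results mirror the SQL's NULL-producing branches.

```javascript
// Heuristic from the SQL: each unit sold on an out-of-stock day stands for
// 20% additional missed demand. Not an estimated parameter.
const LOST_SALES_FACTOR = 0.2;

function serviceLevels({ eligibleDays, stockoutDays, unitsSoldCovered, unitsSoldOnStockoutDays }) {
  // Percentage of eligible days the product was in stock.
  const serviceLevel = eligibleDays > 0 ? (1 - stockoutDays / eligibleDays) * 100 : null;
  // COALESCE(..., 0) equivalents for the LEFT JOINed SalesDayStock values.
  const sold = unitsSoldCovered ?? 0;
  const soldOnStockout = unitsSoldOnStockoutDays ?? 0;
  const fillRate = sold > 0
    ? (sold / (sold + soldOnStockout * LOST_SALES_FACTOR)) * 100
    : null;
  return { serviceLevel, fillRate };
}
```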
@@ -424,8 +519,8 @@ BEGIN
END AS age_days,
sa.sales_7d, sa.revenue_7d, sa.sales_14d, sa.revenue_14d, sa.sales_30d, sa.revenue_30d, sa.cogs_30d, sa.profit_30d,
sa.returns_units_30d, sa.returns_revenue_30d, sa.discounts_30d, sa.gross_revenue_30d, sa.gross_regular_revenue_30d,
sa.stockout_days_30d, sa.sales_365d, sa.revenue_365d,
sa.avg_stock_units_30d, sa.avg_stock_cost_30d, sa.avg_stock_retail_30d, sa.avg_stock_gross_30d,
sc.stockout_days_30d, sa.sales_365d, sa.revenue_365d,
sc.avg_stock_units_30d, sc.avg_stock_cost_30d, sa.avg_stock_retail_30d, sa.avg_stock_gross_30d,
sa.received_qty_30d, sa.received_cost_30d,
-- Use total_sold from products table as the source of truth for lifetime sales
-- This includes all historical data from the production database
@@ -463,66 +558,68 @@ BEGIN
sa.sales_30d AS avg_sales_per_month_30d, -- Using 30d sales as proxy for month
(sa.profit_30d / NULLIF(sa.revenue_30d, 0)) * 100 AS margin_30d,
(sa.profit_30d / NULLIF(sa.cogs_30d, 0)) * 100 AS markup_30d,
sa.profit_30d / NULLIF(sa.avg_stock_cost_30d, 0) AS gmroi_30d,
sa.sales_30d / NULLIF(sa.avg_stock_units_30d, 0) AS stockturn_30d,
(sa.returns_units_30d / NULLIF(sa.sales_30d + sa.returns_units_30d, 0)) * 100 AS return_rate_30d,
-- Annualized GMROI (30-day profit extrapolated to a year: × 365/30).
-- Conventional benchmark for healthy retail is ≥ 2-3 on this scale.
(sa.profit_30d / NULLIF(sc.avg_stock_cost_30d, 0)) * 12.17 AS gmroi_30d,
sa.sales_30d / NULLIF(sc.avg_stock_units_30d, 0) AS stockturn_30d,
-- Industry-standard definition: returns / sales (not returns / (sales+returns))
(sa.returns_units_30d / NULLIF(sa.sales_30d, 0)) * 100 AS return_rate_30d,
(sa.discounts_30d / NULLIF(sa.gross_revenue_30d, 0)) * 100 AS discount_rate_30d,
(sa.stockout_days_30d / 30.0) * 100 AS stockout_rate_30d,
(sc.stockout_days_30d::numeric / NULLIF(sc.eligible_days_30d, 0)) * 100 AS stockout_rate_30d,
sa.gross_regular_revenue_30d - sa.gross_revenue_30d AS markdown_30d,
((sa.gross_regular_revenue_30d - sa.gross_revenue_30d) / NULLIF(sa.gross_regular_revenue_30d, 0)) * 100 AS markdown_rate_30d,
-- Sell-through rate: Industry standard is Units Sold / (Beginning Inventory + Units Received)
-- Uses actual snapshot from 30 days ago as beginning stock, falls back to avg_stock_units_30d
(sa.sales_30d / NULLIF(
COALESCE(bs.beginning_stock_30d, sa.avg_stock_units_30d::int, 0) + sa.received_qty_30d,
COALESCE(bs.beginning_stock_30d, sc.avg_stock_units_30d::int, 0) + sa.received_qty_30d,
0
)) * 100 AS sell_through_30d,
-- Forecasting intermediate values
-- Use the calculate_sales_velocity function instead of repetitive calculation
calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) AS sales_velocity_daily,
-- Forecasting intermediate values (ProductVelocity; NULL when excluded from forecast)
vel.daily AS sales_velocity_daily,
s.effective_lead_time AS config_lead_time,
s.effective_days_of_stock AS config_days_of_stock,
s.effective_safety_stock AS config_safety_stock,
(s.effective_lead_time + s.effective_days_of_stock) AS planning_period_days,
calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_lead_time AS lead_time_forecast_units,
vel.daily * s.effective_lead_time AS lead_time_forecast_units,
calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_days_of_stock AS days_of_stock_forecast_units,
vel.daily * s.effective_days_of_stock AS days_of_stock_forecast_units,
calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * (s.effective_lead_time + s.effective_days_of_stock) AS planning_period_forecast_units,
vel.daily * (s.effective_lead_time + s.effective_days_of_stock) AS planning_period_forecast_units,
(ci.current_stock + COALESCE(ooi.on_order_qty, 0) - (calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_lead_time)) AS lead_time_closing_stock,
(ci.current_stock + COALESCE(ooi.on_order_qty, 0) - (vel.daily * s.effective_lead_time)) AS lead_time_closing_stock,
((ci.current_stock + COALESCE(ooi.on_order_qty, 0) - (calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_lead_time))) - (calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_days_of_stock) AS days_of_stock_closing_stock,
((ci.current_stock + COALESCE(ooi.on_order_qty, 0) - (vel.daily * s.effective_lead_time))) - (vel.daily * s.effective_days_of_stock) AS days_of_stock_closing_stock,
((calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_lead_time) + (calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_days_of_stock)) + s.effective_safety_stock - ci.current_stock - COALESCE(ooi.on_order_qty, 0) AS replenishment_needed_raw,
((vel.daily * s.effective_lead_time) + (vel.daily * s.effective_days_of_stock)) + s.effective_safety_stock - ci.current_stock - COALESCE(ooi.on_order_qty, 0) AS replenishment_needed_raw,
-- Final Forecasting / Replenishment Metrics
CEILING(GREATEST(0, (((calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_lead_time) + (calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_days_of_stock)) + s.effective_safety_stock - ci.current_stock - COALESCE(ooi.on_order_qty, 0))))::int AS replenishment_units,
(CEILING(GREATEST(0, (((calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_lead_time) + (calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_days_of_stock)) + s.effective_safety_stock - ci.current_stock - COALESCE(ooi.on_order_qty, 0))))::int) * ci.current_effective_cost AS replenishment_cost,
(CEILING(GREATEST(0, (((calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_lead_time) + (calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_days_of_stock)) + s.effective_safety_stock - ci.current_stock - COALESCE(ooi.on_order_qty, 0))))::int) * ci.current_price AS replenishment_retail,
(CEILING(GREATEST(0, (((calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_lead_time) + (calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_days_of_stock)) + s.effective_safety_stock - ci.current_stock - COALESCE(ooi.on_order_qty, 0))))::int) * (ci.current_price - ci.current_effective_cost) AS replenishment_profit,
CEILING(GREATEST(0, (((vel.daily * s.effective_lead_time) + (vel.daily * s.effective_days_of_stock)) + s.effective_safety_stock - ci.current_stock - COALESCE(ooi.on_order_qty, 0))))::int AS replenishment_units,
(CEILING(GREATEST(0, (((vel.daily * s.effective_lead_time) + (vel.daily * s.effective_days_of_stock)) + s.effective_safety_stock - ci.current_stock - COALESCE(ooi.on_order_qty, 0))))::int) * ci.current_effective_cost AS replenishment_cost,
(CEILING(GREATEST(0, (((vel.daily * s.effective_lead_time) + (vel.daily * s.effective_days_of_stock)) + s.effective_safety_stock - ci.current_stock - COALESCE(ooi.on_order_qty, 0))))::int) * ci.current_price AS replenishment_retail,
(CEILING(GREATEST(0, (((vel.daily * s.effective_lead_time) + (vel.daily * s.effective_days_of_stock)) + s.effective_safety_stock - ci.current_stock - COALESCE(ooi.on_order_qty, 0))))::int) * (ci.current_price - ci.current_effective_cost) AS replenishment_profit,
-- To Order (Apply MOQ/UOM logic here if needed, otherwise equals replenishment)
CEILING(GREATEST(0, (((calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_lead_time) + (calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_days_of_stock)) + s.effective_safety_stock - ci.current_stock - COALESCE(ooi.on_order_qty, 0))))::int AS to_order_units,
CEILING(GREATEST(0, (((vel.daily * s.effective_lead_time) + (vel.daily * s.effective_days_of_stock)) + s.effective_safety_stock - ci.current_stock - COALESCE(ooi.on_order_qty, 0))))::int AS to_order_units,
GREATEST(0, - (ci.current_stock + COALESCE(ooi.on_order_qty, 0) - (calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_lead_time))) AS forecast_lost_sales_units,
GREATEST(0, - (ci.current_stock + COALESCE(ooi.on_order_qty, 0) - (calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_lead_time))) * ci.current_price AS forecast_lost_revenue,
GREATEST(0, - (ci.current_stock + COALESCE(ooi.on_order_qty, 0) - (vel.daily * s.effective_lead_time))) AS forecast_lost_sales_units,
GREATEST(0, - (ci.current_stock + COALESCE(ooi.on_order_qty, 0) - (vel.daily * s.effective_lead_time))) * ci.current_price AS forecast_lost_revenue,
ci.current_stock / NULLIF(calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int), 0) AS stock_cover_in_days,
COALESCE(ooi.on_order_qty, 0) / NULLIF(calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int), 0) AS po_cover_in_days,
(ci.current_stock + COALESCE(ooi.on_order_qty, 0)) / NULLIF(calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int), 0) AS sells_out_in_days,
ci.current_stock / NULLIF(vel.daily, 0) AS stock_cover_in_days,
COALESCE(ooi.on_order_qty, 0) / NULLIF(vel.daily, 0) AS po_cover_in_days,
(ci.current_stock + COALESCE(ooi.on_order_qty, 0)) / NULLIF(vel.daily, 0) AS sells_out_in_days,
-- Replenish Date: Date when stock is projected to hit safety stock, minus lead time
CASE
WHEN calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) > 0
THEN _current_date + FLOOR(GREATEST(0, ci.current_stock - s.effective_safety_stock) / calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int))::int - s.effective_lead_time
WHEN vel.daily > 0
THEN _current_date + FLOOR(GREATEST(0, ci.current_stock - s.effective_safety_stock) / vel.daily)::int - s.effective_lead_time
ELSE NULL
END AS replenish_date,
GREATEST(0, ci.current_stock - s.effective_safety_stock - ((calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_lead_time) + (calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_days_of_stock)))::int AS overstocked_units,
(GREATEST(0, ci.current_stock - s.effective_safety_stock - ((calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_lead_time) + (calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_days_of_stock)))) * ci.current_effective_cost AS overstocked_cost,
(GREATEST(0, ci.current_stock - s.effective_safety_stock - ((calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_lead_time) + (calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_days_of_stock)))) * ci.current_price AS overstocked_retail,
GREATEST(0, ci.current_stock - s.effective_safety_stock - ((vel.daily * s.effective_lead_time) + (vel.daily * s.effective_days_of_stock)))::int AS overstocked_units,
(GREATEST(0, ci.current_stock - s.effective_safety_stock - ((vel.daily * s.effective_lead_time) + (vel.daily * s.effective_days_of_stock)))) * ci.current_effective_cost AS overstocked_cost,
(GREATEST(0, ci.current_stock - s.effective_safety_stock - ((vel.daily * s.effective_lead_time) + (vel.daily * s.effective_days_of_stock)))) * ci.current_price AS overstocked_retail,
-- Old Stock Flag
(ci.created_at::date < _current_date - INTERVAL '60 day') AND
@@ -542,18 +639,18 @@ BEGIN
ELSE
CASE
-- Check for overstock first
WHEN GREATEST(0, ci.current_stock - s.effective_safety_stock - ((calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_lead_time) + (calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_days_of_stock))) > 0 THEN 'Overstock'
WHEN GREATEST(0, ci.current_stock - s.effective_safety_stock - ((vel.daily * s.effective_lead_time) + (vel.daily * s.effective_days_of_stock))) > 0 THEN 'Overstock'
-- Check for Critical stock
WHEN ci.current_stock <= 0 OR
(ci.current_stock / NULLIF(calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int), 0)) <= 0 THEN 'Critical'
(ci.current_stock / NULLIF(vel.daily, 0)) <= 0 THEN 'Critical'
WHEN (ci.current_stock / NULLIF(calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int), 0)) < (COALESCE(s.effective_lead_time, 30) * 0.5) THEN 'Critical'
WHEN (ci.current_stock / NULLIF(vel.daily, 0)) < (COALESCE(s.effective_lead_time, 30) * 0.5) THEN 'Critical'
-- Check for reorder soon
WHEN ((ci.current_stock + COALESCE(ooi.on_order_qty, 0)) / NULLIF(calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int), 0)) < (COALESCE(s.effective_lead_time, 30) + 7) THEN
WHEN ((ci.current_stock + COALESCE(ooi.on_order_qty, 0)) / NULLIF(vel.daily, 0)) < (COALESCE(s.effective_lead_time, 30) + 7) THEN
CASE
WHEN (ci.current_stock / NULLIF(calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int), 0)) < (COALESCE(s.effective_lead_time, 30) * 0.5) THEN 'Critical'
WHEN (ci.current_stock / NULLIF(vel.daily, 0)) < (COALESCE(s.effective_lead_time, 30) * 0.5) THEN 'Critical'
ELSE 'Reorder Soon'
END
@@ -574,7 +671,7 @@ BEGIN
END) > 180 THEN 'At Risk'
-- Very high stock cover is at risk too
WHEN (ci.current_stock / NULLIF(calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int), 0)) > 365 THEN 'At Risk'
WHEN (ci.current_stock / NULLIF(vel.daily, 0)) > 365 THEN 'At Risk'
-- New products (less than 30 days old)
WHEN (CASE
@@ -624,7 +721,11 @@ BEGIN
LEFT JOIN ServiceLevels sl ON ci.pid = sl.pid
LEFT JOIN BeginningStock bs ON ci.pid = bs.pid
LEFT JOIN SeasonalityAnalysis season ON ci.pid = season.pid
WHERE s.exclude_forecast IS FALSE OR s.exclude_forecast IS NULL -- Exclude products explicitly marked
LEFT JOIN StockCoverage sc ON ci.pid = sc.pid
LEFT JOIN ProductVelocity vel ON ci.pid = vel.pid
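-- NOTE: products with exclude_from_forecast still get a metrics row (so they
-- appear in brand/vendor/category rollups); only their forecast-derived
-- columns are NULLed via ProductVelocity. Columns wrapped in GREATEST(0, ...)
-- (lost sales, overstock) come out as 0 rather than NULL, because GREATEST
-- ignores NULL arguments.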
ON CONFLICT (pid) DO UPDATE SET
last_calculated = EXCLUDED.last_calculated,
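The hunks above replace every inline `calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int)` call with a single `vel.daily` from the `ProductVelocity` join, so all derived columns read the same rate. The sketch below restates those formulas in TypeScript to make the NULL handling explicit. It is illustrative only: the interface and function names are hypothetical, velocity is assumed to be in units/day, lead time and days of stock in days, and the SQL `::int` casts are omitted.

```typescript
// Hypothetical input shape; field names loosely mirror the SQL columns above.
interface ProductInputs {
  currentStock: number;          // ci.current_stock
  onOrderQty: number | null;     // ooi.on_order_qty (COALESCEd to 0)
  safetyStock: number;           // s.effective_safety_stock
  leadTimeDays: number;          // s.effective_lead_time
  daysOfStock: number;           // s.effective_days_of_stock
  dailyVelocity: number | null;  // vel.daily (NULL for excluded products)
}

interface DerivedMetrics {
  lostSalesUnits: number;
  stockCoverDays: number | null;
  sellsOutInDays: number | null;
  replenishDate: Date | null;
  overstockedUnits: number;
}

// SQL `x / NULLIF(v, 0)`: null when velocity is null or zero.
function divOrNull(x: number, v: number | null): number | null {
  return v === null || v === 0 ? null : x / v;
}

// SQL `GREATEST(0, expr)`: Postgres ignores NULL arguments, so a null expr yields 0.
function greatestZero(expr: number | null): number {
  return expr === null ? 0 : Math.max(0, expr);
}

function deriveMetrics(p: ProductInputs, today: Date): DerivedMetrics {
  const v = p.dailyVelocity;
  const supply = p.currentStock + (p.onOrderQty ?? 0);

  let replenishDate: Date | null = null;
  if (v !== null && v > 0) {
    // Days until stock falls to safety stock, minus lead time (the CASE branch above).
    const daysToSafety = Math.floor(Math.max(0, p.currentStock - p.safetyStock) / v);
    replenishDate = new Date(today.getTime());
    replenishDate.setUTCDate(today.getUTCDate() + daysToSafety - p.leadTimeDays);
  }

  return {
    lostSalesUnits: greatestZero(v === null ? null : -(supply - v * p.leadTimeDays)),
    stockCoverDays: divOrNull(p.currentStock, v),
    sellsOutInDays: divOrNull(supply, v),
    replenishDate,
    overstockedUnits: greatestZero(
      v === null ? null : p.currentStock - p.safetyStock - (v * p.leadTimeDays + v * p.daysOfStock),
    ),
  };
}
```

With `dailyVelocity: null`, the division-based columns and `replenishDate` come back null, while `lostSalesUnits` and `overstockedUnits` come back 0. This follows PostgreSQL's `GREATEST` semantics, not a NULL.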
+1 -1
@@ -463,7 +463,7 @@ router.get('/efficiency', async (req, res) => {
SUM(revenue_30d) AS revenue_30d,
CASE
WHEN SUM(avg_stock_cost_30d) > 0
THEN (SUM(profit_30d) / SUM(avg_stock_cost_30d)) * 12
THEN (SUM(profit_30d) / SUM(avg_stock_cost_30d)) * 12.17
ELSE 0
END AS gmroi
FROM product_metrics
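This hunk changes the GMROI annualization factor from `12` to `12.17`. The 30-day profit-to-stock-cost ratio is now scaled by roughly 365/30 ≈ 12.1667. The old `12` effectively assumed twelve 30-day windows (a 360-day year), which understated GMROI by about 1.4%. A minimal TypeScript sketch of the calculation follows. The constant and function names are hypothetical; the zero-cost guard mirrors the SQL `CASE`.

```typescript
// Days in the trailing window covered by profit_30d / avg_stock_cost_30d.
const WINDOW_DAYS = 30;
// 365 / 30 ≈ 12.1667; the query above hardcodes this as 12.17.
const ANNUALIZATION = 365 / WINDOW_DAYS;

// Annualized GMROI from 30-day sums; 0 when there is no stock cost (mirrors the SQL CASE).
function annualizedGmroi(profit30d: number, avgStockCost30d: number, factor: number = ANNUALIZATION): number {
  return avgStockCost30d > 0 ? (profit30d / avgStockCost30d) * factor : 0;
}
```

Passing `factor = 12` reproduces the old behavior. The ratio `12.17 / 12 ≈ 1.014` is the size of the correction.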
+46 -11
@@ -357,6 +357,9 @@ router.get('/forecast/metrics', async (req, res) => {
const active = parseInt(totals.active_products) || 1;
const curveProducts = parseInt(totals.curve_products) || 0;
// NOTE: despite the name, this is "share of active products forecast via
// lifecycle curves" (curve coverage), NOT a statistical confidence. It only
// feeds a per-day tooltip field. See FORECAST_FIX_PLAN F9 (point 4).
const confidenceLevel = parseFloat((curveProducts / active).toFixed(2));
// Daily series from actual forecast
@@ -687,14 +690,29 @@ router.get('/forecast/accuracy', async (req, res) => {
const { rows: metrics } = await executeQuery(`
SELECT metric_type, dimension_value, sample_size,
total_actual_units, total_forecast_units,
mae, wmape, bias, rmse
mae, wmape, bias, rmse, naive_wmape, fva
FROM forecast_accuracy
WHERE run_id = $1
ORDER BY metric_type, dimension_value
`, [latestRunId]);
// Shared shaping for an "overall"-style aggregate row (daily or weekly grain).
const shapeOverall = (m) => m ? {
sampleSize: parseInt(m.sample_size),
totalActual: parseFloat(m.total_actual_units) || 0,
totalForecast: parseFloat(m.total_forecast_units) || 0,
mae: m.mae != null ? parseFloat(parseFloat(m.mae).toFixed(4)) : null,
wmape: m.wmape != null ? parseFloat((parseFloat(m.wmape) * 100).toFixed(1)) : null,
bias: m.bias != null ? parseFloat(parseFloat(m.bias).toFixed(4)) : null,
rmse: m.rmse != null ? parseFloat(parseFloat(m.rmse).toFixed(4)) : null,
naiveWmape: m.naive_wmape != null ? parseFloat((parseFloat(m.naive_wmape) * 100).toFixed(1)) : null,
fva: m.fva != null ? parseFloat(parseFloat(m.fva).toFixed(3)) : null,
} : null;
// Organize into response structure
const overall = metrics.find(m => m.metric_type === 'overall');
const overall = metrics.find(m => m.metric_type === 'overall' && m.dimension_value === 'all');
const overallInclDormant = metrics.find(m => m.metric_type === 'overall' && m.dimension_value === 'all_incl_dormant');
const overallWeekly = metrics.find(m => m.metric_type === 'overall_weekly');
const byPhase = metrics
.filter(m => m.metric_type === 'by_phase')
.map(m => ({
@@ -706,6 +724,8 @@ router.get('/forecast/accuracy', async (req, res) => {
wmape: m.wmape != null ? parseFloat((parseFloat(m.wmape) * 100).toFixed(1)) : null,
bias: m.bias != null ? parseFloat(parseFloat(m.bias).toFixed(4)) : null,
rmse: m.rmse != null ? parseFloat(parseFloat(m.rmse).toFixed(4)) : null,
naiveWmape: m.naive_wmape != null ? parseFloat((parseFloat(m.naive_wmape) * 100).toFixed(1)) : null,
fva: m.fva != null ? parseFloat(parseFloat(m.fva).toFixed(3)) : null,
}))
.sort((a, b) => (b.totalActual || 0) - (a.totalActual || 0));
@@ -763,6 +783,26 @@ router.get('/forecast/accuracy', async (req, res) => {
sampleSize: parseInt(r.sample_size),
}));
// Weekly-grain trend across runs (starts empty for old runs that predate
// the overall_weekly metric — that's expected, no backfill). F9.
const { rows: weeklyTrendRows } = await executeQuery(`
SELECT fr.finished_at::date AS run_date,
fa.wmape, fa.naive_wmape, fa.fva, fa.sample_size
FROM forecast_accuracy fa
JOIN forecast_runs fr ON fr.id = fa.run_id
WHERE fa.metric_type = 'overall_weekly'
AND fa.dimension_value = 'all'
ORDER BY fr.finished_at
`);
const accuracyTrendWeekly = weeklyTrendRows.map(r => ({
date: r.run_date instanceof Date ? r.run_date.toISOString().split('T')[0] : r.run_date,
wmape: r.wmape != null ? parseFloat((parseFloat(r.wmape) * 100).toFixed(1)) : null,
naiveWmape: r.naive_wmape != null ? parseFloat((parseFloat(r.naive_wmape) * 100).toFixed(1)) : null,
fva: r.fva != null ? parseFloat(parseFloat(r.fva).toFixed(3)) : null,
sampleSize: parseInt(r.sample_size),
}));
res.json({
hasData: true,
computedAt,
@@ -775,20 +815,15 @@ router.get('/forecast/accuracy', async (req, res) => {
? historyInfo.latest_date.toISOString().split('T')[0]
: historyInfo.latest_date,
},
overall: overall ? {
sampleSize: parseInt(overall.sample_size),
totalActual: parseFloat(overall.total_actual_units) || 0,
totalForecast: parseFloat(overall.total_forecast_units) || 0,
mae: overall.mae != null ? parseFloat(parseFloat(overall.mae).toFixed(4)) : null,
wmape: overall.wmape != null ? parseFloat((parseFloat(overall.wmape) * 100).toFixed(1)) : null,
bias: overall.bias != null ? parseFloat(parseFloat(overall.bias).toFixed(4)) : null,
rmse: overall.rmse != null ? parseFloat(parseFloat(overall.rmse).toFixed(4)) : null,
} : null,
overall: shapeOverall(overall),
overallInclDormant: shapeOverall(overallInclDormant),
overallWeekly: shapeOverall(overallWeekly),
byPhase,
byLeadTime,
byMethod,
dailyTrend,
accuracyTrend,
accuracyTrendWeekly,
});
} catch (err) {
console.error('Error fetching forecast accuracy:', err);
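The route only reads `wmape`, `naive_wmape`, and `fva` from `forecast_accuracy`; the code that computes them is not part of this diff. The sketch below shows why the `overall_weekly` grain is more informative than the daily one, and how a value-over-naive score can be derived. It rests on several assumptions, none confirmed by this diff:

- WMAPE is taken as Σ|F − A| / ΣA.
- Weekly grain is taken to mean summing each product's days into calendar-week buckets before scoring.
- FVA is taken as the relative WMAPE reduction versus the naive baseline. The UI formats `fva` as a signed fraction and colors positive values green, which is consistent with this definition, but the stored value could be defined differently (for example, as a plain difference in points).

All names are illustrative.

```typescript
// One (product, day) observation; field names are illustrative.
interface Obs {
  pid: string;
  day: number;       // day index; week = floor(day / 7)
  forecast: number;  // engine forecast (units)
  actual: number;    // actual units sold
  naive: number;     // naive baseline, e.g. trailing-30d average (assumed)
}

type Selector = (o: Obs) => number;

// WMAPE = sum|F - A| / sum A, as a fraction (2.02 => 202%); null with no actuals.
function wmape(rows: Obs[], fc: Selector): number | null {
  let absErr = 0;
  let act = 0;
  for (const r of rows) {
    absErr += Math.abs(fc(r) - r.actual);
    act += r.actual;
  }
  return act > 0 ? absErr / act : null;
}

// Re-grain daily rows to one row per (pid, week) by summing forecast, actual and naive.
function toWeekly(rows: Obs[]): Obs[] {
  const buckets = new Map<string, Obs>();
  for (const r of rows) {
    const week = Math.floor(r.day / 7);
    const key = `${r.pid}|${week}`;
    const b = buckets.get(key) ?? { pid: r.pid, day: week * 7, forecast: 0, actual: 0, naive: 0 };
    b.forecast += r.forecast;
    b.actual += r.actual;
    b.naive += r.naive;
    buckets.set(key, b);
  }
  return Array.from(buckets.values());
}

// ASSUMED FVA definition: relative WMAPE reduction vs the naive baseline (> 0 = better than naive).
function fva(model: number | null, naive: number | null): number | null {
  return model !== null && naive !== null && naive > 0 ? (naive - model) / naive : null;
}
```

As an example, take one product that sells 1 unit on days 2 and 5 of a week and is forecast at exactly its weekly rate (2/7 units per day):

- At daily grain, `wmape(rows, r => r.forecast)` is 10/7 ≈ 143%.
- At weekly grain, `wmape(toWeekly(rows), r => r.forecast)` is 0, up to floating-point rounding.

This is the intermittent-demand effect that motivates the weekly headline.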
@@ -2,7 +2,7 @@ import { useQuery } from "@tanstack/react-query"
import { apiFetch } from '@/utils/api';
import { BarChart, Bar, ResponsiveContainer, XAxis, YAxis, Tooltip as RechartsTooltip, Cell, LineChart, Line } from "recharts"
import config from "@/config"
import { Target, TrendingDown, ArrowUpDown } from "lucide-react"
import { Target, TrendingDown, ArrowUpDown, Swords } from "lucide-react"
import { Tooltip as UITooltip, TooltipContent, TooltipProvider, TooltipTrigger } from "@/components/ui/tooltip"
import { PHASE_CONFIG } from "@/utils/lifecyclePhases"
@@ -14,6 +14,8 @@ interface OverallMetrics {
wmape: number | null
bias: number | null
rmse: number | null
naiveWmape?: number | null
fva?: number | null
}
interface PhaseAccuracy {
@@ -25,6 +27,8 @@ interface PhaseAccuracy {
wmape: number | null
bias: number | null
rmse: number | null
naiveWmape?: number | null
fva?: number | null
}
interface LeadTimeAccuracy {
@@ -51,11 +55,14 @@ interface AccuracyData {
daysOfHistory?: number
historyRange?: { from: string; to: string }
overall?: OverallMetrics
overallInclDormant?: OverallMetrics
overallWeekly?: OverallMetrics
byPhase?: PhaseAccuracy[]
byLeadTime?: LeadTimeAccuracy[]
byMethod?: { method: string; sampleSize: number; mae: number | null; wmape: number | null; bias: number | null }[]
dailyTrend?: { date: string; mae: number | null; wmape: number | null; bias: number | null }[]
accuracyTrend?: AccuracyTrendPoint[]
accuracyTrendWeekly?: { date: string; wmape: number | null; naiveWmape: number | null; fva: number | null; sampleSize: number }[]
}
function MetricSkeleton() {
@@ -74,12 +81,30 @@ function formatBias(bias: number | null): string {
}
function getAccuracyColor(wmape: number | null): string {
// Daily-grain thresholds (used for the by-phase / lead-time bars).
if (wmape === null) return "text-muted-foreground"
if (wmape <= 30) return "text-green-600"
if (wmape <= 50) return "text-yellow-600"
return "text-red-600"
}
function getWeeklyAccuracyColor(wmape: number | null): string {
// Weekly per-product grain has a much lower achievable floor than daily grain
// on this intermittent-demand catalog, so the headline uses its own thresholds.
if (wmape === null) return "text-muted-foreground"
if (wmape <= 60) return "text-green-600"
if (wmape <= 90) return "text-yellow-600"
return "text-red-600"
}
function formatSignedPct(ratio: number | null, digits = 0): string {
// ratio is a fraction (0.7 => +70%); null-safe.
if (ratio === null || ratio === undefined) return "N/A"
const pct = ratio * 100
const sign = pct > 0 ? "+" : ""
return `${sign}${pct.toFixed(digits)}%`
}
export function ForecastAccuracy() {
const { data, error, isLoading } = useQuery<AccuracyData>({
queryKey: ["forecast-accuracy"],
@@ -133,6 +158,24 @@ export function ForecastAccuracy() {
sampleSize: lt.sampleSize,
}))
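// Headline prefers the weekly-grain WMAPE, because daily-grain WMAPE on this
// intermittent-demand catalog sits near a ~190% floor even for a perfect rate
// forecast. Falls back to the daily-grain number until enough complete weeks
// of history exist.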
const weeklyWmape = data?.overallWeekly?.wmape ?? null
const usingWeekly = weeklyWmape !== null
const headlineWmape = usingWeekly ? weeklyWmape : (data?.overall?.wmape ?? null)
const headlineColor = usingWeekly
? getWeeklyAccuracyColor(headlineWmape)
: getAccuracyColor(headlineWmape)
// Net forecast-vs-actual ratio (e.g. +70% = over-forecasting), from the
// daily 'all' totals — far more legible than bias in raw units.
const totalFc = data?.overall?.totalForecast ?? 0
const totalAct = data?.overall?.totalActual ?? 0
const fcVsAct = totalAct > 0 ? (totalFc / totalAct - 1) : null
// Value over the naive baseline; prefer weekly grain to match the headline.
const naiveSource = data?.overallWeekly ?? data?.overall
const naiveWmape = naiveSource?.naiveWmape ?? null
const fva = naiveSource?.fva ?? null
return (
<div>
<h3 className="text-lg font-medium mb-3">Forecast Accuracy</h3>
@@ -148,10 +191,24 @@ export function ForecastAccuracy() {
<div className="flex items-baseline justify-between">
<div className="flex items-center gap-2">
<Target className="h-4 w-4 text-muted-foreground" />
<p className="text-sm font-medium text-muted-foreground">WMAPE</p>
<p className="text-sm font-medium text-muted-foreground">
WMAPE <span className="text-[10px] opacity-70">({usingWeekly ? "weekly" : "daily"})</span>
</p>
</div>
<p className={`text-lg font-bold ${getAccuracyColor(data?.overall?.wmape ?? null)}`}>
{formatWmape(data?.overall?.wmape ?? null)}
<p className={`text-lg font-bold ${headlineColor}`}>
{formatWmape(headlineWmape)}
</p>
</div>
<div className="flex items-baseline justify-between">
<div className="flex items-center gap-2">
<ArrowUpDown className="h-4 w-4 text-muted-foreground" />
<p className="text-sm font-medium text-muted-foreground">Forecast vs actual</p>
</div>
<p className="text-lg font-bold">
{formatSignedPct(fcVsAct)}
<span className="text-xs font-normal text-muted-foreground ml-1">
{(fcVsAct ?? 0) > 0 ? "over" : (fcVsAct ?? 0) < 0 ? "under" : ""}
</span>
</p>
</div>
<div className="flex items-baseline justify-between">
@@ -160,20 +217,24 @@ export function ForecastAccuracy() {
<p className="text-sm font-medium text-muted-foreground">MAE</p>
</div>
<p className="text-lg font-bold">
{data?.overall?.mae !== null ? data?.overall?.mae?.toFixed(2) : "N/A"}
{data?.overall?.mae != null ? data?.overall?.mae?.toFixed(2) : "N/A"}
<span className="text-xs font-normal text-muted-foreground ml-1">units</span>
</p>
</div>
<div className="flex items-baseline justify-between">
<div className="flex items-center gap-2">
<ArrowUpDown className="h-4 w-4 text-muted-foreground" />
<p className="text-sm font-medium text-muted-foreground">Bias</p>
<Swords className="h-4 w-4 text-muted-foreground" />
<p className="text-sm font-medium text-muted-foreground">vs naive</p>
</div>
<p className="text-lg font-bold">
{formatBias(data?.overall?.bias ?? null)}
<span className="text-xs font-normal text-muted-foreground ml-1">
{(data?.overall?.bias ?? 0) > 0 ? "over" : (data?.overall?.bias ?? 0) < 0 ? "under" : ""}
<span className={fva != null ? (fva > 0 ? "text-green-600" : "text-red-600") : "text-muted-foreground"}>
{fva != null ? `${formatSignedPct(fva)} FVA` : "N/A"}
</span>
{naiveWmape != null && (
<span className="text-xs font-normal text-muted-foreground ml-1">
naive {formatWmape(naiveWmape)}
</span>
)}
</p>
</div>
</div>
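The new "Forecast vs actual" tile reports `totalForecast / totalActual - 1` through `formatSignedPct`. The headline WMAPE switches between weekly and daily grain, and each grain has its own color thresholds. The sketch below restates that arithmetic as standalone functions so it can be checked in isolation:

- `formatSignedPct` follows the component's logic, with the null/undefined check folded into `== null`.
- `fcVsActRatio` and `pickHeadline` are hypothetical helper names, not exports of the component.

```typescript
// Same logic as the component's formatSignedPct: ratio is a fraction (0.7 => "+70%").
function formatSignedPct(ratio: number | null | undefined, digits = 0): string {
  if (ratio == null) return "N/A";
  const pct = ratio * 100;
  const sign = pct > 0 ? "+" : "";
  return `${sign}${pct.toFixed(digits)}%`;
}

// Net forecast-vs-actual ratio; null when there are no actual units.
function fcVsActRatio(totalForecast: number, totalActual: number): number | null {
  return totalActual > 0 ? totalForecast / totalActual - 1 : null;
}

type Grain = "weekly" | "daily";

// Weekly WMAPE (percent) when present, else daily, each with the thresholds used above.
function pickHeadline(weeklyWmape: number | null, dailyWmape: number | null): { grain: Grain; wmape: number | null; color: string } {
  if (weeklyWmape !== null) {
    const color = weeklyWmape <= 60 ? "text-green-600" : weeklyWmape <= 90 ? "text-yellow-600" : "text-red-600";
    return { grain: "weekly", wmape: weeklyWmape, color };
  }
  if (dailyWmape === null) return { grain: "daily", wmape: null, color: "text-muted-foreground" };
  const color = dailyWmape <= 30 ? "text-green-600" : dailyWmape <= 50 ? "text-yellow-600" : "text-red-600";
  return { grain: "daily", wmape: dailyWmape, color };
}
```

As a check, the plan's measured totals (227,690 forecast vs 133,861 actual units) give `formatSignedPct(fcVsActRatio(227690, 133861))` = `"+70%"`. This matches the aggregate over-forecast in the diagnosis.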