Compare commits: 9ff744399f...069a44bd54 (2 commits: 069a44bd54, 3b2f51e6b8)
@@ -0,0 +1,343 @@
|
||||
# Forecast Accuracy Fix Plan
|
||||
|
||||
**Written:** 2026-06-10, from a code + live-data review of the forecasting pipeline.
|
||||
**Goal:** eliminate the systematic ~1.7–2x over-forecast bias, recover demand the model currently ignores, and fix the accuracy measurement so improvements are visible and long-lead forecasts are validated.
|
||||
|
||||
Read this whole document before starting. Fixes are grouped into phases; each phase is independently deployable and has its own validation step. Line numbers are as of 2026-06-10 — re-locate by function name if the file has drifted.
|
||||
|
||||
---
|
||||
|
||||
## 1. Diagnosis summary (measured 2026-06-10)
|
||||
|
||||
The dashboard headline is **202% WMAPE**. Decomposition of that number, all measured against `forecast_accuracy` run 129 and ad-hoc queries:
|
||||
|
||||
| Finding | Evidence |
|---|---|
| Daily-grain WMAPE has a ~190% *floor* for this catalog | Avg demand ≈ 0.11 units/product/day. A perfect rate forecast of intermittent demand scores ≈ 2e^(−λ) ≈ 190%. A trivial trailing-30d-average naive forecast scores **204%** on the same products/days; the engine scores 221% (slightly *worse than naive*). |
| Same forecasts at 21-day-per-product grain: **109%**; bias-corrected: **75%** | Half the headline is metric grain, most of the rest is bias. |
| Aggregate over-forecast **+70%** (227,690 forecast vs 133,861 actual units) | Portfolio daily ratio is 1.5–2.5x on most days. |
| Decay phase 2.47x over (fc 51,675 / act 20,915) | Root cause F1: velocity inflated **4.07x** (measured: 1.353 vs true 0.332 units/day) by averaging over sparse snapshot rows. |
| Preorder phase 2.15x over (fc 67,212 / act 31,189) | Root cause F4: launch curve applied at age=0 starting *today*, ignoring that the product hasn't arrived. |
| Mature phase 1.69x over (fc 57,857 / act 34,313) | Root causes F2 (history edge truncation) + F3 (seasonal double-count). |
| Dormant products sold **16,180 units** (~11% of demand) against zero forecasts | Root cause F5; also excluded from the headline metric, so invisible. |
| All 879,800 accuracy samples are in the **1–7d lead bucket** | Root cause F7: archiving design only ever saves yesterday's slice. 30–90d forecasts (what purchasing uses) are never validated. |
| Launch phase is healthy: WMAPE 100%, bias −6%, beats naive | The lifecycle-curve concept works; its calibration inputs are broken. Don't redesign it. |
|
||||
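The ~190% daily-grain floor in the first table row can be reproduced from the mean absolute deviation of a Poisson variable. A minimal sketch, assuming daily demand per product is Poisson with rate λ < 1 (an assumption made here for illustration, not something the measurements establish): a forecast equal to the true rate has expected absolute error 2λe^(−λ), so its WMAPE is 2e^(−λ). The function name `poisson_wmape_floor` is hypothetical.

```python
import math

def poisson_wmape_floor(lam: float, kmax: int = 60) -> float:
    """WMAPE of forecasting the true rate `lam` for Poisson(lam) demand.

    Computed by direct summation of |k - lam| * P(X = k), then divided by
    the expected demand `lam`.
    """
    pmf = math.exp(-lam)          # P(X = 0)
    abs_dev = lam * pmf           # |0 - lam| * P(X = 0)
    for k in range(1, kmax + 1):
        pmf *= lam / k            # P(X = k) from P(X = k - 1)
        abs_dev += abs(k - lam) * pmf
    return abs_dev / lam

lam = 0.11                        # average units/product/day from the table
numeric = poisson_wmape_floor(lam)
closed_form = 2 * math.exp(-lam)  # valid for lam < 1
print(f"numeric={numeric:.4f} closed_form={closed_form:.4f}")
```

At λ = 0.11 the closed form gives roughly 179%, and it approaches 200% as λ goes to 0, so a floor in the neighbourhood of 190% is what one would expect for a catalog where many products sell less often than the average.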
**Key data fact** underlying several fixes: `daily_product_snapshots` is **activity-based and sparse** — only ~500–1,800 of ~38K products have a row on a given day. Verified: every pid-day with an order DOES have a snapshot row and units match (5,234/5,234 pid-days, 8,980 vs 8,984 units over 7 days). So *missing row = zero sales*, and any query that aggregates over only the rows that exist is averaging over sold-days.
|
||||
|
||||
---
|
||||
|
||||
## 2. Environment & operational notes
|
||||
|
||||
- **Files:** engine is `inventory-server/scripts/forecast/forecast_engine.py`; orchestrator `run_forecast.js` in the same dir; consumer endpoints in `inventory-server/src/routes/dashboard.js` (`/forecast/metrics` ~line 308, `/forecast/accuracy` ~line 647); overview UI in `inventory/src/components/overview/ForecastMetrics.tsx` and `ForecastAccuracy.tsx`.
|
||||
- **Local `inventory-server/` is NFS-mounted to `/var/www/inventory/` on the netcup server.** Edits made locally appear on the server immediately — no copy step. Do NOT run bulk `grep`/`find`/`node --check` over `inventory-server/` locally (the mount hangs); `ssh netcup` and run them there.
|
||||
- **Avoid the glob tool** for search in this repo; use bash (`grep`/`rg` via ssh for server-side trees).
|
||||
- **Scheduling:** the engine runs daily at **09:30:01 server time** (runs table is conclusive), but the cron entry is NOT in matt's crontab, `/etc/cron.d`, or pm2. Likely root's crontab (`sudo crontab -l` to confirm). You do not need to touch the schedule for these fixes; just know a run fires at 09:30 daily and occasionally skips days (e.g. 2026-06-07/08).
|
||||
- **Manual test runs:** `ssh netcup`, then `cd /var/www/inventory/scripts/forecast && node run_forecast.js`. Takes ~3.5–4 min. Safe to run any time: the engine TRUNCATEs and rebuilds `product_forecasts`, archives prior past-dated rows, and records a new `forecast_runs` row. Python deps live in the server venv (`venv/`); `run_forecast.js` handles env + venv automatically.
|
||||
- **DB access for validation:** `ssh netcup`, then `PGPASSWORD=6D3GUkxuFgi2UghwgnUd psql -h localhost -U inventory_readonly -d inventory_db`. The engine itself connects with the write user via env vars loaded from `/var/www/inventory/.env` — schema changes should be made idempotently *inside the engine code* (the file already uses `CREATE TABLE IF NOT EXISTS` / `CREATE INDEX IF NOT EXISTS`; use `ALTER TABLE ... ADD COLUMN IF NOT EXISTS` the same way) so no manual migration is needed.
|
||||
- **Python gotchas already handled in this file (don't regress):** numpy types must go through the registered psycopg2 adapters; `pd.Series.combine_first()` keeps zeros over real data — use `reindex(..., fill_value=0.0)`.
|
||||
- Engine runtime budget: currently ~212–227s. Phases 1–2 shouldn't move it meaningfully; Phase 3's extra archiving adds one INSERT…SELECT. If runtime balloons past ~6 min, investigate before shipping.
|
||||
- `--backfill` mode (`backfill_accuracy_data`) is an in-sample backtest using the *old* formulas. **Do not run it anymore**; there is enough real out-of-sample history. Updating it to match the new logic is optional/low priority (F11).
|
||||
|
||||
---
|
||||
|
||||
## Phase 1 — Bias bugs in the engine (no schema changes)
|
||||
|
||||
### F1. Decay velocity: stop averaging over sparse snapshot rows
|
||||
|
||||
**Where:** `forecast_engine.py`, `batch_load_product_data()`, the decay query (~lines 697–710).
|
||||
|
||||
**Problem:** `AVG(COALESCE(dps.units_sold, 0))` runs over only the snapshot rows that exist — mostly sold-days. Measured inflation on the current 975 decay products: **4.07x** (1.353 vs 0.332 true units/day). This feeds `compute_scale_factor()` for the decay phase and is the single largest bias source.
|
||||
|
||||
**Fix:** divide the sum by calendar days in the window, clipped to the product's age (decay products are 14–60 days old, so a 20-day-old product's window is 20 days, not 30):
|
||||
|
||||
```sql
SELECT dps.pid,
       SUM(COALESCE(dps.units_sold, 0))::float
         / GREATEST(LEAST(30, (CURRENT_DATE - pm.date_first_received::date)), 1) AS avg_daily
FROM daily_product_snapshots dps
JOIN product_metrics pm ON pm.pid = dps.pid
WHERE dps.pid = ANY(%s)
  AND dps.snapshot_date >= CURRENT_DATE - INTERVAL '30 days'
  AND dps.snapshot_date >= pm.date_first_received::date
GROUP BY dps.pid, pm.date_first_received
```
|
||||
|
||||
No Python-side changes needed; `data['decay_velocity']` keeps the same shape. Products with zero snapshot rows in the window still get no entry → existing `scale = 1.0` fallback applies (acceptable: decay classification requires `sales_velocity_daily > 0`, so truly dead products don't reach this path).
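To make the size of the F1 bias concrete, here is a small pure-Python sketch that mirrors the old and new SQL for a single product. The helper name `decay_velocities` and the sample data are hypothetical; only the two formulas (average over existing rows vs. sum over calendar days clipped to product age) come from the plan.

```python
from datetime import date, timedelta

def decay_velocities(snapshot_rows, first_received, today, window=30):
    """Return (sparse_avg, calendar_avg) for one product.

    snapshot_rows: dict of snapshot_date -> units_sold, containing only the
    days that have a snapshot row (activity-based, sparse).
    """
    start = max(today - timedelta(days=window), first_received)
    units = [u for d, u in snapshot_rows.items() if d >= start]
    total = sum(units)
    # Old query: AVG over the rows that exist (mostly sold-days).
    sparse_avg = total / len(units) if units else 0.0
    # New query: SUM divided by calendar days, clipped to the product's age.
    calendar_days = max(min(window, (today - first_received).days), 1)
    calendar_avg = total / calendar_days
    return sparse_avg, calendar_avg

today = date(2026, 6, 10)
first_received = today - timedelta(days=20)          # a 20-day-old decay product
rows = {today - timedelta(days=n): u
        for n, u in [(2, 2), (5, 1), (9, 3), (13, 1), (18, 1)]}
sparse_avg, calendar_avg = decay_velocities(rows, first_received, today)
print(sparse_avg, calendar_avg)
```

With 8 units sold on 5 of 20 days, the sparse average is 4x the calendar average, which is the same order of inflation as the measured 4.07x.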
|
||||
|
||||
### F2. Mature history: reindex over the full calendar window
|
||||
|
||||
**Where:** `forecast_engine.py`, `forecast_mature()` (~lines 833–836).
|
||||
|
||||
**Problem:** `hist.set_index('snapshot_date').resample('D').sum()` only spans first-snapshot → last-snapshot. Interior gaps correctly become zeros, but **leading and trailing quiet periods are absent**, so the Holt level is fitted on the product's busy span. A marginal mature product whose activity clusters in 2 of the last 8 weeks gets a level ~4x too high.
|
||||
|
||||
**Fix:** replace the resample with an explicit reindex over the full `EXP_SMOOTHING_WINDOW` ending yesterday:
|
||||
|
||||
```python
# Inside forecast_mature(); pd, date, timedelta and EXP_SMOOTHING_WINDOW
# are already available in forecast_engine.py.
hist = history_df.copy()
hist['snapshot_date'] = pd.to_datetime(hist['snapshot_date'])
hist = hist.set_index('snapshot_date')['units_sold']

# Full calendar window ending yesterday, so quiet edges become explicit zeros.
full_index = pd.date_range(
    end=pd.Timestamp(date.today() - timedelta(days=1)),
    periods=EXP_SMOOTHING_WINDOW, freq='D')
series = hist.reindex(full_index, fill_value=0.0).values.astype(float)
```
|
||||
|
||||
Notes: (pid, snapshot_date) is unique in `daily_product_snapshots`, so no duplicate-index risk. `observed_mean` and the `cap` recompute over the full window automatically (intended — the cap gets correspondingly tighter). Mature products are by definition >60 days old, so the 60-day window never predates first receipt. Do NOT use `combine_first` (see gotchas above).
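The "2 of the last 8 weeks" case from the problem statement can be checked without pandas. The sketch below uses the plain mean as a stand-in for the fitted Holt level (an approximation chosen for illustration) and a 56-day window to match the 8-week example; the helper name `busy_span_vs_full_window` is hypothetical.

```python
from datetime import date, timedelta

def busy_span_vs_full_window(rows, yesterday, window_days):
    """rows: dict of snapshot_date -> units_sold for days that have a row."""
    first, last = min(rows), max(rows)
    span_days = (last - first).days + 1
    # What resample('D') sees: only first-snapshot .. last-snapshot.
    span_mean = sum(rows.values()) / span_days
    # What reindex(full_index, fill_value=0.0) sees: the whole window.
    full_index = [yesterday - timedelta(days=i)
                  for i in range(window_days - 1, -1, -1)]
    full_series = [rows.get(d, 0.0) for d in full_index]
    full_mean = sum(full_series) / window_days
    return span_mean, full_mean

yesterday = date(2026, 6, 9)
# One unit per day on 14 consecutive days, ending three weeks before yesterday.
rows = {yesterday - timedelta(days=21 + i): 1.0 for i in range(14)}
span_mean, full_mean = busy_span_vs_full_window(rows, yesterday, window_days=56)
print(span_mean, full_mean)
```

The busy-span mean is 1.0 units/day against 0.25 over the full window, i.e. the ~4x level inflation described above.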
|
||||
|
||||
### F3. Stop double-applying the monthly seasonal index
|
||||
|
||||
**Where:** `forecast_engine.py`, `generate_all_forecasts()` — the `seasonal_multipliers` pre-compute (~lines 959–961) and application (~line 1050).
|
||||
|
||||
**Problem:** every per-product calibration (decay velocity, mature Holt level, launch first-week scale, preorder rate, slow-mover velocity) is fitted on *raw recent actuals*, which already embed the current month's seasonal level. The forecast then multiplies by the **absolute** monthly index of the target date. Example from the live indices (`forecast_runs.phase_counts` for run 129): May = 1.224 (sale month), June = 0.982. Early-June forecasts were calibrated on May-sale-inflated velocities and barely discounted — a structural ~25% over-forecast at that transition, and it'll be worse around November (1.316).
|
||||
|
||||
**Fix:** apply the seasonal index *relative to the calibration period*. Compute a calibration index as the average monthly index over the trailing 30 calendar days (robust at month boundaries), then divide:
|
||||
|
||||
```python
today = date.today()

# Average monthly index over the trailing 30 calendar days (the calibration period).
trailing = [today - timedelta(days=i) for i in range(1, 31)]
calibration_index = float(np.mean([monthly_indices.get(d.month, 1.0) for d in trailing]))

# Seasonal multiplier relative to the calibration period, not absolute.
seasonal_multipliers = [
    monthly_indices.get(d.month, 1.0) / max(calibration_index, 0.1)
    for d in forecast_dates
]
```
|
||||
|
||||
Leave the DOW multipliers absolute — every calibration is a multi-week average and therefore DOW-neutral, so reshaping by absolute DOW indices is correct.
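As a worked instance of the relative seasonal index, the following sketch plugs in the two live indices quoted above (May = 1.224, June = 0.982) for a run on 2026-06-10. It avoids numpy so it runs standalone; the fixed `today` is chosen for the example, and months not listed default to 1.0 as in the engine snippet.

```python
from datetime import date, timedelta

monthly_indices = {5: 1.224, 6: 0.982}   # from run 129; other months default to 1.0
today = date(2026, 6, 10)                # fixed for the example

# Trailing 30 calendar days: 21 May days and 9 June days.
trailing = [today - timedelta(days=i) for i in range(1, 31)]
calibration_index = sum(monthly_indices.get(d.month, 1.0) for d in trailing) / len(trailing)

june_absolute = monthly_indices[6]
june_relative = june_absolute / max(calibration_index, 0.1)
print(round(calibration_index, 4), round(june_relative, 4))
```

For a June target date the relative multiplier comes out near 0.85 instead of the absolute 0.982, which removes most of the May carry-over from the calibration inputs.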
|
||||
|
||||
**Optional sub-fix (same area, low priority):** the monthly indices are computed from a single trailing 365-day window, so each month appears once and YoY growth contaminates "seasonality". A cheap improvement is widening `SEASONAL_LOOKBACK_DAYS` to 730 and averaging the two observations of each month. Do this only after the main fixes are validated.
|
||||
|
||||
### Phase 1 validation
|
||||
|
||||
Deploy (edit locally; NFS propagates), run the engine manually once, wait for 3–5 daily cycles, then:
|
||||
|
||||
```sql
-- Portfolio ratio per day (target: drifts from ~2.0 toward 0.8–1.3)
WITH ranked AS (
  SELECT pfh.pid, pfh.forecast_date, pfh.forecast_units, pfh.lifecycle_phase,
         ROW_NUMBER() OVER (PARTITION BY pfh.pid, pfh.forecast_date ORDER BY fr.started_at DESC) rn
  FROM product_forecasts_history pfh
  JOIN forecast_runs fr ON fr.id = pfh.run_id
  WHERE pfh.forecast_date >= CURRENT_DATE - 7)
SELECT r.forecast_date, round(SUM(r.forecast_units),0) AS fc,
       SUM(COALESCE(dps.units_sold,0)) AS act,
       round(SUM(r.forecast_units)/NULLIF(SUM(COALESCE(dps.units_sold,0)),0),2) AS ratio
FROM ranked r
LEFT JOIN daily_product_snapshots dps ON dps.pid = r.pid AND dps.snapshot_date = r.forecast_date
WHERE r.rn = 1 AND r.lifecycle_phase != 'dormant'
GROUP BY 1 ORDER BY 1;
```
|
||||
|
||||
Also check `forecast_accuracy` `by_phase` rows for the newest run: decay bias should fall from +0.35 toward ~0, mature from +0.17 toward ~0. (Accuracy lags ~1 day behind each fix since it evaluates yesterday's forecasts.)
|
||||
|
||||
---
|
||||
|
||||
## Phase 2 — Demand the model currently ignores or mistimes
|
||||
|
||||
### F4. Preorder: forecast the preorder rate until arrival, launch curve after
|
||||
|
||||
**Where:** `forecast_engine.py` — `batch_load_product_data()` (add arrival dates), `generate_all_forecasts()` preorder branch (~lines 1005–1009), and `forecast_from_curve()` (or a small wrapper).
|
||||
|
||||
**Problem:** preorder products run the launch curve from `age=0` starting **today**, i.e. full first-week launch sales while the product is still weeks from arriving. Actual preorder-period sales are a much slower trickle.
|
||||
|
||||
**Fix:**
|
||||
|
||||
1. Batch-load each preorder product's expected arrival from `purchase_orders` (line-item grain: it has `pid` and `expected_date` directly). Open statuses verified against live data: `created`, `ordered`, `electronically_sent`, `receiving_started` (~705 open line items currently have a future `expected_date`):
|
||||
|
||||
```sql
SELECT pid, MIN(expected_date) AS expected_arrival
FROM purchase_orders
WHERE pid = ANY(%s)
  AND status IN ('created', 'ordered', 'electronically_sent', 'receiving_started')
  AND expected_date IS NOT NULL
  AND expected_date >= CURRENT_DATE
GROUP BY pid
```
|
||||
|
||||
Fallbacks, in order: (a) an open PO with a *past* `expected_date` → assume arrival in 7 days; (b) no PO at all → arrival in 14 days (and log a counter of how many hit this default).
|
||||
|
||||
2. In the preorder branch, build the daily array piecewise. Let `days_until_arrival = (expected_arrival - today).days`:
|
||||
- Days `0 .. days_until_arrival-1`: flat observed preorder daily rate = `preorder_sales[pid] / max(preorder_days[pid], 1)` (both already batch-loaded), clamped to ≤ the curve's scaled week-0 daily value.
|
||||
- Days `days_until_arrival .. horizon`: `forecast_from_curve(curve_info, scale, age_days=0, ...)` shifted so the curve's day 0 lands on the arrival date (i.e. pass `horizon_days - days_until_arrival` and offset into the output array).
|
||||
- Keep the existing `compute_scale_factor('preorder', ...)` for the post-arrival curve; the pre-arrival segment doesn't use it.
|
||||
|
||||
This is consistent with how the reference curves were built: historical preorder units were recorded on their **order dates** (pre-arrival), so week-0 of the fitted curves reflects post-receipt orders, not the backlog.
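The piecewise construction in step 2 can be sketched as follows. This is an illustration under stated assumptions, not the engine's code: `launch_curve(n)` is a hypothetical stand-in for the already-scaled output of `forecast_from_curve` (n daily values starting at age 0), its first value is used as the "week-0 daily value" for the clamp, and `demo_curve` is invented test data.

```python
def preorder_forecast(preorder_units, preorder_days, days_until_arrival,
                      horizon_days, launch_curve):
    """Flat preorder rate until arrival, then the launch curve from age 0."""
    # Arrival beyond the horizon (or already past) is clipped into range.
    lead = max(0, min(days_until_arrival, horizon_days))
    # Post-arrival segment: curve day 0 lands on the arrival date.
    post_arrival = list(launch_curve(horizon_days - lead))
    # Pre-arrival segment: observed preorder trickle, clamped to the curve.
    rate = preorder_units / max(preorder_days, 1)
    cap = launch_curve(1)[0]
    rate = min(rate, cap)
    return [rate] * lead + post_arrival

def demo_curve(n):
    # Hypothetical launch curve: 8 units on day 0, halving each day.
    return [8.0 * 0.5 ** i for i in range(n)]

forecast = preorder_forecast(preorder_units=12, preorder_days=6,
                             days_until_arrival=3, horizon_days=5,
                             launch_curve=demo_curve)
print(forecast)
```

Here the product trickles at 2 units/day for the three days before arrival and only then starts the launch curve, instead of forecasting 8 units on day 0.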
|
||||
|
||||
### F5. Dormant products: small positive rate instead of hard zero, and count them
|
||||
|
||||
**Where:** `forecast_engine.py` — `generate_all_forecasts()` dormant branch (~lines 1040–1042), `batch_load_product_data()`, and `compute_accuracy()`.
|
||||
|
||||
**Problem:** all ~28K dormant products are forecast at exactly 0, yet they sold 16,180 units in the eval window (~11% of all demand) — restocks, promos, long-tail. Worse, dormant is *excluded* from the headline accuracy filter, so this miss is invisible.
|
||||
|
||||
**Fix (cheap version, do this now):**
|
||||
|
||||
1. Batch-load a trailing-180-day order rate for dormant products (11,362 of them have ≥1 sale in 180d — verified):
|
||||
|
||||
```sql
SELECT o.pid, SUM(o.quantity) / 180.0 AS rate
FROM orders o
WHERE o.pid = ANY(%s)
  AND o.canceled IS DISTINCT FROM TRUE
  AND o.date >= CURRENT_DATE - INTERVAL '180 days'
GROUP BY o.pid
```
|
||||
|
||||
2. Dormant branch: if the product has a rate > 0, forecast it flat with `method = 'velocity'`; else keep zeros with `method = 'zero'`. Apply the same DOW/seasonal multipliers as everything else (automatic — they're applied after the branch).
|
||||
3. In `compute_accuracy()`, add a second overall row: `metric_type='overall', dimension_value='all_incl_dormant'` with no dormant filter (keep the existing `'all'` row unchanged for trend continuity). One extra entry in the `dimensions`/`filter_clauses` dicts.
|
||||
|
||||
**Upgrade path (optional, Phase 4):** replace flat rates for `slow_mover` + dormant-with-sales with TSB (Teunter–Syntetos–Babai), the standard intermittent-demand method with obsolescence handling. Per product over a daily series `d_t` (build it from snapshots the F2 way — full calendar reindex):
|
||||
|
||||
```
if d_t > 0:  p_t = p_{t-1} + β·(1 − p_{t-1});  z_t = z_{t-1} + α·(d_t − z_{t-1})
else:        p_t = p_{t-1}·(1 − β);            z_t = z_{t-1}
forecast = p_T · z_T   (flat across horizon)
```
|
||||
|
||||
Start with α=0.1, β=0.05; initialize p = (nonzero days / total days) and z = mean of nonzero demands. Scope: slow_mover (~6K) + dormant with 180d sales (~11K), with each series built from up to 180 days of snapshots (the rows are sparse, so the volume should stay manageable). Only do this after Phase 3 measurement exists to prove it beats the flat rates.
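The TSB recursion above translates almost line for line into Python. A minimal sketch using the suggested starting parameters and initialization; the function name `tsb_forecast` is hypothetical, and the input is assumed to be a full-calendar daily series (zeros included).

```python
from typing import Sequence

def tsb_forecast(demand: Sequence[float], alpha: float = 0.1, beta: float = 0.05) -> float:
    """Teunter-Syntetos-Babai flat daily forecast p_T * z_T.

    p tracks the probability of a nonzero-demand day, z the size of
    nonzero demands. p decays on every zero day, which is the
    obsolescence handling.
    """
    nonzero = [d for d in demand if d > 0]
    if not nonzero:
        return 0.0
    p = len(nonzero) / len(demand)      # initial demand probability
    z = sum(nonzero) / len(nonzero)     # initial demand size
    for d in demand:
        if d > 0:
            p += beta * (1 - p)
            z += alpha * (d - z)
        else:
            p *= (1 - beta)
    return p * z

faded = tsb_forecast([1] * 5 + [0] * 50)    # sold early, then went quiet
recent = tsb_forecast([0] * 50 + [1] * 5)   # quiet, then started selling
print(round(faded, 4), round(recent, 4))
```

Both series have the same 180-day-style flat rate, but TSB forecasts the faded product far lower than the recently active one, which is the behaviour the flat rate cannot provide.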
|
||||
|
||||
### Phase 2 validation
|
||||
|
||||
After 3–5 cycles: preorder `by_phase` bias should drop from +0.85 toward < +0.3; the new `all_incl_dormant` row should appear and its `total_actual_units` minus `'all'`'s should be largely *covered* rather than all-miss (dormant `bias` rising from −1.36 toward ~−0.3 or better).
|
||||
|
||||
---
|
||||
|
||||
## Phase 3 — Fix the measurement (schema + engine + API + UI)
|
||||
|
||||
> Without this phase you cannot see whether Phases 1–2 worked except by ad-hoc SQL, the lead-time chart stays a single bucket forever, and the dashboard keeps displaying a number with a 190% floor in red.
|
||||
|
||||
### F7. Archive long-lead forecasts so 15/30/60/90d accuracy exists
|
||||
|
||||
**Where:** `forecast_engine.py` — `archive_forecasts()` (~lines 1086–1154), `compute_accuracy()` CTE (~lines 1201–1228).
|
||||
|
||||
**Problem:** the current design archives only *past-dated* rows of the previous run before truncation. With daily runs, that's only ever the 1-day-ahead slice — all 879,800 accuracy samples sit in the '1-7d' bucket and the longer buckets in the UI chart can never populate. Purchasing decisions ride on 30–60d forecasts that are never validated.
|
||||
|
||||
**Fix:**
|
||||
|
||||
1. Keep the existing past-date archiving exactly as is (it provides dense short-lead coverage).
|
||||
2. After `generate_all_forecasts()` completes, additionally archive a **sampled set of future leads** from the new run, non-dormant only, attributed to the *current* run id (correct attribution, unlike the past-date path which attributes to the previous run):
|
||||
|
||||
```sql
INSERT INTO product_forecasts_history
  (run_id, pid, forecast_date, forecast_units, forecast_revenue,
   lifecycle_phase, forecast_method, confidence_lower, confidence_upper, generated_at)
SELECT %(run_id)s, pid, forecast_date, forecast_units, forecast_revenue,
       lifecycle_phase, forecast_method, confidence_lower, confidence_upper, generated_at
FROM product_forecasts
WHERE lifecycle_phase != 'dormant'
  AND forecast_date - CURRENT_DATE IN (7, 14, 30, 60, 89)
ON CONFLICT (run_id, pid, forecast_date) DO NOTHING
```
|
||||
|
||||
Volume: ~10K non-dormant products × 5 leads ≈ 50K rows/day; the existing 90-day prune (`forecast_date < CURRENT_DATE - 90`) bounds steady state at a few million rows. Note future-dated rows survive until their date passes + 90 days — that's intended.
|
||||
|
||||
3. **CRITICAL companion change** in `compute_accuracy()`: the accuracy CTE must now exclude not-yet-realized rows, or future-dated archives get scored against actual=0:
|
||||
|
||||
```sql
FROM product_forecasts_history pfh
JOIN forecast_runs fr ON fr.id = pfh.run_id
WHERE pfh.forecast_date < CURRENT_DATE   -- ADD THIS
```
|
||||
|
||||
4. **Dedup semantics change.** Today's `ROW_NUMBER() OVER (PARTITION BY pid, forecast_date ORDER BY started_at DESC)` keeps only the latest (= shortest-lead) row per pid/date, which would silently discard all the new long-lead rows. Restructure:
|
||||
- Compute `lead_days = forecast_date - started_at::date` and the lead bucket *inside* `ranked_history`.
|
||||
- For `by_lead_time`: dedup `PARTITION BY pid, forecast_date, lead_bucket` (one sample per pid/date/bucket, latest run wins within a bucket).
|
||||
- For everything else (`overall`, `by_phase`, `by_method`, `daily`, and the new weekly metric below): restrict to `lead_days BETWEEN 0 AND 6` and keep the existing per-(pid, date) dedup. This preserves the current meaning of the headline metrics (short-lead) while the lead-time table becomes real.
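The new dedup semantics can be sketched in Python before committing them to SQL. The bucket labels are the ones used elsewhere in this plan; the exact boundaries and the treatment of lead 0 are assumptions inferred from those labels, and both function names are hypothetical.

```python
def lead_bucket(lead_days):
    """Map lead_days (forecast_date - run start date) to a by_lead_time bucket."""
    if lead_days < 0:
        return None
    # Boundaries inferred from the bucket names; lead 0 is grouped with 1-7d here.
    for upper, label in [(7, '1-7d'), (14, '8-14d'), (30, '15-30d'),
                         (60, '31-60d'), (90, '61-90d')]:
        if lead_days <= upper:
            return label
    return None

def dedup_by_bucket(rows):
    """rows: (pid, forecast_date, lead_days, started_at, units).

    Keeps one sample per (pid, forecast_date, bucket); the latest run wins
    within a bucket, so long-lead rows are no longer discarded.
    """
    best = {}
    for pid, fdate, lead, started_at, units in rows:
        key = (pid, fdate, lead_bucket(lead))
        if key not in best or started_at > best[key][0]:
            best[key] = (started_at, units)
    return {key: value[1] for key, value in best.items()}

sampled = {lead: lead_bucket(lead) for lead in (7, 14, 30, 60, 89)}
rows = [
    (1, '2026-07-10', 30, '2026-06-10', 5.0),   # long-lead sample
    (1, '2026-07-10', 2, '2026-07-08', 4.5),    # short lead, older run
    (1, '2026-07-10', 1, '2026-07-09', 4.0),    # short lead, latest run
]
kept = dedup_by_bucket(rows)
print(sampled, kept)
```

Under these boundaries each of the five sampled leads lands in a different bucket, and the 30-day-lead row survives next to the latest short-lead row instead of being overwritten by it.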
|
||||
|
||||
### F8. Track a naive baseline (forecast value-added)
|
||||
|
||||
**Where:** `archive_forecasts()` (both INSERT paths), `compute_accuracy()`, `forecast_accuracy` schema, `/forecast/accuracy` endpoint.
|
||||
|
||||
**Problem:** the engine currently *loses* to a trailing-average naive forecast (221% vs 204% daily WMAPE) and nothing on the dashboard would ever reveal that. Every accuracy improvement should be judged as value-over-naive.
|
||||
|
||||
**Fix:**
|
||||
|
||||
1. Schema (idempotent, in the ensure blocks): `ALTER TABLE product_forecasts_history ADD COLUMN IF NOT EXISTS naive_units NUMERIC(10,2);` and `ALTER TABLE forecast_accuracy ADD COLUMN IF NOT EXISTS naive_wmape NUMERIC(10,4), ADD COLUMN IF NOT EXISTS fva NUMERIC(10,4);`
|
||||
2. Populate `naive_units` during both archive INSERTs via a join — naive = flat trailing-28-day average daily units as of archive time (28 days = DOW-balanced; information available at generation; same value at every lead, which is exactly what a naive baseline means):
|
||||
|
||||
```sql
LEFT JOIN (
  SELECT o.pid, SUM(o.quantity) / 28.0 AS naive_daily
  FROM orders o
  WHERE o.canceled IS DISTINCT FROM TRUE
    AND o.date >= CURRENT_DATE - INTERVAL '28 days' AND o.date < CURRENT_DATE
  GROUP BY o.pid
) nv ON nv.pid = pf.pid
-- select COALESCE(nv.naive_daily, 0) AS naive_units
```
|
||||
|
||||
3. In `compute_accuracy()`, add to each dimension's aggregate: `SUM(ABS(naive_units - actual_units)) / NULLIF(SUM(actual_units),0) AS naive_wmape` and store `fva = 1 - wmape / naive_wmape` (NULL-safe). Rows archived before this change have `naive_units` NULL — treat NULL as excluded (`FILTER (WHERE naive_units IS NOT NULL)` on the naive sums) rather than as zero.
|
||||
4. Endpoint: include `naiveWmape` and `fva` in the `overall` (and per-phase) payload of `/dashboard/forecast/accuracy` in `dashboard.js`.
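The naive WMAPE and FVA arithmetic from step 3 can be sketched as below. The function names and sample rows are hypothetical; `None` stands in for a NULL `naive_units` on rows archived before the schema change, and those rows are excluded from the naive sums (numerator and denominator alike, which is one reading of the `FILTER` note).

```python
def wmape(pairs):
    """pairs: list of (predicted, actual). Returns None when actuals sum to 0."""
    total_actual = sum(actual for _, actual in pairs)
    if total_actual == 0:
        return None
    return sum(abs(pred - actual) for pred, actual in pairs) / total_actual

def accuracy_with_fva(rows):
    """rows: list of (forecast_units, naive_units_or_None, actual_units)."""
    engine = wmape([(fc, act) for fc, _, act in rows])
    # Rows without a naive value are excluded rather than treated as zero.
    naive = wmape([(nv, act) for _, nv, act in rows if nv is not None])
    # NULL-safe FVA: undefined when either WMAPE is missing or naive is 0.
    fva = None if engine is None or not naive else 1 - engine / naive
    return engine, naive, fva

rows = [
    (0.0, 1.0, 0),
    (2.0, 1.0, 2),
    (1.0, 1.0, 0),
    (1.0, 1.0, 2),
    (1.0, None, 1),   # archived before naive_units existed
]
engine_wmape, naive_wmape, fva = accuracy_with_fva(rows)
print(engine_wmape, naive_wmape, fva)
```

A positive FVA means the engine beats the naive baseline. While old rows without `naive_units` are still in the evaluation window, the two WMAPEs cover slightly different row sets; restricting both to rows with a naive value would give a strict like-for-like comparison.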
|
||||
|
||||
### F9. Weekly-grain headline metric + bias as a percentage
|
||||
|
||||
**Where:** `compute_accuracy()`, `/forecast/accuracy` endpoint, `ForecastAccuracy.tsx`.
|
||||
|
||||
**Problem:** daily-grain WMAPE on this catalog has a ~190% floor — as a headline it's noise. The informative numbers are (a) weekly-per-product WMAPE (currently ~109%, target ~70–85% post-fix) and (b) aggregate bias, which the UI currently renders as `+0.108 units` — indistinguishable from zero while the reality is +70%.
|
||||
|
||||
**Fix:**
|
||||
|
||||
1. New metric in `compute_accuracy()`: `metric_type='overall_weekly', dimension_value='all'`. Definition: using the short-lead deduped rows (lead ≤ 6, non-dormant), aggregate per `(pid, date_trunc('week', forecast_date))` keeping only complete weeks (`COUNT(*) = 7`), then `WMAPE = SUM(ABS(fc_week − act_week)) / SUM(act_week)`, excluding pid-weeks where both are 0. Store sample_size = number of pid-weeks. Compute `naive_wmape`/`fva` the same way from `naive_units`.
|
||||
2. Endpoint: expose as `overallWeekly`; also add a weekly variant to the `accuracyTrend` query (`metric_type='overall_weekly'`). The trend will start empty (old runs lack the row) — that's fine; don't backfill.
|
||||
3. `ForecastAccuracy.tsx`:
|
||||
- Headline WMAPE → `overallWeekly.wmape`, labeled "WMAPE (weekly)". Keep daily WMAPE available in a tooltip if desired.
|
||||
- Color thresholds for weekly grain: green ≤ 60, yellow ≤ 90, red above (tunable; document that they're calibrated for intermittent retail demand).
|
||||
- Replace the bias row: show `(totalForecast / totalActual − 1)` as a signed percentage labeled "Forecast vs actual" (both totals already arrive in `overall`). Keep MAE.
|
||||
- Add a "vs naive" line: naive weekly WMAPE and FVA. FVA > 0 = engine adds value.
|
||||
- The lead-time chart needs no code change — buckets will populate as F7 rows mature (7d lead evaluable after 7 days, 30d after 30, etc.).
|
||||
4. `confidenceLevel` in `/forecast/metrics` (`dashboard.js` ~line 360) is "share of products forecast via lifecycle curves", not confidence. It only feeds a per-day tooltip field. Either rename the JSON field to `curveCoverage` and update the one consumer in `ForecastMetrics.tsx`, or leave it and add a comment; low priority.
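The weekly-grain definition in step 1 of F9 can be prototyped in a few lines of Python. This is a sketch of the stated definition only (complete weeks, both-zero pid-weeks excluded, sample size = pid-weeks); the function name `weekly_wmape` and the sample rows are hypothetical, and the input is assumed to be already restricted to short-lead, non-dormant, deduped rows.

```python
from collections import defaultdict
from datetime import date, timedelta

def weekly_wmape(rows):
    """rows: iterable of (pid, forecast_date, forecast_units, actual_units)."""
    weeks = defaultdict(lambda: [0, 0.0, 0.0])   # day count, forecast sum, actual sum
    for pid, day, fc, act in rows:
        monday = day - timedelta(days=day.weekday())   # like date_trunc('week', ...)
        bucket = weeks[(pid, monday)]
        bucket[0] += 1
        bucket[1] += fc
        bucket[2] += act
    abs_err = total_act = 0.0
    samples = 0
    for n_days, fc, act in weeks.values():
        if n_days != 7 or (fc == 0 and act == 0):
            continue   # incomplete week, or nothing forecast and nothing sold
        abs_err += abs(fc - act)
        total_act += act
        samples += 1
    return (abs_err / total_act if total_act else None), samples

monday = date(2026, 6, 1)
rows = [(1, monday + timedelta(days=i), 0.5, 1 if i in (2, 5) else 0) for i in range(7)]
rows += [(2, monday + timedelta(days=i), 0.0, 0) for i in range(7)]   # both zero: excluded
rows += [(3, monday + timedelta(days=i), 1.0, 1) for i in range(3)]   # incomplete: excluded
wmape_value, sample_size = weekly_wmape(rows)
print(wmape_value, sample_size)
```

Product 1 scores 75% at weekly grain (3.5 forecast vs 2 actual), while the same seven rows scored day by day would give 175%, which illustrates how much of the daily headline is grain rather than forecast quality.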
|
||||
|
||||
### Phase 3 validation
|
||||
|
||||
- Next run after deploy: `forecast_accuracy` contains `overall_weekly` and `fva` values; `/dashboard/forecast/accuracy` returns them; the overview popover renders weekly WMAPE, bias %, and the naive comparison.
|
||||
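- After 14/30/60 days: `by_lead_time` rows appear for the '8-14d', '15-30d', and '31-60d' buckets respectively (61-90d after ~89 days), since each bucket is fed by the sampled 14/30/60/89-day leads from F7 and a sample only becomes evaluable once its forecast date has passed.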
|
||||
- Confirm engine runtime still < ~5 min and `product_forecasts_history` growth ≈ 50–70K rows/day.
|
||||
|
||||
---
|
||||
|
||||
## Phase 4 — Optional / after the above is proven
|
||||
|
||||
- **F6. TSB for slow movers + dormant** (spec in F5). Gate on Phase 3 measurement: ship only if weekly FVA improves on those phases.
|
||||
- **F10. Confidence-margin source:** `load_accuracy_margins()` feeds daily-grain per-phase WMAPE (clamped to 1.0) into the intervals, so every interval is ±100% — uninformative. Once `overall_weekly` exists, add per-phase weekly rows (`by_phase_weekly`) and source margins from those instead.
|
||||
- **F11.** Update or delete `backfill_accuracy_data()` (it encodes the old formulas). Until then, just don't run `--backfill`.
|
||||
- **F12.** `compute_dow_indices()` weights by revenue but the multipliers are applied to units — switch `SUM(o.price * o.quantity)` to `SUM(o.quantity)`. Tiny effect.
|
||||
- **F13.** Longer term: for reorder decisions the right target is P(lead-time demand > stock), not a point forecast. Evaluate quantile (pinball) loss at lead-time horizons using the existing confidence-interval columns. Design separately.
|
||||
|
||||
---
|
||||
|
||||
## 4. Success criteria
|
||||
|
||||
1. Rolling-14-day portfolio forecast/actual ratio within **0.8–1.25** (currently 1.5–2.5).
|
||||
2. Weekly-grain WMAPE ≤ **90%** and **FVA > 0** (engine beats naive) sustained for 2+ weeks.
|
||||
3. Decay/preorder/mature per-phase bias within ±0.1 units/day (currently +0.35 / +0.85 / +0.17).
|
||||
4. `all_incl_dormant` actuals covered: dormant bias better than −0.4 (currently −1.36, i.e. 100% miss).
|
||||
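5. Lead-time buckets through 31–60d populated with ≥10K samples each within ~9 weeks (the 31–60d bucket is fed by the sampled 60-day lead, which first becomes evaluable 60 days after deploy).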
|
||||
6. Launch phase stays healthy (bias within ±0.15, WMAPE not degraded) — regression guard for F3/F4 changes.
|
||||
|
||||
## 5. Re-measurement appendix
|
||||
|
||||
The naive-vs-engine comparison used in the diagnosis (rerun any time; adjust dates):
|
||||
|
||||
```sql
WITH ranked AS (
  SELECT pfh.pid, pfh.forecast_date, pfh.forecast_units, pfh.lifecycle_phase,
         ROW_NUMBER() OVER (PARTITION BY pfh.pid, pfh.forecast_date ORDER BY fr.started_at DESC) rn
  FROM product_forecasts_history pfh
  JOIN forecast_runs fr ON fr.id = pfh.run_id
  WHERE pfh.forecast_date BETWEEN CURRENT_DATE - 9 AND CURRENT_DATE - 1),
eng AS (SELECT * FROM ranked WHERE rn = 1 AND lifecycle_phase != 'dormant'),
naive AS (
  SELECT o.pid, SUM(o.quantity)/30.0 AS naive_daily FROM orders o
  WHERE o.canceled IS DISTINCT FROM TRUE
    AND o.date >= CURRENT_DATE - 39 AND o.date < CURRENT_DATE - 9
  GROUP BY o.pid)
SELECT e.lifecycle_phase, COUNT(*) AS n, SUM(COALESCE(dps.units_sold,0)) AS actual,
       round(SUM(e.forecast_units),0) AS engine_fc, round(SUM(COALESCE(nv.naive_daily,0)),0) AS naive_fc,
       round(SUM(ABS(e.forecast_units - COALESCE(dps.units_sold,0)))/NULLIF(SUM(COALESCE(dps.units_sold,0)),0),2) AS engine_wmape,
       round(SUM(ABS(COALESCE(nv.naive_daily,0) - COALESCE(dps.units_sold,0)))/NULLIF(SUM(COALESCE(dps.units_sold,0)),0),2) AS naive_wmape
FROM eng e
LEFT JOIN naive nv ON nv.pid = e.pid
LEFT JOIN daily_product_snapshots dps ON dps.pid = e.pid AND dps.snapshot_date = e.forecast_date
GROUP BY ROLLUP(e.lifecycle_phase) ORDER BY 1;
```
|
||||
|
||||
Baseline numbers to beat (June 1–9, 2026): engine 221% / naive 204% daily WMAPE; engine_fc/actual = 1.82; per-phase table in §1.
|
||||
@@ -0,0 +1,449 @@
|
||||
# Import & Metrics Pipeline Fix Plan
|
||||
|
||||
Fixes for issues found in a full review (2026-06-10) of the `full-update.js` pipeline:
`inventory-server/scripts/full-update.js` → `import-from-prod.js` (6 importers in `scripts/import/`) → `calculate-metrics-new.js` (7 SQL modules in `scripts/metrics-new/`).

Every issue below was verified against the code and, where marked **[verified-live]**, against the live MySQL source (`sg` on 192.168.1.5 via the acot-db tooling / `ssh workpi`) and live PostgreSQL (`inventory_db`: `ssh netcup`, then `psql -U inventory_readonly`, password in `/Users/matt/Dev/inventory/CLAUDE.md`). Write credentials for migrations: see `/var/www/inventory/.env` on netcup (`inventory_user`).
|
||||
|
||||
## Operational context (read first)
|
||||
|
||||
- Local `inventory-server/` is **NFS-mounted** to `/var/www/inventory/` on the netcup server, so edits appear on the server with no copy step. Run heavy validation/grep/find **on the server via `ssh netcup`**, not locally (NFS hangs + AppleDouble `._*` noise).
- The PG server timezone is **Europe/Berlin**. The business operates in **America/Chicago**. This matters for Fix 2.
- MySQL server is America/Chicago; the mysql2 driver is configured `timezone: '-05:00'` and corrected at runtime by `adjustDateForMySQL()` in `scripts/import/utils.js` (see `memory/TIMEZONE_ISSUE.md`). Don't "fix" that part; it already works.
- Orders/PO/products imports are incremental by default (`INCREMENTAL_UPDATE !== 'false'`); a full orders sync = run with `INCREMENTAL_UPDATE=false` (5-year window).
- Existing rebuild tooling: `scripts/metrics-new/backfill/rebuild_daily_snapshots.sql` (rebuilds `daily_product_snapshots` from `orders`/`receivings`). The full-pipeline order after data fixes: re-import → rebuild snapshots → `node scripts/calculate-metrics-new.js`.
- Precedent: `scripts/metrics-new/migrations/002_fix_discount_double_counting.sql` documents the procedure used last time a discount formula changed. Follow the same pattern (migration doc + code fix + full re-import + rebuild).
|
||||
|
||||
---
|
||||
|
||||
## P0 — Data correctness (do both, then ONE re-import + rebuild)
|
||||
|
||||
### Fix 1: Item-level promo discounts dropped (~$26K / 30 days ≈ 10% of product revenue) [verified-live]
|
||||
|
||||
**File:** `scripts/import/orders.js`: the `order_totals` CTE (~lines 604-623) and the discount fetch in `processDiscountsBatch` (~lines 379-383).

**Problem.** The discount applied to each PG `orders` row is: prorated `summary_discount_subtotal` + item-level promo discounts. The item-level part is gated:
||||
|
||||
```sql
SUM(CASE WHEN COALESCE(md.discount_amount_subtotal, 0) > 0 THEN id.amount ELSE 0 END)
```
|
||||
|
||||
In the PHP source (`/Users/matt/Dev/acot/website/website/lib/neworder.class.php`):
- `order_items.prod_price` is the **pre-promo** price; `summary_subtotal = Σ prod_price·qty` (line ~3087).
- Item-level promo discounts live in `order_discount_items` with `which = 2`; they are applied to the order total via `summary_discount += amount + products_disc_sum` (line ~6567), i.e. they are **not** part of `discount_amount_subtotal` and **not** baked into `prod_price`.
- Live data (90 days): of 10,010 type-10 promo discounts, **8,070 have item rows but only 8 have `discount_amount_subtotal > 0`**, so the gate zeroes essentially all item-level promo discounts.
- Live impact (30 days): **$25,989 dropped** across 2,021 orders, vs only $13,574 captured via the prorated subtotal component. Order discount components, 30d: total $54,957 = $13,574 subtotal + $15,395 shipping + ~$25,989 item-level. (Shipping discounts correctly excluded from product revenue.)
|
||||
|
||||
**Consequence.** `orders.discount` understated → `net_revenue`, `profit_30d`, `margin_30d` overstated
|
||||
by ~10% of revenue; `discounts_30d` / `discount_rate_30d` ~3x understated. Flows into daily snapshots,
|
||||
product/brand/vendor/category metrics, and dashboards.
|
||||
|
||||
**Fix.**
|
||||
1. In `processDiscountsBatch`, fetch only real item discounts:
|
||||
`SELECT order_id, pid, discount_id, amount FROM order_discount_items WHERE order_id IN (?) AND which = 2`.
|
||||
(`which=1` rows store prices of free promo-added items; `which=3` are usage records — neither is a
|
||||
discount amount.)
|
||||
2. In the `order_totals` CTE, remove the gate — sum `id.amount` unconditionally:
|
||||
`SUM(COALESCE(id.amount, 0)) AS promo_discount_sum` (drop the join/CASE on `temp_main_discounts`;
|
||||
`temp_main_discounts` becomes unused and can be removed entirely along with its insert loop).
|
||||
3. Sanity guard (optional, recommended): clamp final per-row discount to `price * quantity`.
|
||||
|
||||
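The effect of removing the gate can be sketched on toy data. This is a minimal Python model, not the import code: the rows, the order and product ids, and the helper names `promo_discount_by_item` and `clamp_discount` are all hypothetical, and for simplicity the `which = 2` filter is applied in both the gated and ungated runs so that only the gate differs.

```python
from collections import defaultdict

# Hypothetical rows shaped like order_discount_items: (order_id, pid, which, amount).
ITEM_DISCOUNTS = [
    (1001, 501, 2, 4.00),   # real item-level promo discount
    (1001, 502, 2, 1.50),
    (1001, 503, 1, 9.99),   # which=1: price of a free promo item, not a discount
    (1002, 501, 3, 0.00),   # which=3: usage record
]
# Hypothetical discount_amount_subtotal per order; zero for almost all promos.
SUBTOTAL_COMPONENT = {1001: 0.0, 1002: 0.0}


def promo_discount_by_item(rows, subtotal_component, gated):
    """Sum item-level promo discounts per (order_id, pid)."""
    totals = defaultdict(float)
    for order_id, pid, which, amount in rows:
        if which != 2:
            continue  # only which=2 rows are discount amounts
        if gated and subtotal_component.get(order_id, 0.0) <= 0:
            continue  # old gate: row dropped when the subtotal component is 0
        totals[(order_id, pid)] += amount
    return dict(totals)


def clamp_discount(discount, price, quantity):
    """Optional sanity guard: a row's discount never exceeds price * quantity."""
    return min(discount, price * quantity)


old = promo_discount_by_item(ITEM_DISCOUNTS, SUBTOTAL_COMPONENT, gated=True)
new = promo_discount_by_item(ITEM_DISCOUNTS, SUBTOTAL_COMPONENT, gated=False)
```

With the gate in place every item-level discount in this toy set is dropped; without it, only the `which = 2` amounts are kept.
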
**Verification.** After a FULL orders re-import, for a recent 30-day window PG should satisfy:
|
||||
`SUM(discount)` ≈ MySQL `Σ summary_discount_subtotal` + `Σ order_discount_items.amount (which=2)`
|
||||
over the same orders (± rounding from proration). Spot-check an order with a type-10 promo:
|
||||
discount on the affected pid ≈ the `which=2` amount. Re-run migration 002's verification query too
|
||||
(pids 624756, 614513) to confirm no regression of the prior fix.
|
||||
|
||||
### Fix 2: Daily snapshots bucket sales by Europe/Berlin days, not business days [verified-live]
|
||||
|
||||
**Files:** `scripts/metrics-new/update_daily_snapshots.sql` (SalesData join `o.date::date = _target_date`
|
||||
~line 138; gap-fill and stale-detection aggregates at lines ~47-83);
|
||||
`scripts/metrics-new/backfill/rebuild_daily_snapshots.sql` (same pattern — check & fix);
|
||||
`scripts/metrics-new/update_product_metrics.sql` (`HistoricalDates` `MIN(o.date)::date` etc., lines ~131-147).
|
||||
|
||||
**Problem.** `orders.date` is `timestamptz`; `::date` casts in the server TZ (**Europe/Berlin**,
|
||||
verified via `SHOW timezone`). Berlin is 7h ahead of Central (6h for a few weeks a year while US and EU
DST are out of step), so every order placed after ~5 PM Central (~6 PM in those weeks) lands on the **next** snapshot day. This shifts a large evening slice of daily sales
|
||||
forward one day; skews `yesterday_sales`, day-of-week patterns (the forecast engine's DOW
|
||||
multipliers, daily-grain forecast accuracy — see `FORECAST_FIX_PLAN.md`), and is inconsistent with
|
||||
`stock_snapshots`, whose dates come from a Central-time MySQL cron.
|
||||
|
||||
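The day shift is easy to reproduce outside the database. The sketch below uses Python's standard `zoneinfo` module; the order timestamp and the helper name `snapshot_day` are made up for illustration, and the helper only imitates what a bare `::date` cast does in a given session timezone.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

BUSINESS_TZ = ZoneInfo("America/Chicago")
SERVER_TZ = ZoneInfo("Europe/Berlin")


def snapshot_day(instant, tz):
    """Calendar day an instant falls on in tz (what ::date does in that session TZ)."""
    return instant.astimezone(tz).date()


# Hypothetical order placed at 18:30 Central on 2026-06-09.
order_instant = datetime(2026, 6, 9, 18, 30, tzinfo=BUSINESS_TZ)

server_day = snapshot_day(order_instant, SERVER_TZ)      # bare ::date on a Berlin server
business_day = snapshot_day(order_instant, BUSINESS_TZ)  # AT TIME ZONE 'America/Chicago'
```

Here `server_day` is 2026-06-10 while `business_day` is 2026-06-09, which is the one-day forward shift described above.
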
**Fix.** Bucket all order/receiving dates in business time. Replace every `o.date::date` /
|
||||
`received_date::date` used for *day bucketing* in the two snapshot SQL files with:
|
||||
|
||||
```sql
(o.date AT TIME ZONE 'America/Chicago')::date
```
|
||||
|
||||
Apply consistently in: SalesData, ReceivingData, the gap-fill date lists, the stale-detection
|
||||
aggregates (they must match SalesData or every day looks permanently stale), and the rebuild script.
|
||||
`HistoricalDates` in update_product_metrics (first/last sold dates) should match too.
|
||||
Add an index to keep the per-day loop fast, e.g.
|
||||
`CREATE INDEX ON orders ( ((date AT TIME ZONE 'America/Chicago')::date) );` and equivalent on
|
||||
`receivings(received_date)`; check `EXPLAIN` on the SalesData query afterward.
|
||||
|
||||
Note: `receivings.received_date` came from MySQL DATETIME (Central literal) inserted as timestamptz —
|
||||
it was interpreted in the *session* TZ at insert. Before converting, spot-check a few receivings
|
||||
against MySQL to confirm which TZ the stored instants actually represent; the conversion expression
|
||||
must yield the Central calendar day MySQL shows. Same check for `orders.date` (it originates from
|
||||
`_order.date_placed`, a TIMESTAMP column, so it should be a correct instant — `AT TIME ZONE
|
||||
'America/Chicago'` is right for it).
|
||||
|
||||
**Verification.** Pick 2-3 recent days; compare per-day `units_sold` totals in
|
||||
`daily_product_snapshots` against MySQL
|
||||
`SELECT date_placed_onlydate, SUM(qty_ordered) ... WHERE order_status >= 20 GROUP BY 1`
|
||||
(MySQL stores Central days). They should now match closely (small diffs from canceled-status timing).
|
||||
|
||||
### P0 execution order (single pass)
|
||||
|
||||
1. Land Fix 1 (orders.js) and Fix 2 (both snapshot SQL files + product-metrics date CTE).
|
||||
2. Full orders re-import: `INCREMENTAL_UPDATE=false node scripts/import-from-prod.js` (or at minimum
|
||||
the orders step) — run on the server, it's long.
|
||||
3. Rebuild snapshots: `psql -f scripts/metrics-new/backfill/rebuild_daily_snapshots.sql` (after
|
||||
confirming it contains the TZ fix). The hourly job's 90-day self-heal will NOT fix history beyond
|
||||
90 days by itself; the explicit rebuild is required.
|
||||
4. `node scripts/calculate-metrics-new.js`.
|
||||
5. Expect dashboards to show: margins down ~8-10 points (real), daily sales curves shifted, DOW
|
||||
profile changed. Tell the user before/after numbers.
|
||||
|
||||
---
|
||||
|
||||
## P1 — Wrong or drifting numbers, fix soon
|
||||
|
||||
### Fix 3: Vendor avg lead time computed over a near-cartesian join
|
||||
|
||||
**File:** `scripts/metrics-new/calculate_vendor_metrics.sql`, `VendorPOAggregates` (lines ~62-83).
|
||||
|
||||
**Problem.** Joins each done-PO line to **every** receiving of the same (pid, supplier) after the PO
|
||||
date — a product received 10 times contributes 10 ever-growing lead times → overstated, busy-product-
|
||||
weighted vendor lead time. The per-product version in `update_periodic_metrics.sql` (lines 27-48)
|
||||
is correct (MIN receiving per PO within 180 days, then average).
|
||||
|
||||
**Fix.** Reuse the periodic shape, aggregated to vendor:
|
||||
|
||||
```sql
WITH po_first_receiving AS (
    SELECT po.vendor, po.po_id, po.pid, po.date::date AS po_date,
           MIN(r.received_date::date) AS first_receive_date
    FROM purchase_orders po
    JOIN receivings r ON r.pid = po.pid AND r.supplier_id = po.supplier_id
        AND r.received_date >= po.date
        AND r.received_date <= po.date + INTERVAL '180 days'
    WHERE po.status = 'done' AND po.date >= CURRENT_DATE - INTERVAL '1 year'
      AND po.vendor IS NOT NULL AND po.vendor <> ''
    GROUP BY po.vendor, po.po_id, po.pid, po.date
)
SELECT vendor, COUNT(DISTINCT po_id) AS po_count_365d,
       ROUND(AVG(GREATEST(1, first_receive_date - po_date)))::int AS avg_lead_time_days_hist
FROM po_first_receiving GROUP BY vendor
```
|
||||
|
||||
**Verification.** For a few vendors compare old vs new values; new should be materially lower and
|
||||
roughly match `AVG(product_metrics.avg_lead_time_days)` for that vendor's products.
|
||||
|
||||
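To see why the old join shape overstates lead time, here is a small Python sketch on invented dates. The function names are hypothetical; the second one follows the shape of the SQL above (first receiving within 180 days, floored at 1 day).

```python
from datetime import date

# Hypothetical data: one PO line and every later receiving of the same (pid, supplier).
po_date = date(2026, 1, 1)
receivings = [date(2026, 1, 15), date(2026, 3, 1), date(2026, 5, 1)]


def lead_times_all_receivings(po_date, receivings):
    """Old shape: one lead time per receiving after the PO date."""
    return [(r - po_date).days for r in receivings if r >= po_date]


def lead_time_first_receiving(po_date, receivings, window_days=180):
    """New shape: first receiving within the window, floored at 1 day."""
    in_window = [r for r in receivings if 0 <= (r - po_date).days <= window_days]
    if not in_window:
        return None  # the PO drops out, like the inner JOIN in the SQL
    return max(1, (min(in_window) - po_date).days)


old_values = lead_times_all_receivings(po_date, receivings)
old_avg = sum(old_values) / len(old_values)
new_value = lead_time_first_receiving(po_date, receivings)
```

The old shape averages 14, 59 and 120 days for this single PO line (about 64 days); the new shape reports 14 days.
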
### Fix 4: Deleted order items & combined orders never reconciled in PG [verified-live]
|
||||
|
||||
**File:** `scripts/import/orders.js`.
|
||||
|
||||
**Problem.** The orders import upserts but never deletes:
|
||||
- Items removed from an order in MySQL (`DELETE FROM order_items ...` happens, e.g.
|
||||
neworder.class.php ~line 6500 for unpicked promo items, plus staff edits) leave stale rows in PG
|
||||
forever. May 2026 check: PG has 49,841 item rows vs MySQL 49,377 (+0.9%) — and PG should be ≤
|
||||
MySQL.
|
||||
- Combining orders (`combine_orders`, neworder.class.php ~11946) sets the source orders to status 16
|
||||
AND **zeroes `date_placed`**, then copies all items to a NEW order. Because the import query
|
||||
filters `o.date_placed >= …`, a combined source order can never be re-fetched, so its stale
|
||||
'placed' rows would double-count with the new merged order. Currently latent (last combine
|
||||
2024-07, predating current PG data — verified no stale rows exist today), but it will silently
|
||||
corrupt sales totals the day combining is used again.
|
||||
|
||||
**Fix.** Two parts, both inside the orders import after the upsert phase:
|
||||
1. **Item-set reconciliation** for re-imported orders: the import already knows the set of changed
|
||||
`orderIds` and inserted their current items into `temp_order_items`. Mirror the PO import's
|
||||
pattern (`purchase-orders.js` lines ~683-694):
|
||||
   ```sql
   DELETE FROM orders o
   WHERE o.order_number = ANY($1)   -- orders fetched this run
     AND NOT EXISTS (SELECT 1 FROM temp_order_items t
                     WHERE t.order_id = o.order_number AND t.pid = o.pid);
   ```
|
||||
2. **Combined/cancelled sweep** that does NOT depend on `date_placed`: each run, fetch from MySQL
|
||||
`SELECT order_id, order_status FROM _order WHERE order_status IN (15,16) AND stamp > ?`
|
||||
(no date_placed filter) and update matching PG rows' `status`/`canceled`
|
||||
('combined' rows are then excluded from metrics — see Fix 5). Cheap (small result set).
|
||||
|
||||
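The item-set reconciliation in part 1 is a set difference restricted to the orders fetched in the current run. A minimal Python sketch with made-up ids (the helper name `stale_rows` is hypothetical):

```python
# Hypothetical state after the upsert phase of one import run.
fetched_order_ids = {7001, 7002}                 # orders re-fetched this run
temp_order_items = {(7001, 11), (7002, 21)}      # current (order_id, pid) in MySQL
pg_order_rows = {(7001, 11), (7001, 12), (7002, 21), (7003, 31)}


def stale_rows(pg_rows, fetched_ids, current_items):
    """Rows of re-fetched orders whose (order_id, pid) no longer exists upstream."""
    return {
        row for row in pg_rows
        if row[0] in fetched_ids and row not in current_items
    }


to_delete = stale_rows(pg_order_rows, fetched_order_ids, temp_order_items)
remaining = pg_order_rows - to_delete
```

Only `(7001, 12)` is removed. Order 7003 was not fetched this run, so its rows are left alone, which is why the delete must be scoped to the fetched order ids.
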
**Verification.** Re-run the May-2026 row-count comparison (MySQL vs PG for one month) after one full
|
||||
run; counts should converge (PG ≤ MySQL, diff explained by TZ window edges only).
|
||||
|
||||
### Fix 5: 'combined' orders are counted as sales
|
||||
|
||||
**Files:** `scripts/metrics-new/update_daily_snapshots.sql` (status filters, lines ~77, 120-134),
|
||||
`update_product_metrics.sql` (`HistoricalDates` line ~145, `LifetimeRevenue` line ~249),
|
||||
`backfill/rebuild_daily_snapshots.sql`.
|
||||
|
||||
**Problem.** Sales filters exclude only `('canceled', 'returned')`. Status 16 'combined' = "merged
|
||||
into another order" — the new order carries the same items, so counting both double-counts. 826
|
||||
combined orders exist in MySQL; today none are in PG (see Fix 4), but once Fix 4's sweep starts
|
||||
marking rows 'combined', the metrics filters must exclude them.
|
||||
|
||||
**Fix.** Change every `NOT IN ('canceled', 'returned')` in the metrics SQL to
|
||||
`NOT IN ('canceled', 'returned', 'combined')`. Grep for the pattern in `scripts/metrics-new/` and
|
||||
`src/routes/` (dashboard endpoints replicate these filters — see CLAUDE.md analytics-filters note).
|
||||
|
||||
### Fix 6: Incremental sync watermark race (silent permanent misses)
|
||||
|
||||
**Files:** `scripts/import/orders.js` (~772), `products.js` (~934), `purchase-orders.js` (~833).
|
||||
|
||||
**Problem.** `sync_status.last_sync_timestamp` is set to `NOW()` *after* the import finishes. Any
|
||||
MySQL row modified between the source query and that write is below the new watermark but was never
|
||||
fetched → permanently skipped (until a full sync or the row changes again). Long imports widen the
|
||||
window; PG/MySQL clock skew adds to it.
|
||||
|
||||
**Fix.** Capture the watermark **before** the source query and write that value:
|
||||
```js
const [[{ now: sourceNow }]] = await prodConnection.query('SELECT NOW() as now');
// ... do the import ...
await localConnection.query(
  `INSERT INTO sync_status ... VALUES ('orders', $1) ON CONFLICT ... SET last_sync_timestamp = $1`,
  [sourceNow]);
```
|
||||
Using MySQL's own clock also eliminates cross-server skew. Note `sourceNow` comes back through the
|
||||
mysql2 driver TZ conversion — verify round-tripping with `adjustDateForMySQL` produces a correct
|
||||
comparison value, or store `UTC_TIMESTAMP()` and compare against `CONVERT_TZ`-normalized stamps.
|
||||
Overlap (re-importing rows changed during the run) is harmless — everything is upserted.
|
||||
|
||||
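The race can be shown with a toy timeline. In this Python sketch all times are invented integers on the source clock and `fetch` is a hypothetical stand-in for the incremental source query.

```python
# Hypothetical timeline in seconds on the source clock.
# Each row: (row_id, modified_at). Row 2 changes while the import is running.
rows = [(1, 90), (2, 105), (3, 130)]

PREVIOUS_WATERMARK = 50
QUERY_AT = 100       # moment the source query reads its data
FINISHED_AT = 120    # moment the import finishes


def fetch(rows, watermark, as_of):
    """Rows modified after the watermark and visible at query time."""
    return {rid for rid, ts in rows if watermark < ts <= as_of}


first_run = fetch(rows, PREVIOUS_WATERMARK, QUERY_AT)

# Old behaviour: the next run starts from the time the import finished.
next_run_old = fetch(rows, FINISHED_AT, 200)
# New behaviour: the next run starts from the time captured before the query.
next_run_new = fetch(rows, QUERY_AT, 200)
```

Row 2 (modified at 105) is never fetched under the old watermark, because it falls between the query at 100 and the watermark at 120. With the watermark captured before the query, the next run picks it up.
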
### Fix 7: Stockout days / service level / fill rate / avg stock built on activity-only snapshots
|
||||
|
||||
**Files:** `scripts/metrics-new/update_product_metrics.sql` — `SnapshotAggregates`
|
||||
(`stockout_days_30d`, `avg_stock_*_30d`, lines ~177-189), `ServiceLevels` (lines ~304-323),
|
||||
plus `calculate_sales_velocity` usage.
|
||||
|
||||
**Problem.** `daily_product_snapshots` only has rows on days with sales/receivings. So:
|
||||
- A product that is out of stock (and therefore sells nothing) gets **no row** → `stockout_days_30d`
|
||||
≈ 0 exactly when stockouts matter → `calculate_sales_velocity(sales, stockout_days)`'s adjustment
|
||||
is inert → velocity and replenishment understated for constrained products.
|
||||
- `service_level_30d` divides stockout days by COUNT(activity days), not 30.
|
||||
- `avg_stock_units_30d` / `avg_stock_cost_30d` average only activity days (biased toward in-stock
|
||||
days) → GMROI / stockturn / sell-through denominators biased.
|
||||
- `fill_rate_30d`'s `units_sold * 0.2` lost-sales heuristic is arbitrary — fine to keep, but document.
|
||||
|
||||
**Fix.** Derive stock-presence metrics from `stock_snapshots` (full daily coverage from MySQL
|
||||
`snap_product_value`, imported by `stock-snapshots.js`) instead of `daily_product_snapshots`:
|
||||
```sql
StockCoverage AS (
    SELECT pid,
           COUNT(*) FILTER (WHERE stock_quantity <= 0) AS stockout_days_30d,
           AVG(stock_quantity) AS avg_stock_units_30d,
           AVG(stock_value) AS avg_stock_cost_30d
    FROM stock_snapshots
    WHERE snapshot_date >= _current_date - INTERVAL '29 days'
    GROUP BY pid
)
```
|
||||
Treat products absent from `stock_snapshots` for a day as unknown (NULL), not in-stock. Keep
|
||||
`daily_product_snapshots` for sales/revenue aggregates. `service_level_30d` denominator becomes the
|
||||
count of covered days. Note `stock_snapshots` has no `eod_stock_retail`; keep retail/gross averages
|
||||
on the old source or compute as `stock_quantity * current price` explicitly.
|
||||
|
||||
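A small Python sketch of the covered-days logic, on invented data. The service-level formula used here (share of covered days with stock on hand) is an assumption for illustration, and `stock_coverage` is a hypothetical helper, not part of the metrics SQL.

```python
# Hypothetical stock_snapshots rows for one pid: day index -> stock_quantity.
# Days missing from the dict have no snapshot and are treated as unknown.
snapshots = {0: 5, 1: 0, 2: 0, 3: 4, 5: 6}   # day 4 has no snapshot
WINDOW_DAYS = 6


def stock_coverage(snapshots, window_days):
    covered = [snapshots[d] for d in range(window_days) if d in snapshots]
    if not covered:
        return None  # no coverage at all: every metric is unknown
    stockout_days = sum(1 for qty in covered if qty <= 0)
    return {
        "covered_days": len(covered),
        "stockout_days": stockout_days,
        "avg_stock_units": sum(covered) / len(covered),
        # Assumed definition: share of covered days with stock on hand.
        "service_level": 1 - stockout_days / len(covered),
    }


coverage = stock_coverage(snapshots, WINDOW_DAYS)
```

The uncovered day is excluded from both the numerator and the denominator, so it neither counts as a stockout nor inflates the in-stock share.
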
**Verification.** Pick products that had a known stockout period; `stockout_days_30d` should now be
|
||||
> 0 and `sales_velocity_daily` should rise accordingly.
|
||||
|
||||
---
|
||||
|
||||
## P2 — Definition / robustness improvements
|
||||
|
||||
### Fix 8: Returns don't reduce COGS; LifetimeRevenue ignores returns
|
||||
`update_daily_snapshots.sql` SalesData: COGS accrues only on `quantity > 0` rows; return rows
|
||||
(negative qty — 15,875 rows live) subtract revenue but never COGS → margin understated in
|
||||
return-heavy periods. Add a returns-COGS term mirroring the sales-COGS COALESCE chain
|
||||
(`SUM(CASE WHEN quantity < 0 THEN cost * ABS(quantity) ELSE 0 END)`) and subtract it in `cogs` (or store
|
||||
`returns_cogs` separately and use `cogs - returns_cogs` in profit). Also `LifetimeRevenue` in
|
||||
`update_product_metrics.sql` (line ~242) filters `quantity > 0` — include negative-qty rows so
|
||||
lifetime revenue nets out returns (drop the quantity filter; `price*quantity` is already signed,
|
||||
but check the `- discount` term sign for return rows).
|
||||
|
||||
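A worked example of the returns-COGS term, in Python with invented numbers (one product, three units sold and one returned at the same price and cost):

```python
# Hypothetical order rows for one pid/day: (quantity, price, cost_each).
rows = [(3, 20.0, 8.0), (-1, 20.0, 8.0)]   # 3 sold, 1 returned

revenue = sum(qty * price for qty, price, _ in rows)              # signed quantities
sales_cogs = sum(qty * cost for qty, _, cost in rows if qty > 0)
returns_cogs = sum(abs(qty) * cost for qty, _, cost in rows if qty < 0)

profit_old = revenue - sales_cogs                    # returns never reduce COGS
profit_new = revenue - (sales_cogs - returns_cogs)   # COGS net of returned units
```

Revenue nets to 40.0. The old calculation charges COGS for all three units (profit 16.0), while netting out the returned unit gives profit 24.0 on the two units actually kept by customers.
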
### Fix 9: return_rate_30d definition
|
||||
`update_product_metrics.sql` line ~468: `returns / (sales + returns)` → industry standard is
|
||||
`returns / sales`. Change denominator to `NULLIF(sa.sales_30d, 0)`.
|
||||
|
||||
### Fix 10: GMROI not annualized
|
||||
Line ~466: `profit_30d / avg_stock_cost_30d` is a monthly GMROI (~1/12 of the conventional annual
|
||||
figure, benchmark ≥ 2-3). Either annualize (`* 12.17`) or rename the column/label "monthly".
|
||||
Decision for Matt; annualizing is recommended for comparability. Frontend displays must be checked
|
||||
either way.
|
||||
|
||||
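The definition changes in Fix 9 and Fix 10 are simple arithmetic. The Python sketch below uses invented inputs; the function names are hypothetical and the 365/30 factor is the source of the `12.17` multiplier.

```python
def return_rate(returns_30d, sales_30d):
    """returns / sales; None when there were no sales (mirrors NULLIF)."""
    return returns_30d / sales_30d if sales_30d else None


def annualized_gmroi(profit_30d, avg_stock_cost_30d, window_days=30):
    """Scale a 30-day GMROI to a yearly figure (365 / 30 is about 12.17)."""
    if not avg_stock_cost_30d:
        return None
    return profit_30d / avg_stock_cost_30d * (365 / window_days)


old_rate = 10 / (90 + 10)        # current definition: returns / (sales + returns)
new_rate = return_rate(10, 90)   # proposed definition: returns / sales
gmroi = annualized_gmroi(profit_30d=500.0, avg_stock_cost_30d=2000.0)
```

For 90 units sold and 10 returned the rate moves from 0.10 to about 0.111. A monthly GMROI of 0.25 annualizes to roughly 3.04.
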
### Fix 11: get_weighted_avg_cost is a lifetime WAC
|
||||
`db/functions.sql` (~line 81, deployed identically): averages ALL receivings ≤ date — decade-old
|
||||
costs weigh equally. Recommended: window to recent receivings, e.g. last 365 days falling back to
|
||||
lifetime when none. Used as fallback COGS when `o.costeach` is NULL, so impact is modest but real
|
||||
for long-lived SKUs. Apply with `CREATE OR REPLACE FUNCTION` in `db/functions.sql` AND on the live DB.
|
||||
|
||||
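The windowed average with a lifetime fallback can be sketched as follows. This is an illustrative Python model on invented receivings, not the PL/pgSQL function; it ignores the `status` filter and the helper name `weighted_avg_cost` is hypothetical.

```python
from datetime import date, timedelta

# Hypothetical receivings for one pid: (received_date, qty_each, cost_each).
receivings = [
    (date(2016, 3, 1), 100, 2.00),   # decade-old cost
    (date(2026, 2, 1), 50, 5.00),
    (date(2026, 5, 1), 50, 6.00),
]


def weighted_avg_cost(rows, as_of, window_days=365):
    def wac(subset):
        qty = sum(q for _, q, _ in subset)
        return sum(q * c for _, q, c in subset) / qty if qty > 0 else None

    eligible = [r for r in rows if r[0] <= as_of]
    cutoff = as_of - timedelta(days=window_days)
    recent = [r for r in eligible if r[0] > cutoff]
    recent_cost = wac(recent)
    # Fall back to the lifetime average when the window is empty.
    return recent_cost if recent_cost is not None else wac(eligible)


windowed = weighted_avg_cost(receivings, date(2026, 6, 10))
fallback = weighted_avg_cost(receivings[:1], date(2026, 6, 10))
```

With the window, the cost is 5.50 instead of the lifetime 3.75. A product whose only receiving is from 2016 still gets its lifetime value of 2.00.
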
### Fix 12: exclude_from_forecast removes products from product_metrics entirely
|
||||
`update_product_metrics.sql` line ~627 (`WHERE s.exclude_forecast IS FALSE OR ... IS NULL`): the
|
||||
flag's name implies forecast-only, but excluded products get NO metrics row → vanish from brand/
|
||||
vendor/category rollups and dashboards. Fix: always emit the row; instead NULL the
|
||||
forecast/replenishment columns when excluded (wrap those expressions in
|
||||
`CASE WHEN s.exclude_forecast THEN NULL ELSE ... END`).
|
||||
|
||||
### Fix 13: Incremental products import misses category-only changes
|
||||
`products.js` incremental WHERE (~lines 433-440) keys on `p.stamp`, `ci.stamp`, price/b2b dates —
|
||||
`product_category_index` changes don't bump any of those → PG `product_categories` goes stale. Also
|
||||
the `needs_update` comparison (~lines 604-625) doesn't compare `categories`, so even refetched rows
|
||||
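skip the category rewrite. Fix both: add a `categories` comparison (`t.categories IS DISTINCT FROM p.categories`, or the `IS NOT DISTINCT FROM` form if the existing columns are compared for equality) to the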
|
||||
needs_update comparison (note: `products.categories` is the GROUP_CONCAT string — confirm PG column
|
||||
holds the same representation), and add a cheap full-sweep (e.g. weekly, or compare
|
||||
`COUNT(*) GROUP BY pid` hashes) OR include `EXISTS (SELECT 1 FROM product_category_index pci WHERE
|
||||
pci.pid = p.pid AND pci.stamp > ?)` in the incremental WHERE if that table has a stamp column —
|
||||
verify schema first (`DESCRIBE product_category_index`).
|
||||
|
||||
### Fix 14: PO/receivings OFFSET pagination over a moving filter
|
||||
`purchase-orders.js` (~lines 275-298, 447-470): `LIMIT/OFFSET` with a `date_updated > ?` predicate;
|
||||
concurrent updates shift rows between pages → silent skips. Fix: keyset pagination —
|
||||
`WHERE ... AND p.po_id > ? ORDER BY p.po_id LIMIT 500`, carrying the last seen po_id (drop OFFSET).
|
||||
Same for receivings on `receiving_id`.
|
||||
|
||||
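The difference between OFFSET and keyset paging can be demonstrated with an in-memory SQLite table. This Python sketch is an illustration only: the table, the page size, and the simulated concurrent change (a row leaving the result set between pages) are all hypothetical.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE po (po_id INTEGER PRIMARY KEY, date_updated INTEGER)")
    conn.executemany("INSERT INTO po VALUES (?, ?)", [(i, 100 + i) for i in range(1, 8)])
    return conn


def remove_po_2(conn):
    """Simulated concurrent change: po 2 leaves the result set mid-scan."""
    conn.execute("DELETE FROM po WHERE po_id = 2")


def fetch_offset(conn, since, page_size=3):
    seen, offset = [], 0
    while True:
        page = conn.execute(
            "SELECT po_id FROM po WHERE date_updated > ? ORDER BY po_id LIMIT ? OFFSET ?",
            (since, page_size, offset)).fetchall()
        if not page:
            return seen
        seen += [r[0] for r in page]
        offset += page_size
        remove_po_2(conn)


def fetch_keyset(conn, since, page_size=3):
    seen, last_id = [], 0
    while True:
        page = conn.execute(
            "SELECT po_id FROM po WHERE date_updated > ? AND po_id > ? ORDER BY po_id LIMIT ?",
            (since, last_id, page_size)).fetchall()
        if not page:
            return seen
        seen += [r[0] for r in page]
        last_id = page[-1][0]   # resume strictly after the last id seen
        remove_po_2(conn)


offset_ids = fetch_offset(make_db(), since=100)
keyset_ids = fetch_keyset(make_db(), since=100)
```

In the OFFSET run po 4 is silently skipped, because the rows behind the first page shift up by one. The keyset run returns every po_id exactly once.
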
### Fix 15: Status map gaps and unsafe defaults
|
||||
- `orders.js` orderStatusMap lacks 45 (`payment_pending`) and 67 (`remote_send`) → imported as
|
||||
numeric strings. Add both (mirror in `migrations/001_map_order_statuses.sql` as a follow-up update
|
||||
for existing rows).
|
||||
- `purchase-orders.js` `poStatusMap[po.status] || 'created'` (line ~335): an unknown *cancel-like*
|
||||
code would be treated as an open PO and inflate on-order FIFO. Default to a sentinel like
|
||||
`'unknown_<code>'` instead, and make the FIFO/on-order CTEs in `update_product_metrics.sql` treat
|
||||
only the known-open statuses as open (they already whitelist open statuses — so the sentinel is
|
||||
safe there; just ensure nothing treats unknown as 'created'). Same for receivingStatusMap.
|
||||
|
||||
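A sketch of the sentinel default, in Python for brevity. The status codes and names below are invented for illustration; the real maps live in `purchase-orders.js` and are not reproduced here.

```python
# Hypothetical subset of a PO status map; codes and names are made up.
PO_STATUS_MAP = {0: "canceled", 1: "created", 10: "ordered", 50: "done"}
OPEN_STATUSES = {"created", "ordered"}


def map_status(code, status_map):
    """Known codes map to names; unknown codes become a visible sentinel."""
    return status_map.get(code, f"unknown_{code}")


def is_open(status):
    """Whitelist check, so a sentinel can never count as on-order."""
    return status in OPEN_STATUSES


status = map_status(99, PO_STATUS_MAP)
```

An unknown code 99 becomes `unknown_99`, which fails the open-status whitelist instead of being treated as `created`.
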
### Fix 16: Transactions issued through the pool wrapper land on arbitrary connections
|
||||
`categories.js` (lines ~17-152) and `daily-deals.js` (~27-130) call `query('BEGIN')` /
|
||||
`query('COMMIT')` on the wrapper, which checks out a client per call — BEGIN/work/COMMIT are not
|
||||
guaranteed to share a connection (works only by pool-LIFO accident). The categories
|
||||
`DISABLE TRIGGER` rides on this too. Fix: use the wrapper's `beginTransaction()/commit()/rollback()`
|
||||
(see `utils.js` lines 121-148) exactly as orders.js does. In categories.js also move the
|
||||
post-COMMIT `ENABLE TRIGGER` inside the transaction (DISABLE/ENABLE both inside), or drop the
|
||||
trigger toggling entirely if the trigger isn't actually problematic anymore.
|
||||
|
||||
### Fix 17: stock-snapshots import swallows batch errors → permanent holes
|
||||
`stock-snapshots.js` (~lines 153-155): a failed batch is logged and skipped, but the next
|
||||
incremental starts at `MAX(snapshot_date)` — the hole is never revisited. Fix: rethrow (fail the
|
||||
step) or collect failed date ranges and retry once, then fail if still failing. Also line ~168:
|
||||
`calculateRate(processedRows, startTime)` — arguments reversed (signature is
|
||||
`calculateRate(startTime, current)`, see `metrics-new/utils/progress.js:70`).
|
||||
|
||||
### Fix 18: Metrics cancellation targets an application_name that's never set
|
||||
`calculate-metrics-new.js` line ~180 cancels backends `WHERE application_name =
|
||||
'node-metrics-calculator'`, but the Pool config never sets it → cancellation no-ops (the 30-min
|
||||
`statement_timeout` is the only real guard). Fix: add `application_name: 'node-metrics-calculator'`
|
||||
to both dbConfig branches.
|
||||
|
||||
### Fix 19: Aggregate-table change-detection lists miss cost-only changes
|
||||
`calculate_brand_metrics.sql` / `calculate_vendor_metrics.sql` / `calculate_category_metrics.sql`
|
||||
ON CONFLICT WHERE lists don't include `profit_30d`/`cogs_30d` — a cost revision with unchanged
|
||||
sales/revenue leaves stale rows (product_metrics has a 1-day staleness net; rollups don't). Add
|
||||
`... OR x.profit_30d IS DISTINCT FROM EXCLUDED.profit_30d OR x.cogs_30d IS DISTINCT FROM
|
||||
EXCLUDED.cogs_30d` to each, or add a `last_calculated < NOW() - INTERVAL '1 day'` net like
|
||||
product_metrics line ~707.
|
||||
|
||||
### Fix 20: Snapshot stale-detection only compares unit counts
|
||||
`update_daily_snapshots.sql` lines ~57-85: detects mismatches in `units_sold`/`units_received` only;
|
||||
price/discount/costeach corrections older than the 2-day recheck are never repaired. Add a
|
||||
revenue comparison to the stale check: compare `SUM(net_revenue)` per day against the equivalent
|
||||
recomputed from `orders` (ROUND both to 2dp to avoid float-noise churn).
|
||||
|
||||
### Fix 21: Category metrics positive-only revenue asymmetry
|
||||
`calculate_category_metrics.sql` (lines ~27-36, 64-73): revenue summed only when `> 0` while
|
||||
cogs/profit use COALESCE-all → margin numerator/denominator from different populations, and
|
||||
inconsistent with brand/vendor (plain COALESCE). Change the revenue/sales CASEs to
|
||||
`COALESCE(pm.revenue_7d, 0)` etc., matching brand_metrics.
|
||||
|
||||
### Fix 22 (decision needed): Demand-pattern & seasonality definitions
|
||||
- `classify_demand_pattern` (db/functions.sql): CV thresholds 0.2/0.5 + avg<1/day. Industry standard
|
||||
is Syntetos-Boylan: ADI ≥ 1.32 and CV² ≥ 0.49 quadrants (smooth/erratic/intermittent/lumpy).
|
||||
Today everything is classified as sporadic/lumpy. If adopting SB: ADI = 30 / COUNT(days with sales),
|
||||
CV² computed on nonzero-demand sizes. Changes the vocabulary consumed by the forecast engine
|
||||
(`scripts/forecast/forecast_engine.py` reads `demand_pattern`) — coordinate before changing.
|
||||
- SeasonalityAnalysis (`update_product_metrics.sql` ~360): `month_avg = AVG(units_sold)` over rows
|
||||
with sales only → intensity, not volume. Use monthly totals (SUM, with zero months counted) /
|
||||
overall monthly average for the index.
|
||||
- Safety stock: currently static config units; `sales_std_dev_30d` exists but is unused. Optional
|
||||
upgrade: `safety = z * σ_d * sqrt(lead_time)` with z from a service-level setting.
|
||||
|
||||
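If the Syntetos-Boylan scheme is adopted, the classification and the optional safety-stock formula could look roughly like this. It is a Python sketch under stated assumptions: population standard deviation is used for CV², the cut-offs are applied as `>=` on both axes, and the function names are hypothetical.

```python
import math
from statistics import mean, pstdev

ADI_CUTOFF = 1.32
CV2_CUTOFF = 0.49


def classify_sb(daily_units):
    """Syntetos-Boylan quadrant for a window of daily unit sales."""
    nonzero = [u for u in daily_units if u > 0]
    if not nonzero:
        return None  # no demand in the window
    adi = len(daily_units) / len(nonzero)          # e.g. 30 / days with sales
    cv2 = (pstdev(nonzero) / mean(nonzero)) ** 2   # computed on nonzero demand sizes
    if adi < ADI_CUTOFF:
        return "smooth" if cv2 < CV2_CUTOFF else "erratic"
    return "intermittent" if cv2 < CV2_CUTOFF else "lumpy"


def safety_stock(z, sigma_daily, lead_time_days):
    """safety = z * sigma_d * sqrt(lead_time)."""
    return z * sigma_daily * math.sqrt(lead_time_days)


steady = classify_sb([2] * 30)                  # sells 2 units every day
sparse = classify_sb([0] * 27 + [1, 1, 1])      # 3 sale days in 30, equal sizes
bursty = classify_sb([0] * 27 + [1, 1, 10])     # 3 sale days in 30, one large order
```

These three toy series land in the smooth, intermittent and lumpy quadrants respectively. Whether to use sample or population variance, and how to treat products with a single sale day, would need to be settled before this feeds the forecast engine.
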
These change user-facing semantics — confirm with Matt before implementing.
|
||||
|
||||
---
|
||||
|
||||
## Verified non-issues (no action, or cleanup only)
|
||||
|
||||
- **`costeach` fallback `price * 0.5`** (orders.js line ~615): fires on **2.1%** of item rows
|
||||
(729/34,833, last 30d, live-verified). Accepted by Matt — 50% margin is a fair estimate for these
|
||||
products. No action needed.
|
||||
- **Missing-product order skips**: zero occurrences — MySQL has no orphan order_items (1-year check),
|
||||
PG products is a superset of MySQL products (687,579 vs 687,576), last 7 import runs all logged
|
||||
`totalSkipped: 0`. Cleanup only: remove the unused `importMissingProducts` import line at
|
||||
`orders.js:2` (the function itself stays in products.js — harmless utility).
|
||||
- **Status 30 'cancelled_old'** in `total_sold >= 20` filter: zero rows live in `_order` — safe.
|
||||
- **Duplicate (order_id, pid) order items**: none exist in MySQL — the upsert PK is safe.
|
||||
- **base_discount** in orders.js: computed/stored in temp table but unused since migration 002 —
|
||||
remove the column from temp table + queries for clarity (no behavior change).
|
||||
- **`full-update.js` `runScript`**: try/catch around `console.log` is dead code; per-step
|
||||
`status:'complete'` messages could confuse a UI parser. Cosmetic only — tidy if touching the file.
|
||||
|
||||
## Suggested implementation order
|
||||
|
||||
| Step | Fixes | Re-import/rebuild needed |
|---|---|---|
| 1 | Fix 1 + Fix 2 (+ Fix 5 filters, Fix 8/9 while editing the same SQL) | FULL orders re-import → snapshot rebuild → metrics (once) |
| 2 | Fix 4 + Fix 6 (orders.js reconciliation + watermarks; POs/products watermarks too) | no |
| 3 | Fix 3, Fix 7 (metrics SQL only) | metrics run |
| 4 | Fix 13-21 (robustness batch) | no |
| 5 | Fix 10-12, Fix 22 after Matt's sign-off (definition changes) | metrics run |
|
||||
|
||||
After step 1, expect: margin_30d down ~8-10 points, discounts_30d ~3x up, daily curves shifted to
|
||||
correct business days. Communicate before/after so the change isn't mistaken for a data incident.
|
||||
|
||||
## Reference: verification snippets used in the review
|
||||
|
||||
```sql
-- MySQL: item-level discounts dropped by the gate (30d)
SELECT COUNT(DISTINCT o.order_id), ROUND(SUM(odi.amount),2)
FROM order_discount_items odi
JOIN order_discounts od ON od.order_id=odi.order_id AND od.discount_id=odi.discount_id
JOIN _order o ON o.order_id=odi.order_id
WHERE odi.which=2 AND o.date_placed >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
  AND o.order_status >= 20 AND COALESCE(od.discount_amount_subtotal,0)=0;
-- → 2,021 orders / $25,989 (2026-06-10)

-- MySQL: costeach fallback frequency (30d)
SELECT COUNT(*),
       SUM(CASE WHEN NOT EXISTS (SELECT 1 FROM order_costs oc WHERE oc.orderid=oi.order_id
                                 AND oc.pid=oi.prod_pid AND oc.pending=0)
                 AND NOT EXISTS (SELECT 1 FROM product_inventory pi WHERE pi.pid=oi.prod_pid)
                THEN 1 ELSE 0 END)
FROM order_items oi JOIN _order o ON o.order_id=oi.order_id
WHERE o.order_status >= 20 AND o.date_placed >= DATE_SUB(CURDATE(), INTERVAL 30 DAY);
-- → 729 / 34,833 = 2.1% (2026-06-10)

-- PG: timezone check
SHOW timezone; -- Europe/Berlin (2026-06-10)

-- Row drift, May 2026: MySQL 49,377 items / PG 49,841 (+0.9%)
```
|
||||
@@ -76,7 +76,9 @@ $function$;
|
||||
|
||||
-- =============================================================================
|
||||
-- get_weighted_avg_cost: Weighted average cost from receivings up to a given date.
|
||||
-- Uses all non-canceled receivings (no row limit) weighted by quantity.
|
||||
-- Prefers receivings from the 365 days before p_date so decade-old costs don't
|
||||
-- weigh equally with recent ones; falls back to the lifetime average when the
|
||||
-- product had no receivings in that window.
|
||||
-- =============================================================================
|
||||
CREATE OR REPLACE FUNCTION public.get_weighted_avg_cost(
|
||||
p_pid bigint,
|
||||
@@ -97,8 +99,21 @@ BEGIN
|
||||
FROM receivings
|
||||
WHERE pid = p_pid
|
||||
AND received_date <= p_date
|
||||
AND received_date > p_date - INTERVAL '365 days'
|
||||
AND status != 'canceled';
|
||||
|
||||
IF weighted_cost IS NULL THEN
|
||||
SELECT
|
||||
CASE
|
||||
WHEN SUM(qty_each) > 0 THEN SUM(cost_each * qty_each) / SUM(qty_each)
|
||||
ELSE NULL
|
||||
END INTO weighted_cost
|
||||
FROM receivings
|
||||
WHERE pid = p_pid
|
||||
AND received_date <= p_date
|
||||
AND status != 'canceled';
|
||||
END IF;
|
||||
|
||||
RETURN weighted_cost;
|
||||
END;
|
||||
$function$;
|
||||
|
||||
@@ -76,6 +76,8 @@ if (process.env.DATABASE_URL && typeof process.env.DATABASE_URL === 'string') {
|
||||
dbConfig = {
|
||||
connectionString: process.env.DATABASE_URL,
|
||||
ssl: process.env.DB_SSL === 'true' ? { rejectUnauthorized: false } : false,
|
||||
// Required by cancelCalculation(): pg_cancel_backend targets this name
|
||||
application_name: 'node-metrics-calculator',
|
||||
// Add performance optimizations
|
||||
max: 10, // connection pool max size
|
||||
idleTimeoutMillis: 30000,
|
||||
@@ -93,6 +95,8 @@ if (process.env.DATABASE_URL && typeof process.env.DATABASE_URL === 'string') {
|
||||
database: process.env.DB_NAME,
|
||||
port: process.env.DB_PORT || 5432,
|
||||
ssl: process.env.DB_SSL === 'true',
|
||||
// Required by cancelCalculation(): pg_cancel_backend targets this name
|
||||
application_name: 'node-metrics-calculator',
|
||||
// Add performance optimizations
|
||||
max: 10, // connection pool max size
|
||||
idleTimeoutMillis: 30000,
|
||||
|
||||
Binary file not shown.
@@ -634,6 +634,52 @@ def forecast_from_curve(curve_params, scale_factor, age_days, horizon_days):
|
||||
    return np.array(forecasts)


def forecast_preorder(curve_params, scale_factor, days_until_arrival,
                      preorder_daily_rate, horizon_days):
    """
    Piecewise pre-order forecast: a flat observed pre-order trickle until the
    product is expected to arrive, then the scaled launch curve from age 0.

    The launch curve was fit on POST-receipt order history, so running it from
    today (while the product is still weeks from arriving) front-loads full
    first-week launch volume that hasn't happened yet — the main driver of the
    ~2.15x preorder over-forecast. Instead we forecast the slow pre-order rate
    up to the arrival date, then start the curve's day 0 on that date.
    See FORECAST_FIX_PLAN F4.

    Args:
        curve_params: (amplitude, decay_rate, baseline, ...) weekly curve
        scale_factor: per-product multiplier for the post-arrival curve envelope
        days_until_arrival: calendar days from today until expected arrival
        preorder_daily_rate: observed pre-order units/day (trickle)
        horizon_days: forecast horizon length

    Returns:
        array of daily forecast values of length horizon_days
    """
    amplitude, decay_rate, baseline = curve_params[:3]
    forecasts = np.zeros(horizon_days)

    # Clamp the arrival offset into the horizon
    dua = int(max(0, min(days_until_arrival, horizon_days)))

    # Pre-arrival segment: flat pre-order trickle, capped at the curve's scaled
    # week-0 daily value (a pre-order day shouldn't out-sell the launch peak).
    if dua > 0:
        week0_daily = (amplitude / 7.0) * scale_factor + (baseline / 7.0)
        pre_rate = preorder_daily_rate
        if week0_daily > 0:
            pre_rate = min(pre_rate, week0_daily)
        forecasts[:dua] = max(0.0, pre_rate)

    # Post-arrival segment: scaled launch curve, curve day 0 = arrival date.
    if dua < horizon_days:
        curve_part = forecast_from_curve(curve_params, scale_factor, 0, horizon_days - dua)
        forecasts[dua:] = curve_part

    return forecasts
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Batch data loading (eliminates N+1 per-product queries)
|
||||
# ---------------------------------------------------------------------------
|
||||
@@ -651,9 +697,11 @@ def batch_load_product_data(conn, products):
|
||||
data = {
|
||||
'preorder_sales': {},
|
||||
'preorder_days': {},
|
||||
'preorder_arrival_days': {},
|
||||
'launch_sales': {},
|
||||
'decay_velocity': {},
|
||||
'mature_history': {},
|
||||
'dormant_rate': {},
|
||||
}
|
||||
|
||||
# Pre-order sales: orders placed BEFORE first received date
|
||||
@@ -677,6 +725,39 @@ def batch_load_product_data(conn, products):
|
||||
data['preorder_days'][int(row['pid'])] = float(row['preorder_days'])
|
||||
log.info(f"Batch loaded pre-order sales for {len(data['preorder_sales'])}/{len(preorder_pids)} preorder products")
|
||||
|
||||
# Expected arrival per pre-order product, to time the launch curve.
|
||||
# Prefer the soonest FUTURE expected_date on an open PO; if the only open
|
||||
# PO has a past expected_date assume 7 days; if there's no open PO at all
|
||||
# assume 14 days. See FORECAST_FIX_PLAN F4.
|
||||
arrival_sql = """
|
||||
SELECT pid,
|
||||
MIN(expected_date) FILTER (
|
||||
WHERE expected_date IS NOT NULL AND expected_date >= CURRENT_DATE
|
||||
) AS future_arrival
|
||||
FROM purchase_orders
|
||||
WHERE pid = ANY(%s)
|
||||
AND status IN ('created', 'ordered', 'electronically_sent', 'receiving_started')
|
||||
GROUP BY pid
|
||||
"""
|
||||
adf = execute_query(conn, arrival_sql, [preorder_pids])
|
||||
today = date.today()
|
||||
for _, row in adf.iterrows():
|
||||
pid = int(row['pid'])
|
||||
fa = row['future_arrival']
|
||||
if pd.notna(fa):
|
||||
fa_date = pd.Timestamp(fa).date()
|
||||
data['preorder_arrival_days'][pid] = max(0, (fa_date - today).days)
|
||||
else:
|
||||
data['preorder_arrival_days'][pid] = 7  # open PO, but expected_date is past or NULL
|
||||
no_po = 0
|
||||
for pid in preorder_pids:
|
||||
if int(pid) not in data['preorder_arrival_days']:
|
||||
data['preorder_arrival_days'][int(pid)] = 14 # no open PO at all
|
||||
no_po += 1
|
||||
log.info(f"Batch loaded preorder arrival for "
|
||||
f"{len(data['preorder_arrival_days']) - no_po}/{len(preorder_pids)} via open POs, "
|
||||
f"{no_po} defaulted to 14d")
|
||||
|
||||
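The three-way arrival rule above (future date on an open PO, open PO without a usable date, no open PO) can be restated as a small pure function. This is a sketch only; `arrival_offset_days` and its `has_open_po` flag are hypothetical names introduced for illustration, while the 7-day and 14-day defaults are taken from the diff.

```python
from datetime import date


def arrival_offset_days(future_arrival, has_open_po, today):
    """Days from `today` until expected arrival, using the fallbacks in the diff.

    future_arrival: soonest expected_date >= today on an open PO, or None.
    has_open_po:    True if the product has any open PO at all (hypothetical flag).
    """
    if future_arrival is not None:
        return max(0, (future_arrival - today).days)
    # Open PO exists but its expected_date is past or missing: assume 7 days.
    # No open PO at all: assume 14 days.
    return 7 if has_open_po else 14


today = date(2026, 6, 10)
offsets = [
    arrival_offset_days(date(2026, 6, 24), True, today),
    arrival_offset_days(None, True, today),
    arrival_offset_days(None, False, today),
]
```

Note that the SQL's `FILTER` clause yields NULL both when every expected date is in the past and when no expected date is set, so both cases fall into the 7-day branch.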
# Launch sales: first 14 days after first received
|
||||
launch_pids = products[products['phase'] == 'launch']['pid'].tolist()
|
||||
if launch_pids:
|
||||
@@ -694,15 +775,23 @@ def batch_load_product_data(conn, products):
|
||||
data['launch_sales'][int(row['pid'])] = float(row['total_sold'])
|
||||
log.info(f"Batch loaded launch sales for {len(data['launch_sales'])}/{len(launch_pids)} launch products")
|
||||
|
||||
# Decay recent velocity: average daily sales over last 30 days
|
||||
# Decay recent velocity: TRUE calendar-daily average over the last 30 days.
|
||||
# We divide the summed units by calendar days (clipped to the product's age),
|
||||
# NOT by the number of snapshot rows. Snapshots are sparse and mostly land on
|
||||
# sold-days, so AVG(units_sold) averaged over sold-days only and inflated the
|
||||
# decay rate ~4x (measured 1.353 vs true 0.332 units/day). See FORECAST_FIX_PLAN F1.
|
||||
decay_pids = products[products['phase'] == 'decay']['pid'].tolist()
|
||||
if decay_pids:
|
||||
sql = """
|
||||
SELECT dps.pid, AVG(COALESCE(dps.units_sold, 0)) AS avg_daily
|
||||
SELECT dps.pid,
|
||||
SUM(COALESCE(dps.units_sold, 0))::float
|
||||
/ GREATEST(LEAST(30, (CURRENT_DATE - pm.date_first_received::date)), 1) AS avg_daily
|
||||
FROM daily_product_snapshots dps
|
||||
JOIN product_metrics pm ON pm.pid = dps.pid
|
||||
WHERE dps.pid = ANY(%s)
|
||||
AND dps.snapshot_date >= CURRENT_DATE - INTERVAL '30 days'
|
||||
GROUP BY dps.pid
|
||||
AND dps.snapshot_date >= pm.date_first_received::date
|
||||
GROUP BY dps.pid, pm.date_first_received
|
||||
"""
|
||||
df = execute_query(conn, sql, [decay_pids])
|
||||
for _, row in df.iterrows():
|
||||
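The difference between the old and new decay-velocity denominators can be shown with a few lines of arithmetic. The sketch below uses hypothetical helper names and made-up snapshot values; it only mirrors the `SUM(...) / GREATEST(LEAST(30, age), 1)` expression from the new query.

```python
def sparse_row_average(units_by_snapshot):
    """Old behaviour: AVG over snapshot rows, which mostly exist on sold-days."""
    return sum(units_by_snapshot) / len(units_by_snapshot)


def calendar_daily_average(units_by_snapshot, age_days, window_days=30):
    """New behaviour: summed units divided by calendar days, clipped to product age."""
    days = max(min(window_days, age_days), 1)
    return sum(units_by_snapshot) / days


# Four snapshot rows in the last 30 days for a product received long ago.
rows = [2, 1, 3, 2]
old_rate = sparse_row_average(rows)           # averages over 4 rows
new_rate = calendar_daily_average(rows, 400)  # averages over 30 calendar days
```

In this made-up case the row average is 7.5 times the calendar average, the same kind of inflation as the roughly 4x measured in the plan.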
@@ -724,6 +813,25 @@ def batch_load_product_data(conn, products):
|
||||
data['mature_history'][int(pid)] = group.copy()
|
||||
log.info(f"Batch loaded history for {len(data['mature_history'])}/{len(mature_pids)} mature products")
|
||||
|
||||
# Dormant trailing order rate: dormant products forecast 0 by default, but
|
||||
# ~11K of them still sell (restocks, promos, long-tail) — ~11% of all demand
|
||||
# currently forecast as a hard zero. Load a trailing-180-day daily order rate
|
||||
# so the dormant branch can carry a small positive rate. See FORECAST_FIX_PLAN F5.
|
||||
dormant_pids = products[products['phase'] == 'dormant']['pid'].tolist()
|
||||
if dormant_pids:
|
||||
sql = """
|
||||
SELECT o.pid, SUM(o.quantity) / 180.0 AS rate
|
||||
FROM orders o
|
||||
WHERE o.pid = ANY(%s)
|
||||
AND o.canceled IS DISTINCT FROM TRUE
|
||||
AND o.date >= CURRENT_DATE - INTERVAL '180 days'
|
||||
GROUP BY o.pid
|
||||
"""
|
||||
df = execute_query(conn, sql, [dormant_pids])
|
||||
for _, row in df.iterrows():
|
||||
data['dormant_rate'][int(row['pid'])] = float(row['rate'])
|
||||
log.info(f"Batch loaded dormant order rate for {len(data['dormant_rate'])}/{len(dormant_pids)} dormant products")
|
||||
|
||||
return data
|
||||
|
||||
|
||||
@@ -829,11 +937,20 @@ def forecast_mature(product, history_df):
|
||||
# Not enough data — flat velocity
|
||||
return np.full(FORECAST_HORIZON_DAYS, velocity)
|
||||
|
||||
# Fill date gaps with 0 sales (days where product had no snapshot = no sales)
|
||||
# Reindex over the FULL calendar window ending yesterday, not just the span
|
||||
# between the first and last snapshot. resample() only covers first→last
|
||||
# snapshot, so leading/trailing quiet periods are absent and the Holt level
|
||||
# is fitted only on the product's busy span (can run ~4x too high). An
|
||||
# explicit reindex fills every quiet calendar day with 0. (pid, snapshot_date)
|
||||
# is unique so there is no duplicate-index risk; do NOT use combine_first
|
||||
# (it keeps zeros over real data). See FORECAST_FIX_PLAN F2.
|
||||
hist = history_df.copy()
|
||||
hist['snapshot_date'] = pd.to_datetime(hist['snapshot_date'])
|
||||
hist = hist.set_index('snapshot_date').resample('D').sum().fillna(0)
|
||||
series = hist['units_sold'].values.astype(float)
|
||||
hist = hist.set_index('snapshot_date')['units_sold']
|
||||
full_index = pd.date_range(
|
||||
end=pd.Timestamp(date.today() - timedelta(days=1)),
|
||||
periods=EXP_SMOOTHING_WINDOW, freq='D')
|
||||
series = hist.reindex(full_index, fill_value=0.0).values.astype(float)
|
||||
|
||||
# Need at least 2 non-zero values for smoothing
|
||||
if np.count_nonzero(series) < 2:
|
||||
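Why the window matters for the Holt level can be seen in a pure-Python analogue of the two approaches. This is a sketch under stated assumptions: `fill_calendar_window` imitates `Series.reindex(full_index, fill_value=0.0)` over a fixed window, `first_to_last_span` imitates the first-to-last coverage of `resample('D')`, and both names and the sample data are hypothetical.

```python
from datetime import date, timedelta


def fill_calendar_window(units_by_date, end_date, periods):
    """Return `periods` daily values ending at end_date; missing days become 0.0."""
    start = end_date - timedelta(days=periods - 1)
    return [float(units_by_date.get(start + timedelta(days=i), 0.0))
            for i in range(periods)]


def first_to_last_span(units_by_date):
    """Daily values covering only the first..last snapshot dates."""
    first, last = min(units_by_date), max(units_by_date)
    n = (last - first).days + 1
    return [float(units_by_date.get(first + timedelta(days=i), 0.0))
            for i in range(n)]


snapshots = {date(2026, 6, 1): 3, date(2026, 6, 3): 1}
window = fill_calendar_window(snapshots, date(2026, 6, 9), 14)
span = first_to_last_span(snapshots)
```

Here `span` has 3 days with a mean of about 1.33 units/day, while `window` has 14 days with a mean of about 0.29, because the quiet days before and after the busy span are now counted.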
@@ -956,9 +1073,24 @@ def generate_all_forecasts(conn, curves_df, dow_indices, monthly_indices=None,
|
||||
today = date.today()
|
||||
forecast_dates = [today + timedelta(days=i) for i in range(FORECAST_HORIZON_DAYS)]
|
||||
|
||||
# Pre-compute DOW and seasonal multipliers for each forecast date
|
||||
# Pre-compute DOW and seasonal multipliers for each forecast date.
|
||||
# DOW multipliers stay ABSOLUTE — every calibration is a multi-week average
|
||||
# and therefore DOW-neutral, so reshaping by absolute DOW indices is correct.
|
||||
# Seasonal indices must be applied RELATIVE to the calibration period:
|
||||
# each per-product calibration (decay velocity, mature Holt level, launch /
|
||||
# preorder scale) is fitted on raw recent actuals that already embed the
|
||||
# current month's seasonal level. Multiplying by the absolute target-month
|
||||
# index double-counts seasonality (~25% over-forecast at the May→June sale
|
||||
# transition, worse near November). Divide by the trailing-30-day average
|
||||
# index so only the seasonal *change* from calibration to target applies.
|
||||
# See FORECAST_FIX_PLAN F3.
|
||||
dow_multipliers = [dow_indices.get(d.isoweekday(), 1.0) for d in forecast_dates]
|
||||
seasonal_multipliers = [monthly_indices.get(d.month, 1.0) for d in forecast_dates]
|
||||
trailing = [today - timedelta(days=i) for i in range(1, 31)]
|
||||
calibration_index = float(np.mean([monthly_indices.get(d.month, 1.0) for d in trailing]))
|
||||
seasonal_multipliers = [
|
||||
monthly_indices.get(d.month, 1.0) / max(calibration_index, 0.1)
|
||||
for d in forecast_dates
|
||||
]
|
||||
|
||||
# TRUNCATE before streaming writes
|
||||
with conn.cursor() as cur:
|
||||
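A worked example of the relative seasonal index may help here. The sketch below reimplements the calculation with the standard library only; the function name and the monthly index values are hypothetical, and the 0.1 floor on the calibration index is copied from the diff.

```python
from datetime import date, timedelta


def relative_seasonal_multipliers(monthly_indices, today, horizon_days):
    """Target-month index divided by the trailing-30-day average index."""
    trailing = [today - timedelta(days=i) for i in range(1, 31)]
    calibration = sum(monthly_indices.get(d.month, 1.0) for d in trailing) / len(trailing)
    calibration = max(calibration, 0.1)  # guard against a near-zero divisor
    forecast_dates = [today + timedelta(days=i) for i in range(horizon_days)]
    return [monthly_indices.get(d.month, 1.0) / calibration for d in forecast_dates]


# Illustrative indices: June runs 25% above May and July.
indices = {5: 1.0, 6: 1.25, 7: 1.0}
mult = relative_seasonal_multipliers(indices, date(2026, 6, 10), 30)
```

On 2026-06-10 the trailing 30 days contain 21 May days and 9 June days, so the calibration index is 1.075. June forecast days get 1.25 / 1.075 (about 1.16) instead of the absolute 1.25, and July days get about 0.93, so only the seasonal change relative to the calibration period is applied.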
@@ -1002,9 +1134,33 @@ def generate_all_forecasts(conn, curves_df, dow_indices, monthly_indices=None,
|
||||
try:
|
||||
curve_info = get_curve_for_product(product, curves_df)
|
||||
|
||||
if phase in ('preorder', 'launch'):
|
||||
if phase == 'preorder':
|
||||
if curve_info:
|
||||
scale = compute_scale_factor(phase, product, curve_info, batch_data)
|
||||
scale = compute_scale_factor('preorder', product, curve_info, batch_data)
|
||||
# Time the launch curve to expected arrival instead of
|
||||
# running it from today (F4). Pre-arrival days carry the
|
||||
# observed pre-order trickle rate.
|
||||
days_until_arrival = batch_data['preorder_arrival_days'].get(pid, 14)
|
||||
preorder_units = batch_data['preorder_sales'].get(pid, 0)
|
||||
preorder_days = batch_data['preorder_days'].get(pid, 1)
|
||||
preorder_daily_rate = preorder_units / max(preorder_days, 1)
|
||||
forecasts = forecast_preorder(
|
||||
curve_info, scale, days_until_arrival,
|
||||
preorder_daily_rate, FORECAST_HORIZON_DAYS)
|
||||
method = 'lifecycle_curve'
|
||||
else:
|
||||
# No reliable curve — fall back to velocity if available
|
||||
velocity = product.get('sales_velocity_daily') or 0
|
||||
if velocity > 0:
|
||||
forecasts = np.full(FORECAST_HORIZON_DAYS, velocity)
|
||||
method = 'velocity'
|
||||
else:
|
||||
forecasts = forecast_dormant()
|
||||
method = 'zero'
|
||||
|
||||
elif phase == 'launch':
|
||||
if curve_info:
|
||||
scale = compute_scale_factor('launch', product, curve_info, batch_data)
|
||||
forecasts = forecast_from_curve(curve_info, scale, age, FORECAST_HORIZON_DAYS)
|
||||
method = 'lifecycle_curve'
|
||||
else:
|
||||
@@ -1038,8 +1194,16 @@ def generate_all_forecasts(conn, curves_df, dow_indices, monthly_indices=None,
|
||||
method = 'velocity'
|
||||
|
||||
else: # dormant
|
||||
forecasts = forecast_dormant()
|
||||
method = 'zero'
|
||||
# Carry a small positive rate for dormant products that still
|
||||
# trickle sales (restocks/promos/long-tail); only truly dead
|
||||
# products stay at zero. See FORECAST_FIX_PLAN F5.
|
||||
rate = batch_data['dormant_rate'].get(pid, 0)
|
||||
if rate > 0:
|
||||
forecasts = np.full(FORECAST_HORIZON_DAYS, rate)
|
||||
method = 'velocity'
|
||||
else:
|
||||
forecasts = forecast_dormant()
|
||||
method = 'zero'
|
||||
|
||||
# Confidence interval: use accuracy-calibrated margins per phase
|
||||
base_margin = accuracy_margins.get(phase, 0.5)
|
||||
@@ -1108,6 +1272,8 @@ def archive_forecasts(conn, run_id):
|
||||
""")
|
||||
cur.execute("CREATE INDEX IF NOT EXISTS idx_pfh_date ON product_forecasts_history(forecast_date)")
|
||||
cur.execute("CREATE INDEX IF NOT EXISTS idx_pfh_pid_date ON product_forecasts_history(pid, forecast_date)")
|
||||
# Naive-baseline column for forecast value-added (FVA). See FORECAST_FIX_PLAN F8.
|
||||
cur.execute("ALTER TABLE product_forecasts_history ADD COLUMN IF NOT EXISTS naive_units NUMERIC(10,2)")
|
||||
|
||||
# Find the previous completed run (whose forecasts are still in product_forecasts)
|
||||
cur.execute("""
|
||||
@@ -1124,15 +1290,27 @@ def archive_forecasts(conn, run_id):
|
||||
|
||||
prev_run_id = prev_run[0]
|
||||
|
||||
# Archive only past-date forecasts (where actuals now exist)
|
||||
# Archive only past-date forecasts (where actuals now exist). Attach the
|
||||
# naive baseline (flat trailing-28-day daily average) at the same time so
|
||||
# forecast value-added can be measured. See FORECAST_FIX_PLAN F8.
|
||||
cur.execute("""
|
||||
INSERT INTO product_forecasts_history
|
||||
(run_id, pid, forecast_date, forecast_units, forecast_revenue,
|
||||
lifecycle_phase, forecast_method, confidence_lower, confidence_upper, generated_at)
|
||||
SELECT %s, pid, forecast_date, forecast_units, forecast_revenue,
|
||||
lifecycle_phase, forecast_method, confidence_lower, confidence_upper, generated_at
|
||||
FROM product_forecasts
|
||||
WHERE forecast_date < CURRENT_DATE
|
||||
lifecycle_phase, forecast_method, confidence_lower, confidence_upper,
|
||||
generated_at, naive_units)
|
||||
SELECT %s, pf.pid, pf.forecast_date, pf.forecast_units, pf.forecast_revenue,
|
||||
pf.lifecycle_phase, pf.forecast_method, pf.confidence_lower, pf.confidence_upper,
|
||||
pf.generated_at, COALESCE(nv.naive_daily, 0)
|
||||
FROM product_forecasts pf
|
||||
LEFT JOIN (
|
||||
SELECT o.pid, SUM(o.quantity) / 28.0 AS naive_daily
|
||||
FROM orders o
|
||||
WHERE o.canceled IS DISTINCT FROM TRUE
|
||||
AND o.date >= CURRENT_DATE - INTERVAL '28 days'
|
||||
AND o.date < CURRENT_DATE
|
||||
GROUP BY o.pid
|
||||
) nv ON nv.pid = pf.pid
|
||||
WHERE pf.forecast_date < CURRENT_DATE
|
||||
ON CONFLICT (run_id, pid, forecast_date) DO NOTHING
|
||||
""", (prev_run_id,))
|
||||
|
||||
@@ -1154,6 +1332,48 @@ def archive_forecasts(conn, run_id):
|
||||
return archived
|
||||
|
||||
|
||||
def archive_future_leads(conn, run_id):
|
||||
"""
|
||||
Archive a sampled set of FUTURE-lead forecasts from the just-generated
|
||||
product_forecasts, attributed to the current run.
|
||||
|
||||
The past-date archive in archive_forecasts() only ever captures the 1-day
|
||||
slice that just elapsed, so every accuracy sample lands in the '1-7d' lead
|
||||
bucket and the 15/30/60/90-day forecasts that purchasing actually relies on
|
||||
are never validated. Here we snapshot the 7/14/30/60/89-day-ahead leads
|
||||
(non-dormant) so that, once each date passes, compute_accuracy() can score
|
||||
them in their lead bucket. The naive baseline is attached the same way as in
|
||||
the past-date path. Future-dated rows survive the 90-day prune until their
|
||||
own date passes. See FORECAST_FIX_PLAN F7.
|
||||
"""
|
||||
with conn.cursor() as cur:
|
||||
cur.execute("""
|
||||
INSERT INTO product_forecasts_history
|
||||
(run_id, pid, forecast_date, forecast_units, forecast_revenue,
|
||||
lifecycle_phase, forecast_method, confidence_lower, confidence_upper,
|
||||
generated_at, naive_units)
|
||||
SELECT %s, pf.pid, pf.forecast_date, pf.forecast_units, pf.forecast_revenue,
|
||||
pf.lifecycle_phase, pf.forecast_method, pf.confidence_lower, pf.confidence_upper,
|
||||
pf.generated_at, COALESCE(nv.naive_daily, 0)
|
||||
FROM product_forecasts pf
|
||||
LEFT JOIN (
|
||||
SELECT o.pid, SUM(o.quantity) / 28.0 AS naive_daily
|
||||
FROM orders o
|
||||
WHERE o.canceled IS DISTINCT FROM TRUE
|
||||
AND o.date >= CURRENT_DATE - INTERVAL '28 days'
|
||||
AND o.date < CURRENT_DATE
|
||||
GROUP BY o.pid
|
||||
) nv ON nv.pid = pf.pid
|
||||
WHERE pf.lifecycle_phase != 'dormant'
|
||||
AND pf.forecast_date - CURRENT_DATE IN (7, 14, 30, 60, 89)
|
||||
ON CONFLICT (run_id, pid, forecast_date) DO NOTHING
|
||||
""", (run_id,))
|
||||
archived = cur.rowcount
|
||||
conn.commit()
|
||||
log.info(f"Archived {archived} future-lead forecast rows (7/14/30/60/89d) for run {run_id}")
|
||||
return archived
|
||||
|
||||
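To check that the sampled leads actually reach the long-lead buckets, the bucket boundaries used by `compute_accuracy()` can be mirrored in Python. This is an illustrative sketch: `lead_bucket` and `SAMPLED_LEADS` are hypothetical names, and the boundaries are copied from the `CASE` expression in the diff.

```python
def lead_bucket(lead_days):
    """Mirror of the SQL CASE that labels each forecast by lead time."""
    if 0 <= lead_days <= 6:
        return '1-7d'
    if 7 <= lead_days <= 13:
        return '8-14d'
    if 14 <= lead_days <= 29:
        return '15-30d'
    if 30 <= lead_days <= 59:
        return '31-60d'
    return '61-90d'  # ELSE branch: everything outside 0..59


SAMPLED_LEADS = (7, 14, 30, 60, 89)
buckets = {lead: lead_bucket(lead) for lead in SAMPLED_LEADS}
```

Each sampled lead sits at the lower edge of a bucket (7 in '8-14d', 14 in '15-30d', 30 in '31-60d'), and both 60 and 89 land in '61-90d'. As in the SQL, the final branch is a catch-all, so any value outside 0 to 59 would also be labelled '61-90d'.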
|
||||
def compute_accuracy(conn, run_id):
|
||||
"""
|
||||
Compute forecast accuracy metrics from archived history vs. actual sales.
|
||||
@@ -1162,11 +1382,18 @@ def compute_accuracy(conn, run_id):
|
||||
(pid, forecast_date = snapshot_date) to compare forecasted vs. actual units.
|
||||
|
||||
Stores results in forecast_accuracy table, broken down by:
|
||||
- overall: single aggregate row
|
||||
- overall: two rows — 'all' (non-dormant) and 'all_incl_dormant' (F5)
|
||||
- overall_weekly: per-product weekly-grain WMAPE — the informative headline
|
||||
for intermittent demand (daily grain has a ~190% floor) (F9)
|
||||
- by_phase: per lifecycle phase
|
||||
- by_lead_time: bucketed by how far ahead the forecast was
|
||||
- by_lead_time: bucketed by how far ahead the forecast was — long-lead
|
||||
buckets populate as the future-lead archives mature (F7)
|
||||
- by_method: per forecast method
|
||||
- daily: per forecast_date (for trend charts)
|
||||
|
||||
Every dimension also stores naive_wmape (flat trailing-28d baseline) and
|
||||
fva = 1 - wmape/naive_wmape, so the engine can be judged as value-over-naive
|
||||
(F8). Only realized dates (forecast_date < CURRENT_DATE) are scored.
|
||||
"""
|
||||
with conn.cursor() as cur:
|
||||
# Ensure accuracy table exists
|
||||
@@ -1186,6 +1413,10 @@ def compute_accuracy(conn, run_id):
|
||||
PRIMARY KEY (run_id, metric_type, dimension_value)
|
||||
)
|
||||
""")
|
||||
# Naive-baseline WMAPE and forecast value-added (FVA = 1 - wmape/naive_wmape).
|
||||
# See FORECAST_FIX_PLAN F8.
|
||||
cur.execute("ALTER TABLE forecast_accuracy ADD COLUMN IF NOT EXISTS naive_wmape NUMERIC(10,4)")
|
||||
cur.execute("ALTER TABLE forecast_accuracy ADD COLUMN IF NOT EXISTS fva NUMERIC(10,4)")
|
||||
conn.commit()
|
||||
|
||||
# Check if we have any history to analyze
|
||||
@@ -1195,124 +1426,199 @@ def compute_accuracy(conn, run_id):
|
||||
log.info("No forecast history available for accuracy computation")
|
||||
return
|
||||
|
||||
# For each (pid, forecast_date) pair, keep only the most recent run's
|
||||
# forecast row. This prevents double-counting when multiple runs have
|
||||
# archived forecasts for the same product×date combination.
|
||||
accuracy_cte = """
|
||||
WITH ranked_history AS (
|
||||
# Base CTEs (FORECAST_FIX_PLAN F7):
|
||||
# - Only score realized dates (forecast_date < CURRENT_DATE); future-lead
|
||||
# archives are excluded until their date passes.
|
||||
# - short_lead*: lead 0-6 deduped per (pid, forecast_date) — preserves the
|
||||
# meaning of the existing headline metrics. short_lead_eval keeps the
|
||||
# raw snapshot grid (incl. zero-zero days) for complete-week detection;
|
||||
# `accuracy` drops zero-zero days for daily-grain metrics.
|
||||
# - lead_dedup/lead_accuracy: deduped per (pid, forecast_date, lead_bucket)
|
||||
# so each long-lead bucket gets its own sample (the by_lead_time table).
|
||||
base_cte = """
|
||||
WITH ranked_all AS (
|
||||
SELECT
|
||||
pfh.*,
|
||||
pfh.pid, pfh.forecast_date, pfh.forecast_units, pfh.naive_units,
|
||||
pfh.lifecycle_phase, pfh.forecast_method,
|
||||
fr.started_at,
|
||||
ROW_NUMBER() OVER (
|
||||
PARTITION BY pfh.pid, pfh.forecast_date
|
||||
ORDER BY fr.started_at DESC
|
||||
) AS rn
|
||||
(pfh.forecast_date - fr.started_at::date) AS lead_days,
|
||||
CASE
|
||||
WHEN (pfh.forecast_date - fr.started_at::date) BETWEEN 0 AND 6 THEN '1-7d'
|
||||
WHEN (pfh.forecast_date - fr.started_at::date) BETWEEN 7 AND 13 THEN '8-14d'
|
||||
WHEN (pfh.forecast_date - fr.started_at::date) BETWEEN 14 AND 29 THEN '15-30d'
|
||||
WHEN (pfh.forecast_date - fr.started_at::date) BETWEEN 30 AND 59 THEN '31-60d'
|
||||
ELSE '61-90d'
|
||||
END AS lead_bucket
|
||||
FROM product_forecasts_history pfh
|
||||
JOIN forecast_runs fr ON fr.id = pfh.run_id
|
||||
WHERE pfh.forecast_date < CURRENT_DATE
|
||||
),
|
||||
short_lead AS (
|
||||
SELECT *,
|
||||
ROW_NUMBER() OVER (
|
||||
PARTITION BY pid, forecast_date ORDER BY started_at DESC
|
||||
) AS rn
|
||||
FROM ranked_all
|
||||
WHERE lead_days BETWEEN 0 AND 6
|
||||
),
|
||||
short_lead_eval AS (
|
||||
SELECT sl.pid, sl.lifecycle_phase, sl.forecast_method, sl.forecast_date,
|
||||
sl.forecast_units, sl.naive_units,
|
||||
COALESCE(dps.units_sold, 0) AS actual_units,
|
||||
(sl.forecast_units - COALESCE(dps.units_sold, 0)) AS error,
|
||||
ABS(sl.forecast_units - COALESCE(dps.units_sold, 0)) AS abs_error
|
||||
FROM short_lead sl
|
||||
LEFT JOIN daily_product_snapshots dps
|
||||
ON dps.pid = sl.pid AND dps.snapshot_date = sl.forecast_date
|
||||
WHERE sl.rn = 1
|
||||
),
|
||||
accuracy AS (
|
||||
SELECT
|
||||
rh.lifecycle_phase,
|
||||
rh.forecast_method,
|
||||
rh.forecast_date,
|
||||
(rh.forecast_date - rh.started_at::date) AS lead_days,
|
||||
rh.forecast_units,
|
||||
SELECT * FROM short_lead_eval
|
||||
WHERE NOT (forecast_units = 0 AND actual_units = 0)
|
||||
),
|
||||
lead_dedup AS (
|
||||
SELECT *,
|
||||
ROW_NUMBER() OVER (
|
||||
PARTITION BY pid, forecast_date, lead_bucket ORDER BY started_at DESC
|
||||
) AS rn
|
||||
FROM ranked_all
|
||||
),
|
||||
lead_accuracy AS (
|
||||
SELECT ld.lead_bucket, ld.forecast_units, ld.naive_units,
|
||||
COALESCE(dps.units_sold, 0) AS actual_units,
|
||||
(rh.forecast_units - COALESCE(dps.units_sold, 0)) AS error,
|
||||
ABS(rh.forecast_units - COALESCE(dps.units_sold, 0)) AS abs_error
|
||||
FROM ranked_history rh
|
||||
(ld.forecast_units - COALESCE(dps.units_sold, 0)) AS error,
|
||||
ABS(ld.forecast_units - COALESCE(dps.units_sold, 0)) AS abs_error
|
||||
FROM lead_dedup ld
|
||||
LEFT JOIN daily_product_snapshots dps
|
||||
ON dps.pid = rh.pid AND dps.snapshot_date = rh.forecast_date
|
||||
WHERE rh.rn = 1
|
||||
AND NOT (rh.forecast_units = 0 AND COALESCE(dps.units_sold, 0) = 0)
|
||||
ON dps.pid = ld.pid AND dps.snapshot_date = ld.forecast_date
|
||||
WHERE ld.rn = 1
|
||||
AND ld.lifecycle_phase != 'dormant'
|
||||
AND NOT (ld.forecast_units = 0 AND COALESCE(dps.units_sold, 0) = 0)
|
||||
)
|
||||
"""
|
||||
|
||||
# Compute and insert metrics for each dimension
|
||||
dimensions = {
|
||||
'overall': "SELECT 'all' AS dim",
|
||||
'by_phase': "SELECT DISTINCT lifecycle_phase AS dim FROM accuracy",
|
||||
'by_lead_time': """
|
||||
SELECT DISTINCT
|
||||
CASE
|
||||
WHEN lead_days BETWEEN 0 AND 6 THEN '1-7d'
|
||||
WHEN lead_days BETWEEN 7 AND 13 THEN '8-14d'
|
||||
WHEN lead_days BETWEEN 14 AND 29 THEN '15-30d'
|
||||
WHEN lead_days BETWEEN 30 AND 59 THEN '31-60d'
|
||||
ELSE '61-90d'
|
||||
END AS dim
|
||||
FROM accuracy
|
||||
""",
|
||||
'by_method': "SELECT DISTINCT forecast_method AS dim FROM accuracy",
|
||||
'daily': "SELECT DISTINCT forecast_date::text AS dim FROM accuracy",
|
||||
}
|
||||
|
||||
filter_clauses = {
|
||||
'overall': "lifecycle_phase != 'dormant'",
|
||||
'by_phase': "lifecycle_phase = dims.dim",
|
||||
'by_lead_time': """
|
||||
CASE
|
||||
WHEN lead_days BETWEEN 0 AND 6 THEN '1-7d'
|
||||
WHEN lead_days BETWEEN 7 AND 13 THEN '8-14d'
|
||||
WHEN lead_days BETWEEN 14 AND 29 THEN '15-30d'
|
||||
WHEN lead_days BETWEEN 30 AND 59 THEN '31-60d'
|
||||
ELSE '61-90d'
|
||||
END = dims.dim
|
||||
""",
|
||||
'by_method': "forecast_method = dims.dim",
|
||||
'daily': "forecast_date::text = dims.dim",
|
||||
}
|
||||
|
||||
total_inserted = 0
|
||||
|
||||
for metric_type, dim_query in dimensions.items():
|
||||
filter_clause = filter_clauses[metric_type]
|
||||
|
||||
sql = f"""
|
||||
{accuracy_cte},
|
||||
dims AS ({dim_query})
|
||||
# Daily-grain aggregate over a source CTE aliased `a`, computing the
|
||||
# engine WMAPE plus the naive-baseline WMAPE (NULL-safe: rows archived
|
||||
# before F8 have naive_units NULL and are excluded from the naive sums).
|
||||
def daily_agg(dim_expr, source, where=None, group_by=None):
|
||||
where_sql = f"WHERE {where}" if where else ""
|
||||
group_sql = f"GROUP BY {group_by}" if group_by else ""
|
||||
return f"""
|
||||
SELECT
|
||||
dims.dim,
|
||||
{dim_expr} AS dim,
|
||||
COUNT(*) AS sample_size,
|
||||
COALESCE(SUM(a.actual_units), 0) AS total_actual,
|
||||
COALESCE(SUM(a.forecast_units), 0) AS total_forecast,
|
||||
AVG(a.abs_error) AS mae,
|
||||
CASE WHEN SUM(a.actual_units) > 0
|
||||
THEN SUM(a.abs_error) / SUM(a.actual_units)
|
||||
ELSE NULL END AS wmape,
|
||||
THEN SUM(a.abs_error) / SUM(a.actual_units) ELSE NULL END AS wmape,
|
||||
AVG(a.error) AS bias,
|
||||
SQRT(AVG(POWER(a.error, 2))) AS rmse
|
||||
FROM dims
|
||||
CROSS JOIN accuracy a
|
||||
WHERE {filter_clause}
|
||||
GROUP BY dims.dim
|
||||
SQRT(AVG(POWER(a.error, 2))) AS rmse,
|
||||
CASE WHEN SUM(a.actual_units) FILTER (WHERE a.naive_units IS NOT NULL) > 0
|
||||
THEN SUM(ABS(a.naive_units - a.actual_units)) FILTER (WHERE a.naive_units IS NOT NULL)
|
||||
/ SUM(a.actual_units) FILTER (WHERE a.naive_units IS NOT NULL)
|
||||
ELSE NULL END AS naive_wmape
|
||||
FROM {source} a
|
||||
{where_sql}
|
||||
{group_sql}
|
||||
"""
|
||||
|
||||
cur.execute(sql)
|
||||
rows = cur.fetchall()
|
||||
insert_sql = """
|
||||
INSERT INTO forecast_accuracy
|
||||
(run_id, metric_type, dimension_value, sample_size,
|
||||
total_actual_units, total_forecast_units, mae, wmape, bias, rmse,
|
||||
naive_wmape, fva)
|
||||
VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
|
||||
ON CONFLICT (run_id, metric_type, dimension_value)
|
||||
DO UPDATE SET
|
||||
sample_size = EXCLUDED.sample_size,
|
||||
total_actual_units = EXCLUDED.total_actual_units,
|
||||
total_forecast_units = EXCLUDED.total_forecast_units,
|
||||
mae = EXCLUDED.mae, wmape = EXCLUDED.wmape,
|
||||
bias = EXCLUDED.bias, rmse = EXCLUDED.rmse,
|
||||
naive_wmape = EXCLUDED.naive_wmape, fva = EXCLUDED.fva,
|
||||
computed_at = NOW()
|
||||
"""
|
||||
|
||||
for row in rows:
|
||||
dim_val, sample_size, total_actual, total_forecast, mae, wmape, bias, rmse = row
|
||||
cur.execute("""
|
||||
INSERT INTO forecast_accuracy
|
||||
(run_id, metric_type, dimension_value, sample_size,
|
||||
total_actual_units, total_forecast_units, mae, wmape, bias, rmse)
|
||||
VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
|
||||
ON CONFLICT (run_id, metric_type, dimension_value)
|
||||
DO UPDATE SET
|
||||
sample_size = EXCLUDED.sample_size,
|
||||
total_actual_units = EXCLUDED.total_actual_units,
|
||||
total_forecast_units = EXCLUDED.total_forecast_units,
|
||||
mae = EXCLUDED.mae, wmape = EXCLUDED.wmape,
|
||||
bias = EXCLUDED.bias, rmse = EXCLUDED.rmse,
|
||||
computed_at = NOW()
|
||||
""", (run_id, metric_type, dim_val, sample_size,
|
||||
float(total_actual), float(total_forecast),
|
||||
float(mae) if mae is not None else None,
|
||||
float(wmape) if wmape is not None else None,
|
||||
float(bias) if bias is not None else None,
|
||||
float(rmse) if rmse is not None else None))
|
||||
total_inserted += 1
|
||||
def _f(x):
|
||||
return float(x) if x is not None else None
|
||||
|
||||
def run_and_insert(metric_type, sql):
|
||||
cur.execute(base_cte + sql)
|
||||
n = 0
|
||||
for row in cur.fetchall():
|
||||
(dim_val, sample_size, total_actual, total_forecast,
|
||||
mae, wmape, bias, rmse, naive_wmape) = row
|
||||
fva = None
|
||||
if wmape is not None and naive_wmape is not None and float(naive_wmape) > 0:
|
||||
fva = 1.0 - float(wmape) / float(naive_wmape)
|
||||
cur.execute(insert_sql, (
|
||||
run_id, metric_type, dim_val, sample_size,
|
||||
_f(total_actual), _f(total_forecast), _f(mae), _f(wmape),
|
||||
_f(bias), _f(rmse), _f(naive_wmape), _f(fva)))
|
||||
n += 1
|
||||
return n
|
||||
|
||||
total_inserted = 0
|
||||
|
||||
# overall: two rows — 'all' (non-dormant, the headline) and
|
||||
# 'all_incl_dormant' (everything, so the ~11% dormant demand stops being
|
||||
# invisible). Both are short-lead (lead 0-6). F5.
|
||||
overall_source = """(
|
||||
SELECT a.*, 'all'::text AS dim FROM accuracy a WHERE a.lifecycle_phase != 'dormant'
|
||||
UNION ALL
|
||||
SELECT a.*, 'all_incl_dormant'::text AS dim FROM accuracy a
|
||||
)"""
|
||||
total_inserted += run_and_insert('overall',
|
||||
daily_agg('a.dim', overall_source, group_by='a.dim'))
|
||||
|
||||
# by_phase / by_method / daily — short-lead daily-grain over `accuracy`.
|
||||
total_inserted += run_and_insert('by_phase',
|
||||
daily_agg('a.lifecycle_phase', 'accuracy', group_by='a.lifecycle_phase'))
|
||||
total_inserted += run_and_insert('by_method',
|
||||
daily_agg('a.forecast_method', 'accuracy', group_by='a.forecast_method'))
|
||||
total_inserted += run_and_insert('daily',
|
||||
daily_agg('a.forecast_date::text', 'accuracy',
|
||||
where="a.lifecycle_phase != 'dormant'", group_by='a.forecast_date'))
|
||||
|
||||
# by_lead_time — one sample per (pid, date, lead bucket) over `lead_accuracy`.
|
||||
# Buckets beyond '1-7d' populate as the future-lead archives (F7) mature.
|
||||
total_inserted += run_and_insert('by_lead_time',
|
||||
daily_agg('a.lead_bucket', 'lead_accuracy', group_by='a.lead_bucket'))
|
||||
|
||||
# overall_weekly — the informative headline for intermittent retail demand.
|
||||
# Aggregate the short-lead rows to (pid, complete week), then WMAPE over
|
||||
# pid-weeks. Daily-grain WMAPE has a ~190% floor on this catalog; weekly
|
||||
# grain is ~109% and responds to real improvement. F9.
|
||||
weekly_sql = """,
|
||||
weekly AS (
|
||||
SELECT pid, date_trunc('week', forecast_date) AS wk,
|
||||
SUM(forecast_units) AS fc_week,
|
||||
SUM(actual_units) AS act_week,
|
||||
SUM(naive_units) AS naive_week,
|
||||
bool_and(naive_units IS NOT NULL) AS naive_complete
|
||||
FROM short_lead_eval
|
||||
WHERE lifecycle_phase != 'dormant'
|
||||
GROUP BY pid, date_trunc('week', forecast_date)
|
||||
HAVING COUNT(*) = 7
|
||||
)
|
||||
SELECT 'all'::text AS dim,
|
||||
COUNT(*) AS sample_size,
|
||||
COALESCE(SUM(act_week), 0) AS total_actual,
|
||||
COALESCE(SUM(fc_week), 0) AS total_forecast,
|
||||
AVG(ABS(fc_week - act_week)) AS mae,
|
||||
CASE WHEN SUM(act_week) > 0
|
||||
THEN SUM(ABS(fc_week - act_week)) / SUM(act_week) ELSE NULL END AS wmape,
|
||||
AVG(fc_week - act_week) AS bias,
|
||||
SQRT(AVG(POWER(fc_week - act_week, 2))) AS rmse,
|
||||
CASE WHEN SUM(act_week) FILTER (WHERE naive_complete) > 0
|
||||
THEN SUM(ABS(naive_week - act_week)) FILTER (WHERE naive_complete)
|
||||
/ SUM(act_week) FILTER (WHERE naive_complete)
|
||||
ELSE NULL END AS naive_wmape
|
||||
FROM weekly
|
||||
WHERE NOT (fc_week = 0 AND act_week = 0)
|
||||
"""
|
||||
total_inserted += run_and_insert('overall_weekly', weekly_sql)
|
||||
|
||||
conn.commit()
|
||||
|
||||
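The metrics stored by `compute_accuracy()` reduce to a few formulas, sketched here in plain Python. The helper names and the toy week of data are hypothetical; the formulas follow the SQL in the diff (WMAPE = sum of absolute errors / sum of actuals, FVA = 1 - wmape / naive_wmape), and the 2.21 and 2.04 inputs are the engine and naive daily WMAPE figures quoted in the plan's diagnosis.

```python
def wmape(forecast, actual):
    """Weighted MAPE: sum of absolute errors over sum of actuals (None if no actuals)."""
    total = sum(actual)
    if total <= 0:
        return None
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / total


def fva(engine_wmape, naive_wmape):
    """Forecast value-added versus the naive baseline (None if undefined)."""
    if engine_wmape is None or naive_wmape is None or naive_wmape <= 0:
        return None
    return 1.0 - engine_wmape / naive_wmape


# One product-week: a flat 0.5 units/day forecast, one 3-unit sale on day 3.
daily_fc = [0.5] * 7
daily_act = [0, 0, 3, 0, 0, 0, 0]
daily_score = wmape(daily_fc, daily_act)
weekly_score = wmape([sum(daily_fc)], [sum(daily_act)])
engine_vs_naive = fva(2.21, 2.04)
```

The same forecast scores about 183% at daily grain and about 17% at weekly grain, which illustrates why the weekly row is the more informative headline for intermittent demand. A negative FVA, as with the quoted 221% versus 204%, means the engine is doing worse than the naive baseline.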
@@ -1562,6 +1868,10 @@ def main():
|
||||
conn, curves_df, dow_indices, monthly_indices, accuracy_margins
|
||||
)
|
||||
|
||||
# Phase 4b: Snapshot sampled future-lead forecasts (7/14/30/60/89d) from
|
||||
# the fresh run so long-lead accuracy populates once those dates pass (F7).
|
||||
archive_future_leads(conn, run_id)
|
||||
|
||||
duration = time.time() - start_time
|
||||
|
||||
# Record run completion (include DOW indices in metadata)
|
||||
|
||||
@@ -1,6 +1,12 @@
|
||||
const path = require('path');
|
||||
const fs = require('fs');
|
||||
const { spawn } = require('child_process');
|
||||
|
||||
// Maintenance switch: `touch .pause-auto-update` in inventory-server/ to make the
|
||||
// recurring full-update a no-op (e.g. during a long manual full re-import or a
|
||||
// snapshot rebuild). Remove the file to resume.
|
||||
const PAUSE_FILE = path.join(__dirname, '..', '.pause-auto-update');
|
||||
|
||||
function outputProgress(data) {
|
||||
if (!data.status) {
|
||||
data = {
|
||||
@@ -22,12 +28,8 @@ function runScript(scriptPath) {
|
||||
child.stdout.on('data', (data) => {
|
||||
const lines = data.toString().split('\n');
|
||||
lines.filter(line => line.trim()).forEach(line => {
|
||||
try {
|
||||
console.log(line); // Pass through the JSON output
|
||||
output += line + '\n';
|
||||
} catch (e) {
|
||||
console.log(line); // If not JSON, just log it directly
|
||||
}
|
||||
console.log(line); // Pass through the (usually JSON) output
|
||||
output += line + '\n';
|
||||
});
|
||||
});
|
||||
|
||||
@@ -50,6 +52,14 @@ function runScript(scriptPath) {
|
||||
}
|
||||
|
||||
async function fullUpdate() {
|
||||
if (fs.existsSync(PAUSE_FILE)) {
|
||||
outputProgress({
|
||||
status: 'complete',
|
||||
operation: 'Full update skipped',
|
||||
message: `Auto-update is paused (${PAUSE_FILE} exists) — remove the file to resume`
|
||||
});
|
||||
return;
|
||||
}
|
||||
try {
|
||||
// Step 1: Import from Production
|
||||
outputProgress({
|
||||
|
||||
@@ -13,10 +13,14 @@ async function importCategories(prodConnection, localConnection) {
|
||||
let skippedCategories = [];
|
||||
|
||||
try {
|
||||
// Start a single transaction for the entire import
|
||||
await localConnection.query('BEGIN');
|
||||
|
||||
// Temporarily disable the trigger that's causing problems
|
||||
// Start a single transaction for the entire import.
|
||||
// Must use the wrapper's beginTransaction() (dedicated client) — query('BEGIN')
|
||||
// checks out a client per call, so BEGIN/work/COMMIT would not be guaranteed
|
||||
// to share a connection.
|
||||
await localConnection.beginTransaction();
|
||||
|
||||
// Temporarily disable the trigger that's causing problems.
|
||||
// ALTER TABLE ... DISABLE TRIGGER is transactional: a rollback restores it.
|
||||
await localConnection.query('ALTER TABLE categories DISABLE TRIGGER update_categories_updated_at');
|
||||
|
||||
// Process each type in order with its own savepoint
|
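The comment in this hunk explains why `query('BEGIN')` is unsafe on a pooled connection: each call may run on a different client, so the transaction statements are not guaranteed to share a session. The sketch below illustrates that with a toy pool. `ToyPool` is hypothetical and is not the `pg` API or the project's wrapper; round-robin checkout is an assumption chosen to make the effect deterministic.

```javascript
// Toy pool for illustration only: it records which client ran each statement.
class ToyPool {
  constructor(size) {
    this.size = size;
    this.next = 0;
    this.log = [];
  }

  // Assumed round-robin checkout; a real pool hands out whichever client is idle.
  checkout() {
    const id = this.next;
    this.next = (this.next + 1) % this.size;
    return id;
  }

  // Pool-level query: a client is picked per call.
  query(sql) {
    this.log.push({ client: this.checkout(), sql });
  }

  // Dedicated client: every statement goes through the same client id.
  connect() {
    const id = this.checkout();
    return { query: (sql) => this.log.push({ client: id, sql }) };
  }
}

const pool = new ToyPool(3);

// Unpinned: BEGIN, the work, and COMMIT can land on three different clients.
pool.query('BEGIN');
pool.query('UPDATE categories SET name = name');
pool.query('COMMIT');
const unpinned = new Set(pool.log.map(entry => entry.client));

// Pinned: the same three statements share one client.
pool.log = [];
const client = pool.connect();
client.query('BEGIN');
client.query('UPDATE categories SET name = name');
client.query('COMMIT');
const pinned = new Set(pool.log.map(entry => entry.client));

console.log(unpinned.size, pinned.size);
```

With three distinct clients in the unpinned case, the `UPDATE` would run outside the transaction that `BEGIN` opened, which is the failure mode the switch to `beginTransaction()` is meant to rule out.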
||||
@@ -148,8 +152,11 @@ async function importCategories(prodConnection, localConnection) {
|
||||
}
|
||||
}
|
||||
|
||||
// Re-enable the trigger INSIDE the transaction so disable/enable are atomic
|
||||
await localConnection.query('ALTER TABLE categories ENABLE TRIGGER update_categories_updated_at');
|
||||
|
||||
// Commit the entire transaction - we'll do this even if we have skipped categories
|
||||
await localConnection.query('COMMIT');
|
||||
await localConnection.commit();
|
||||
|
||||
// Update sync status
|
||||
await localConnection.query(`
|
||||
@@ -158,9 +165,6 @@ async function importCategories(prodConnection, localConnection) {
|
||||
ON CONFLICT (table_name) DO UPDATE SET
|
||||
last_sync_timestamp = NOW()
|
||||
`);
|
||||
|
||||
// Re-enable the trigger
|
||||
await localConnection.query('ALTER TABLE categories ENABLE TRIGGER update_categories_updated_at');
|
||||
|
||||
outputProgress({
|
||||
status: "complete",
|
||||
@@ -187,12 +191,10 @@ async function importCategories(prodConnection, localConnection) {
|
||||
} catch (error) {
|
||||
console.error("Error importing categories:", error);
|
||||
|
||||
// Only rollback if we haven't committed yet
|
||||
// Only rollback if we haven't committed yet. The rollback also restores the
|
||||
// trigger state (DISABLE TRIGGER was inside the transaction).
|
||||
try {
|
||||
await localConnection.query('ROLLBACK');
|
||||
|
||||
// Make sure we re-enable the trigger even if there was an error
|
||||
await localConnection.query('ALTER TABLE categories ENABLE TRIGGER update_categories_updated_at');
|
||||
await localConnection.rollback();
|
||||
} catch (rollbackError) {
|
||||
console.error("Error during rollback:", rollbackError);
|
||||
}
|
||||
|
@@ -24,7 +24,8 @@ async function importDailyDeals(prodConnection, localConnection) {
   const startTime = Date.now();
 
   try {
-    await localConnection.query('BEGIN');
+    // Wrapper's beginTransaction() pins a dedicated client; query('BEGIN') would not.
+    await localConnection.beginTransaction();
 
     // Fetch recent daily deals from production (MySQL 5.7, no CTEs)
     // Join product_current_prices to get the actual deal price
@@ -127,7 +128,7 @@ async function importDailyDeals(prodConnection, localConnection) {
         last_sync_timestamp = NOW()
     `);
 
-    await localConnection.query('COMMIT');
+    await localConnection.commit();
 
     outputProgress({
       status: "complete",
@@ -149,7 +150,7 @@ async function importDailyDeals(prodConnection, localConnection) {
     console.error("Error importing daily deals:", error);
 
     try {
-      await localConnection.query('ROLLBACK');
+      await localConnection.rollback();
     } catch (rollbackError) {
       console.error("Error during rollback:", rollbackError);
     }
||||
|
||||
@@ -1,5 +1,4 @@
|
||||
const { outputProgress, formatElapsedTime, estimateRemaining, calculateRate } = require('../metrics-new/utils/progress');
|
||||
const { importMissingProducts, setupTemporaryTables, cleanupTemporaryTables, materializeCalculations } = require('./products');
|
||||
|
||||
/**
|
||||
* Imports orders from a production MySQL database to a local PostgreSQL database.
|
||||
@@ -28,6 +27,7 @@ async function importOrders(prodConnection, localConnection, incrementalUpdate =
|
||||
22: 'placed_incomplete',
|
||||
30: 'canceled',
|
||||
40: 'awaiting_payment',
|
||||
45: 'payment_pending',
|
||||
50: 'awaiting_products',
|
||||
55: 'shipping_later',
|
||||
56: 'shipping_together',
|
||||
@@ -35,6 +35,7 @@ async function importOrders(prodConnection, localConnection, incrementalUpdate =
|
||||
61: 'flagged',
|
||||
62: 'fix_before_pick',
|
||||
65: 'manual_picking',
|
||||
67: 'remote_send',
|
||||
70: 'in_pt',
|
||||
80: 'picked',
|
||||
90: 'awaiting_shipment',
|
||||
@@ -65,6 +66,12 @@ async function importOrders(prodConnection, localConnection, incrementalUpdate =
|
||||
|
||||
console.log('Orders: Using last sync time:', lastSyncTime, '(adjusted:', mysqlSyncTime, ')');
|
||||
|
||||
// Capture the next watermark from MySQL's own clock BEFORE querying any data.
|
||||
// Rows modified while the import runs stay above this watermark for the next
|
||||
// incremental run (overlap re-imports are harmless upserts); writing NOW()
|
||||
// after the import finishes would permanently skip them.
|
||||
const [[{ source_now: sourceNow }]] = await prodConnection.query('SELECT NOW() as source_now');
|
||||
|
||||
// First get count of order items - Keep MySQL compatible for production
|
||||
const [[{ total }]] = await prodConnection.query(`
|
||||
SELECT COUNT(*) as total
|
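The watermark comment in this hunk can be checked with a small simulation. The sketch below uses plain numbers in place of timestamps and a hypothetical `runIncremental` filter in place of the real MySQL query; it only illustrates why the watermark has to be read before the data.

```javascript
// Rows carry a `stamp` (last-modified time); plain numbers stand in for timestamps.
function runIncremental(sourceRows, lastWatermark) {
  return sourceRows.filter(row => row.stamp > lastWatermark);
}

const source = [{ id: 1, stamp: 100 }];

// Run 1 starts at t=105 and finishes at t=110.
const capturedBefore = 105;                                 // source clock read BEFORE querying
const run1 = runIncremental(source, 0).map(row => row.id);  // sees row 1
source.push({ id: 2, stamp: 107 });                         // written while run 1 was in progress
const finishedAt = 110;                                     // what NOW()-after-import would store

// Run 2 under each choice of watermark.
const withEarlyWatermark = runIncremental(source, capturedBefore).map(row => row.id);
const withLateWatermark = runIncremental(source, finishedAt).map(row => row.id);

console.log(run1, withEarlyWatermark, withLateWatermark);
```

With the early watermark the second run picks up row 2; with the late one it returns nothing, and row 2 is never imported unless it changes again.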
||||
@@ -100,7 +107,6 @@ async function importOrders(prodConnection, localConnection, incrementalUpdate =
|
||||
COALESCE(NULLIF(TRIM(oi.prod_itemnumber), ''), 'NO-SKU') as SKU,
|
||||
oi.prod_price as price,
|
||||
oi.qty_ordered as quantity,
|
||||
COALESCE(oi.prod_price_reg - oi.prod_price, 0) as base_discount,
|
||||
oi.stamp as last_modified
|
||||
FROM order_items oi
|
||||
JOIN _order o ON oi.order_id = o.order_id
|
||||
@@ -131,10 +137,8 @@ async function importOrders(prodConnection, localConnection, incrementalUpdate =
|
||||
await localConnection.query(`
|
||||
DROP TABLE IF EXISTS temp_order_items;
|
||||
DROP TABLE IF EXISTS temp_order_meta;
|
||||
DROP TABLE IF EXISTS temp_order_discounts;
|
||||
DROP TABLE IF EXISTS temp_order_taxes;
|
||||
DROP TABLE IF EXISTS temp_order_costs;
|
||||
DROP TABLE IF EXISTS temp_main_discounts;
|
||||
DROP TABLE IF EXISTS temp_item_discounts;
|
||||
|
||||
CREATE TEMP TABLE temp_order_items (
|
||||
@@ -143,7 +147,6 @@ async function importOrders(prodConnection, localConnection, incrementalUpdate =
|
||||
sku TEXT NOT NULL,
|
||||
price NUMERIC(14, 4) NOT NULL,
|
||||
quantity INTEGER NOT NULL,
|
||||
base_discount NUMERIC(14, 4) DEFAULT 0,
|
||||
PRIMARY KEY (order_id, pid)
|
||||
);
|
||||
|
||||
@@ -160,20 +163,6 @@ async function importOrders(prodConnection, localConnection, incrementalUpdate =
|
||||
PRIMARY KEY (order_id)
|
||||
);
|
||||
|
||||
CREATE TEMP TABLE temp_order_discounts (
|
||||
order_id INTEGER NOT NULL,
|
||||
pid INTEGER NOT NULL,
|
||||
discount NUMERIC(14, 4) NOT NULL,
|
||||
PRIMARY KEY (order_id, pid)
|
||||
);
|
||||
|
||||
CREATE TEMP TABLE temp_main_discounts (
|
||||
order_id INTEGER NOT NULL,
|
||||
discount_id INTEGER NOT NULL,
|
||||
discount_amount_subtotal NUMERIC(14, 4) DEFAULT 0.0000,
|
||||
PRIMARY KEY (order_id, discount_id)
|
||||
);
|
||||
|
||||
CREATE TEMP TABLE temp_item_discounts (
|
||||
order_id INTEGER NOT NULL,
|
||||
pid INTEGER NOT NULL,
|
||||
@@ -198,10 +187,8 @@ async function importOrders(prodConnection, localConnection, incrementalUpdate =
|
||||
|
||||
CREATE INDEX idx_temp_order_items_pid ON temp_order_items(pid);
|
||||
CREATE INDEX idx_temp_order_meta_order_id ON temp_order_meta(order_id);
|
||||
CREATE INDEX idx_temp_order_discounts_order_pid ON temp_order_discounts(order_id, pid);
|
||||
CREATE INDEX idx_temp_order_taxes_order_pid ON temp_order_taxes(order_id, pid);
|
||||
CREATE INDEX idx_temp_order_costs_order_pid ON temp_order_costs(order_id, pid);
|
||||
CREATE INDEX idx_temp_main_discounts_discount_id ON temp_main_discounts(discount_id);
|
||||
CREATE INDEX idx_temp_item_discounts_order_pid ON temp_item_discounts(order_id, pid);
|
||||
CREATE INDEX idx_temp_item_discounts_discount_id ON temp_item_discounts(discount_id);
|
||||
`);
|
||||
@@ -216,21 +203,20 @@ async function importOrders(prodConnection, localConnection, incrementalUpdate =
|
||||
await localConnection.beginTransaction();
|
||||
try {
|
||||
const batch = orderItems.slice(i, Math.min(i + 5000, orderItems.length));
|
||||
const placeholders = batch.map((_, idx) =>
|
||||
`($${idx * 6 + 1}, $${idx * 6 + 2}, $${idx * 6 + 3}, $${idx * 6 + 4}, $${idx * 6 + 5}, $${idx * 6 + 6})`
|
||||
const placeholders = batch.map((_, idx) =>
|
||||
`($${idx * 5 + 1}, $${idx * 5 + 2}, $${idx * 5 + 3}, $${idx * 5 + 4}, $${idx * 5 + 5})`
|
||||
).join(",");
|
||||
const values = batch.flatMap(item => [
|
||||
item.order_id, item.prod_pid, item.SKU, item.price, item.quantity, item.base_discount
|
||||
item.order_id, item.prod_pid, item.SKU, item.price, item.quantity
|
||||
]);
|
||||
|
||||
await localConnection.query(`
|
||||
INSERT INTO temp_order_items (order_id, pid, sku, price, quantity, base_discount)
|
||||
INSERT INTO temp_order_items (order_id, pid, sku, price, quantity)
|
||||
VALUES ${placeholders}
|
||||
ON CONFLICT (order_id, pid) DO UPDATE SET
|
||||
sku = EXCLUDED.sku,
|
||||
price = EXCLUDED.price,
|
||||
quantity = EXCLUDED.quantity,
|
||||
base_discount = EXCLUDED.base_discount
|
||||
quantity = EXCLUDED.quantity
|
||||
`, values);
|
||||
|
||||
await localConnection.commit();
|
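This hunk had to change the placeholder multiplier from 6 to 5 by hand when `base_discount` was dropped. One way to keep the multiplier and the column list from drifting apart is to derive both from a single array. The helper below is a hypothetical sketch, not code from the repository.

```javascript
// Hypothetical helper: build "($1, $2, ...)" groups and the flat value list
// from one column list, so the per-row width cannot get out of sync.
function buildInsertParams(rows, columns) {
  const width = columns.length;
  const placeholders = rows
    .map((_, rowIdx) =>
      '(' + columns.map((__, colIdx) => `$${rowIdx * width + colIdx + 1}`).join(', ') + ')')
    .join(',');
  const values = rows.flatMap(row => columns.map(col => row[col]));
  return { placeholders, values };
}

const demo = buildInsertParams(
  [{ order_id: 1, pid: 10 }, { order_id: 2, pid: 20 }],
  ['order_id', 'pid']
);
console.log(demo.placeholders, demo.values);
```

The resulting `placeholders` string would be interpolated after `VALUES`, with `values` passed as the parameter array, in the same way the hunk does it.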
||||
@@ -337,49 +323,15 @@ async function importOrders(prodConnection, localConnection, incrementalUpdate =
|
||||
};
|
||||
|
||||
const processDiscountsBatch = async (batchIds) => {
|
||||
// First, load main discount records
|
||||
const [mainDiscounts] = await prodConnection.query(`
|
||||
SELECT order_id, discount_id, discount_amount_subtotal
|
||||
FROM order_discounts
|
||||
WHERE order_id IN (?)
|
||||
`, [batchIds]);
|
||||
|
||||
if (mainDiscounts.length > 0) {
|
||||
await localConnection.beginTransaction();
|
||||
try {
|
||||
for (let j = 0; j < mainDiscounts.length; j += PG_BATCH_SIZE) {
|
||||
const subBatch = mainDiscounts.slice(j, j + PG_BATCH_SIZE);
|
||||
if (subBatch.length === 0) continue;
|
||||
|
||||
const placeholders = subBatch.map((_, idx) =>
|
||||
`($${idx * 3 + 1}, $${idx * 3 + 2}, $${idx * 3 + 3})`
|
||||
).join(",");
|
||||
|
||||
const values = subBatch.flatMap(d => [
|
||||
d.order_id,
|
||||
d.discount_id,
|
||||
d.discount_amount_subtotal || 0
|
||||
]);
|
||||
|
||||
await localConnection.query(`
|
||||
INSERT INTO temp_main_discounts (order_id, discount_id, discount_amount_subtotal)
|
||||
VALUES ${placeholders}
|
||||
ON CONFLICT (order_id, discount_id) DO UPDATE SET
|
||||
discount_amount_subtotal = EXCLUDED.discount_amount_subtotal
|
||||
`, values);
|
||||
}
|
||||
await localConnection.commit();
|
||||
} catch (error) {
|
||||
await localConnection.rollback();
|
||||
throw error;
|
||||
}
|
||||
}
|
||||
|
||||
// Then, load item discount records
|
||||
// Load item-level discount records. Only which = 2 rows are real per-item
|
||||
// discount amounts; which = 1 rows store the price of free promo-added
|
||||
// items and which = 3 rows are usage records (neither is a discount).
|
||||
// These amounts are NOT included in summary_discount_subtotal, so they
|
||||
// must be added on top of the prorated subtotal discount unconditionally.
|
||||
const [discounts] = await prodConnection.query(`
|
||||
SELECT order_id, pid, discount_id, amount
|
||||
FROM order_discount_items
|
||||
WHERE order_id IN (?)
|
||||
WHERE order_id IN (?) AND which = 2
|
||||
`, [batchIds]);
|
||||
|
||||
if (discounts.length === 0) return;
|
||||
@@ -418,16 +370,6 @@ async function importOrders(prodConnection, localConnection, incrementalUpdate =
|
||||
`, values);
|
||||
}
|
||||
|
||||
// Create aggregated view with a simpler, safer query that avoids duplicates
|
||||
await localConnection.query(`
|
||||
TRUNCATE temp_order_discounts;
|
||||
|
||||
INSERT INTO temp_order_discounts (order_id, pid, discount)
|
||||
SELECT order_id, pid, SUM(amount) as discount
|
||||
FROM temp_item_discounts
|
||||
GROUP BY order_id, pid
|
||||
`);
|
||||
|
||||
await localConnection.commit();
|
||||
} catch (error) {
|
||||
await localConnection.rollback();
|
||||
@@ -603,42 +545,54 @@ async function importOrders(prodConnection, localConnection, incrementalUpdate =
|
||||
try {
|
||||
const [orders] = await localConnection.query(`
|
||||
WITH order_totals AS (
|
||||
SELECT
|
||||
SELECT
|
||||
oi.order_id,
|
||||
oi.pid,
|
||||
-- Instead of using ARRAY_AGG which can cause duplicate issues, use SUM with a CASE
|
||||
SUM(CASE
|
||||
WHEN COALESCE(md.discount_amount_subtotal, 0) > 0 THEN id.amount
|
||||
ELSE 0
|
||||
END) as promo_discount_sum,
|
||||
-- Item-level promo discounts (which = 2 rows). These live outside
|
||||
-- summary_discount_subtotal, so they are summed unconditionally.
|
||||
SUM(COALESCE(id.amount, 0)) as promo_discount_sum,
|
||||
COALESCE(ot.tax, 0) as total_tax,
|
||||
COALESCE(oc.costeach, pc.cost_price, oi.price * 0.5) as costeach
|
||||
FROM temp_order_items oi
|
||||
LEFT JOIN temp_item_discounts id ON oi.order_id = id.order_id AND oi.pid = id.pid
|
||||
LEFT JOIN temp_main_discounts md ON id.order_id = md.order_id AND id.discount_id = md.discount_id
|
||||
LEFT JOIN temp_order_taxes ot ON oi.order_id = ot.order_id AND oi.pid = ot.pid
|
||||
LEFT JOIN temp_order_costs oc ON oi.order_id = oc.order_id AND oi.pid = oc.pid
|
||||
LEFT JOIN temp_product_costs pc ON oi.pid = pc.pid
|
||||
WHERE oi.order_id = ANY($1)
|
||||
GROUP BY oi.order_id, oi.pid, ot.tax, oc.costeach, pc.cost_price
|
||||
)
|
||||
SELECT
|
||||
SELECT
|
||||
oi.order_id as order_number,
|
||||
oi.pid::bigint as pid,
|
||||
oi.sku,
|
||||
om.date,
|
||||
oi.price,
|
||||
oi.quantity,
|
||||
-- Discount = prorated order-level subtotal discount + item-level promo
|
||||
-- discounts, clamped so a sale line can never be discounted below free.
|
||||
(
|
||||
-- Prorated Points Discount (e.g. loyalty points applied at order level)
|
||||
CASE
|
||||
WHEN om.summary_discount_subtotal > 0 AND om.summary_subtotal > 0 THEN
|
||||
COALESCE(ROUND((om.summary_discount_subtotal * (oi.price * oi.quantity)) / NULLIF(om.summary_subtotal, 0), 4), 0)
|
||||
ELSE 0
|
||||
CASE WHEN oi.quantity > 0 THEN
|
||||
LEAST(
|
||||
(
|
||||
CASE
|
||||
WHEN om.summary_discount_subtotal > 0 AND om.summary_subtotal > 0 THEN
|
||||
COALESCE(ROUND((om.summary_discount_subtotal * (oi.price * oi.quantity)) / NULLIF(om.summary_subtotal, 0), 4), 0)
|
||||
ELSE 0
|
||||
END
|
||||
+ COALESCE(ot.promo_discount_sum, 0)
|
||||
),
|
||||
oi.price * oi.quantity
|
||||
)
|
||||
ELSE
|
||||
(
|
||||
CASE
|
||||
WHEN om.summary_discount_subtotal > 0 AND om.summary_subtotal > 0 THEN
|
||||
COALESCE(ROUND((om.summary_discount_subtotal * (oi.price * oi.quantity)) / NULLIF(om.summary_subtotal, 0), 4), 0)
|
||||
ELSE 0
|
||||
END
|
||||
+ COALESCE(ot.promo_discount_sum, 0)
|
||||
)
|
||||
END
|
||||
+
|
||||
-- Specific Item-Level Promo Discount (coupon codes, etc.)
|
||||
COALESCE(ot.promo_discount_sum, 0)
|
||||
)::NUMERIC(14, 4) as discount,
|
||||
COALESCE(ot.total_tax, 0)::NUMERIC(14, 4) as tax,
|
||||
false as tax_included,
|
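The new discount expression in this hunk adds the prorated order-level discount to the item-level promo discount and, for positive quantities, caps the sum at the line total. The same arithmetic as a standalone function is sketched below. The function name and parameter names are hypothetical, and JavaScript floats stand in for `NUMERIC(14, 4)`, so rounding at the fourth decimal is only approximated.

```javascript
// Hypothetical mirror of the SQL expression: prorate + promo, clamped to the line total.
function lineDiscount({ price, quantity, orderDiscount, orderSubtotal, promoDiscount = 0 }) {
  const lineTotal = price * quantity;
  const prorated = orderDiscount > 0 && orderSubtotal > 0
    ? Math.round((orderDiscount * lineTotal / orderSubtotal) * 1e4) / 1e4
    : 0;
  const raw = prorated + promoDiscount;
  // LEAST(...) only applies when quantity > 0, as in the CASE branch of the query.
  return quantity > 0 ? Math.min(raw, lineTotal) : raw;
}

// 20.00 line in a 40.00 order carrying a 5.00 order-level discount and a 3.00 promo.
const normal = lineDiscount({ price: 10, quantity: 2, orderDiscount: 5, orderSubtotal: 40, promoDiscount: 3 });

// Same line with an oversized promo amount: the clamp stops at the line total.
const clamped = lineDiscount({ price: 10, quantity: 2, orderDiscount: 5, orderSubtotal: 40, promoDiscount: 25 });

console.log(normal, clamped);
```

The first call yields 2.5 prorated plus 3 promo; the second would be 27.5 unclamped and is capped at the 20.00 line total.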
||||
@@ -765,34 +719,83 @@ async function importOrders(prodConnection, localConnection, incrementalUpdate =
|
||||
}
|
||||
}
|
||||
|
||||
// Start a transaction for updating sync status and dropping temp tables
|
||||
// Reconciliation 2 prep: fetch canceled (15) / combined (16) orders from MySQL
|
||||
// WITHOUT a date_placed filter — combine_orders zeroes date_placed on the source
|
||||
// orders, so the main item query can never re-fetch them. Done before opening
|
||||
// the PG transaction so we don't hold it across a MySQL round-trip.
|
||||
const [statusSweepRows] = await prodConnection.query(`
|
||||
SELECT order_id, order_status
|
||||
FROM _order
|
||||
WHERE order_status IN (15, 16)
|
||||
${incrementalUpdate ? 'AND stamp > ?' : ''}
|
||||
`, incrementalUpdate ? [mysqlSyncTime] : []);
|
||||
|
||||
let staleItemsDeleted = 0;
|
||||
let sweepUpdated = 0;
|
||||
|
||||
// Final transaction: reconcile deletions, sweep statuses, update sync status, drop temps
|
||||
await localConnection.beginTransaction();
|
||||
try {
|
||||
// Update sync status
|
||||
// Reconciliation 1: delete PG item rows that no longer exist in MySQL for the
|
||||
// orders fetched this run. temp_order_items holds the complete current item
|
||||
// set of every fetched order (staff edits and unpicked promo items DELETE
|
||||
// order_items rows in MySQL, which an upsert-only import never removes).
|
||||
const [reconcileResult] = await localConnection.query(`
|
||||
DELETE FROM orders o
|
||||
USING (SELECT DISTINCT order_id FROM temp_order_items) fetched
|
||||
WHERE o.order_number = fetched.order_id::text -- orders.order_number is TEXT
|
||||
AND NOT EXISTS (
|
||||
SELECT 1 FROM temp_order_items t
|
||||
WHERE t.order_id = fetched.order_id AND t.pid = o.pid
|
||||
)
|
||||
`);
|
||||
staleItemsDeleted = reconcileResult.rowCount || 0;
|
||||
|
||||
// Reconciliation 2: mark canceled/combined orders. 'combined' source orders were
|
||||
// merged into a new order that carries the same items — counting both would
|
||||
// double-count, so they also get canceled = true (routes filter on canceled).
|
||||
for (const [code, statusText] of [[15, 'canceled'], [16, 'combined']]) {
|
||||
const ids = statusSweepRows.filter(r => r.order_status === code).map(r => r.order_id);
|
||||
for (let i = 0; i < ids.length; i += 5000) {
|
||||
const chunk = ids.slice(i, i + 5000);
|
||||
const [sweepResult] = await localConnection.query(`
|
||||
UPDATE orders
|
||||
SET status = $1, canceled = true
|
||||
WHERE order_number = ANY($2::text[])
|
||||
AND (status IS DISTINCT FROM $1 OR canceled IS DISTINCT FROM true)
|
||||
`, [statusText, chunk.map(String)]);
|
||||
sweepUpdated += sweepResult.rowCount || 0;
|
||||
}
|
||||
}
|
||||
|
||||
// Update sync status with the watermark captured from MySQL BEFORE the
|
||||
// source queries ran (see sourceNow above).
|
||||
await localConnection.query(`
|
||||
INSERT INTO sync_status (table_name, last_sync_timestamp)
|
||||
VALUES ('orders', NOW())
|
||||
VALUES ('orders', $1)
|
||||
ON CONFLICT (table_name) DO UPDATE SET
|
||||
last_sync_timestamp = NOW()
|
||||
`);
|
||||
|
||||
last_sync_timestamp = $1
|
||||
`, [sourceNow]);
|
||||
|
||||
// Cleanup temporary tables
|
||||
await localConnection.query(`
|
||||
DROP TABLE IF EXISTS temp_order_items;
|
||||
DROP TABLE IF EXISTS temp_order_meta;
|
||||
DROP TABLE IF EXISTS temp_order_discounts;
|
||||
DROP TABLE IF EXISTS temp_order_taxes;
|
||||
DROP TABLE IF EXISTS temp_order_costs;
|
||||
DROP TABLE IF EXISTS temp_main_discounts;
|
||||
DROP TABLE IF EXISTS temp_item_discounts;
|
||||
DROP TABLE IF EXISTS temp_product_costs;
|
||||
`);
|
||||
|
||||
|
||||
// Commit final transaction
|
||||
await localConnection.commit();
|
||||
} catch (error) {
|
||||
await localConnection.rollback();
|
||||
throw error;
|
||||
throw error;
|
||||
}
|
||||
|
||||
if (staleItemsDeleted > 0 || sweepUpdated > 0) {
|
||||
console.log(`Orders: reconciliation removed ${staleItemsDeleted} stale item rows, swept ${sweepUpdated} canceled/combined rows`);
|
||||
}
|
||||
|
||||
return {
|
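Reconciliation 1 in this hunk is an anti-join: for every order fetched this run, delete local rows whose `(order, pid)` pair is no longer in the fetched set, and leave orders that were not fetched alone. A minimal in-memory sketch of that rule is shown below; `findStaleItems` and the sample rows are hypothetical.

```javascript
// Hypothetical in-memory version of the DELETE ... USING ... NOT EXISTS query.
function findStaleItems(localRows, fetchedItems) {
  // orders.order_number is TEXT locally, so compare as strings.
  const fetchedOrders = new Set(fetchedItems.map(item => String(item.order_id)));
  const fetchedKeys = new Set(fetchedItems.map(item => `${item.order_id}:${item.pid}`));
  return localRows.filter(row =>
    fetchedOrders.has(row.order_number) &&
    !fetchedKeys.has(`${row.order_number}:${row.pid}`));
}

const localRows = [
  { order_number: '1', pid: 10 },
  { order_number: '1', pid: 11 }, // removed from order 1 in the source
  { order_number: '2', pid: 10 }, // order 2 was not fetched this run
];
const fetchedItems = [{ order_id: 1, pid: 10 }];

const stale = findStaleItems(localRows, fetchedItems);
console.log(stale);
```

Only the `(1, 11)` row is reported. Order 2 is untouched because nothing was fetched for it, which matches the scoping to `temp_order_items` in the query.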
||||
@@ -800,6 +803,8 @@ async function importOrders(prodConnection, localConnection, incrementalUpdate =
|
||||
totalImported: Math.floor(importedCount) || 0,
|
||||
recordsAdded: parseInt(recordsAdded) || 0,
|
||||
recordsUpdated: parseInt(recordsUpdated) || 0,
|
||||
recordsDeleted: staleItemsDeleted,
|
||||
statusSweepUpdated: sweepUpdated,
|
||||
totalSkipped: skippedOrders.size || 0,
|
||||
missingProducts: missingProducts.size || 0,
|
||||
totalProcessed: orderItems.length, // Total order items in source
|
||||
|
||||
@@ -622,6 +622,7 @@ async function materializeCalculations(prodConnection, localConnection, incremen
|
||||
AND t.total_sold IS NOT DISTINCT FROM p.total_sold
|
||||
AND t.date_online IS NOT DISTINCT FROM p.date_online
|
||||
AND t.shop_score IS NOT DISTINCT FROM p.shop_score
|
||||
AND t.categories IS NOT DISTINCT FROM p.categories
|
||||
`);
|
||||
|
||||
// Get count of products that need updating
|
||||
@@ -662,6 +663,11 @@ async function importProducts(prodConnection, localConnection, incrementalUpdate
|
||||
}
|
||||
}
|
||||
|
||||
// Capture the next watermark from MySQL's own clock BEFORE querying any data.
|
||||
// Rows modified while the import runs stay above this watermark for the next
|
||||
// incremental run (overlap re-imports are harmless upserts).
|
||||
const [[{ source_now: sourceNow }]] = await prodConnection.query('SELECT NOW() as source_now');
|
||||
|
||||
// Start a transaction to ensure temporary tables persist
|
||||
await localConnection.beginTransaction();
|
||||
|
||||
@@ -927,16 +933,22 @@ async function importProducts(prodConnection, localConnection, incrementalUpdate
|
||||
// legacy PHP backend will stamp onto the PO line item.
|
||||
await syncSupplierCosts(prodConnection, localConnection);
|
||||
|
||||
// Sync category assignments for ALL products. product_category_index has no
|
||||
// stamp column, so category-only changes never bump any of the incremental
|
||||
// WHERE timestamps — without this pass PG categories go permanently stale.
|
||||
await syncProductCategories(prodConnection, localConnection);
|
||||
|
||||
// Commit the transaction
|
||||
await localConnection.commit();
|
||||
|
||||
// Update sync status
|
||||
// Update sync status with the watermark captured from MySQL BEFORE the
|
||||
// source queries ran (see sourceNow above).
|
||||
await localConnection.query(`
|
||||
INSERT INTO sync_status (table_name, last_sync_timestamp)
|
||||
VALUES ('products', NOW())
|
||||
VALUES ('products', $1)
|
||||
ON CONFLICT (table_name) DO UPDATE SET
|
||||
last_sync_timestamp = NOW()
|
||||
`);
|
||||
last_sync_timestamp = $1
|
||||
`, [sourceNow]);
|
||||
|
||||
return {
|
||||
status: 'complete',
|
||||
@@ -1028,11 +1040,126 @@ async function syncSupplierCosts(prodConnection, localConnection) {
|
||||
return { updated };
|
||||
}
|
||||
|
||||
// Full category-assignment sweep. The incremental product import keys on
|
||||
// p.stamp / ci.stamp / price / b2b dates — none of which change when a product
|
||||
// is recategorized in product_category_index (the table has no stamp column).
|
||||
// This pass compares the canonical GROUP_CONCAT representation against
|
||||
// products.categories and rewrites product_categories only for changed pids.
|
||||
// Must run inside the caller's transaction (uses ON COMMIT DROP temp table).
|
||||
async function syncProductCategories(prodConnection, localConnection) {
|
||||
outputProgress({
|
||||
status: "running",
|
||||
operation: "Products import",
|
||||
message: "Syncing category assignments"
|
||||
});
|
||||
|
||||
// Same expression as the main import query so representations compare equal
|
||||
// (GROUP_CONCAT(DISTINCT int) returns values numerically sorted).
|
||||
const [rows] = await prodConnection.query(`
|
||||
SELECT
|
||||
p.pid,
|
||||
GROUP_CONCAT(DISTINCT CASE
|
||||
WHEN pc.cat_id IS NOT NULL
|
||||
AND pc.type IN (10, 20, 11, 21, 12, 13)
|
||||
AND pci.cat_id NOT IN (16, 17)
|
||||
THEN pci.cat_id
|
||||
END) as category_ids
|
||||
FROM products p
|
||||
LEFT JOIN product_category_index pci ON p.pid = pci.pid
|
||||
LEFT JOIN product_categories pc ON pci.cat_id = pc.cat_id
|
||||
GROUP BY p.pid
|
||||
`);
|
||||
|
||||
if (!rows || rows.length === 0) {
|
||||
return { updated: 0 };
|
||||
}
|
||||
|
||||
await localConnection.query(`
|
||||
CREATE TEMP TABLE temp_category_sync (
|
||||
pid BIGINT PRIMARY KEY,
|
||||
categories TEXT
|
||||
) ON COMMIT DROP
|
||||
`);
|
||||
|
||||
const CHUNK = 5000;
|
||||
for (let i = 0; i < rows.length; i += CHUNK) {
|
||||
const batch = rows.slice(i, i + CHUNK);
|
||||
const pids = batch.map(r => r.pid);
|
||||
const cats = batch.map(r => r.category_ids);
|
||||
await localConnection.query(
|
||||
`INSERT INTO temp_category_sync (pid, categories)
|
||||
SELECT * FROM UNNEST($1::bigint[], $2::text[])
|
||||
ON CONFLICT (pid) DO NOTHING`,
|
||||
[pids, cats]
|
||||
);
|
||||
}
|
||||
|
||||
// Which existing products actually changed?
|
||||
const [changed] = await localConnection.query(`
|
||||
SELECT t.pid, t.categories
|
||||
FROM temp_category_sync t
|
||||
JOIN products p ON p.pid = t.pid
|
||||
WHERE t.categories IS DISTINCT FROM p.categories
|
||||
`);
|
||||
|
||||
if (changed.rows.length === 0) {
|
||||
return { updated: 0 };
|
||||
}
|
||||
|
||||
await localConnection.query(`
|
||||
UPDATE products p
|
||||
SET categories = t.categories
|
||||
FROM temp_category_sync t
|
||||
WHERE p.pid = t.pid
|
||||
AND t.categories IS DISTINCT FROM p.categories
|
||||
`);
|
||||
|
||||
// Rewrite the relationship rows for changed products only
|
||||
const REL_CHUNK = 1000;
|
||||
for (let i = 0; i < changed.rows.length; i += REL_CHUNK) {
|
||||
const batch = changed.rows.slice(i, i + REL_CHUNK);
|
||||
const pids = batch.map(r => r.pid);
|
||||
|
||||
await localConnection.query(
|
||||
'DELETE FROM product_categories WHERE pid = ANY($1)',
|
||||
[pids]
|
||||
);
|
||||
|
||||
const relPids = [];
|
||||
const relCats = [];
|
||||
for (const row of batch) {
|
||||
if (!row.categories) continue;
|
||||
for (const catId of row.categories.split(',')) {
|
||||
if (catId && catId.trim()) {
|
||||
relPids.push(row.pid);
|
||||
relCats.push(parseInt(catId.trim(), 10));
|
||||
}
|
||||
}
|
||||
}
|
||||
if (relPids.length > 0) {
|
||||
await localConnection.query(`
|
||||
INSERT INTO product_categories (pid, cat_id)
|
||||
SELECT * FROM UNNEST($1::bigint[], $2::int[])
|
||||
ON CONFLICT (pid, cat_id) DO NOTHING
|
||||
`, [relPids, relCats]);
|
||||
}
|
||||
}
|
||||
|
||||
outputProgress({
|
||||
status: "running",
|
||||
operation: "Products import",
|
||||
message: `Category assignments updated for ${changed.rows.length} products`
|
||||
});
|
||||
|
||||
return { updated: changed.rows.length };
|
||||
}
|
||||
|
||||
module.exports = {
|
||||
importProducts,
|
||||
importMissingProducts,
|
||||
setupTemporaryTables,
|
||||
cleanupTemporaryTables,
|
||||
materializeCalculations,
|
||||
syncSupplierCosts
|
||||
syncSupplierCosts,
|
||||
syncProductCategories
|
||||
};
|
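The relationship rewrite in `syncProductCategories` turns each comma-separated `categories` string into two parallel arrays for `UNNEST`. That step is sketched in isolation below, with a hypothetical `toCategoryPairs` helper. Unlike the loop in the diff, it filters on the parsed integer rather than on the raw string, which is a small variation and not the committed behavior.

```javascript
// Hypothetical helper: expand "3,5" style strings into parallel pid / cat_id arrays.
function toCategoryPairs(rows) {
  const pids = [];
  const catIds = [];
  for (const row of rows) {
    if (!row.categories) continue; // NULL or empty: product has no categories
    for (const part of row.categories.split(',')) {
      const catId = parseInt(part.trim(), 10);
      if (Number.isInteger(catId)) {
        pids.push(row.pid);
        catIds.push(catId);
      }
    }
  }
  return { pids, catIds };
}

const pairs = toCategoryPairs([
  { pid: 1, categories: '3,5' },
  { pid: 2, categories: null },
  { pid: 3, categories: '7, ' }, // trailing empty entry is dropped
]);
console.log(pairs.pids, pairs.catIds);
```

The two arrays line up index by index, which is what `UNNEST($1::bigint[], $2::int[])` expects.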
||||
@@ -72,6 +72,11 @@ async function importPurchaseOrders(prodConnection, localConnection, incremental
|
||||
|
||||
console.log('Purchase Orders: Using last sync time:', lastSyncTime, '(adjusted:', mysqlSyncTime, ')');
|
||||
|
||||
// Capture the next watermark from MySQL's own clock BEFORE querying any data.
|
||||
// Rows modified while the import runs stay above this watermark for the next
|
||||
// incremental run (overlap re-imports are harmless upserts).
|
||||
const [[{ source_now: sourceNow }]] = await prodConnection.query('SELECT NOW() as source_now');
|
||||
|
||||
// Create temp tables for processing
|
||||
await localConnection.query(`
|
||||
DROP TABLE IF EXISTS temp_purchase_orders;
|
@@ -267,13 +272,16 @@ async function importPurchaseOrders(prodConnection, localConnection, incremental
     if (totalPOs === 0) {
       console.log('No purchase orders to process, skipping PO import step');
     } else {
-      // Fetch and process POs in batches
-      let offset = 0;
+      // Fetch and process POs in batches using keyset pagination on po_id.
+      // LIMIT/OFFSET over a date_updated predicate silently skips rows when
+      // concurrent updates shift rows between pages.
+      let processedPOCount = 0;
+      let lastPoId = 0;
       let allPOsProcessed = false;
-
+
       while (!allPOsProcessed) {
         const [poList] = await prodConnection.query(`
-          SELECT
+          SELECT
             p.po_id,
             p.supplier_id,
             s.companyname AS vendor,
@@ -286,21 +294,23 @@ async function importPurchaseOrders(prodConnection, localConnection, incremental
           FROM po p
           LEFT JOIN suppliers s ON p.supplier_id = s.supplierid
           WHERE p.date_created >= DATE_SUB(CURRENT_DATE, INTERVAL ${yearInterval} YEAR)
+            AND p.po_id > ?
           ${incrementalUpdate ? `
             AND (
-              p.date_updated > ?
-              OR p.date_ordered > ?
+              p.date_updated > ?
+              OR p.date_ordered > ?
               OR p.date_estin > ?
             )
           ` : ''}
           ORDER BY p.po_id
-          LIMIT ${PO_BATCH_SIZE} OFFSET ${offset}
-        `, incrementalUpdate ? [mysqlSyncTime, mysqlSyncTime, mysqlSyncTime] : []);
-
+          LIMIT ${PO_BATCH_SIZE}
+        `, incrementalUpdate ? [lastPoId, mysqlSyncTime, mysqlSyncTime, mysqlSyncTime] : [lastPoId]);
+
         if (poList.length === 0) {
           allPOsProcessed = true;
           break;
         }
+        lastPoId = poList[poList.length - 1].po_id;
 
         // Get products for these POs
         const poIds = poList.map(po => po.po_id);
@@ -332,7 +342,11 @@ async function importPurchaseOrders(prodConnection, localConnection, incremental
             vendor: po.vendor || 'Unknown Vendor',
             date: validateDate(po.date_ordered) || validateDate(po.date_created),
             expected_date: validateDate(po.date_estin),
-            status: poStatusMap[po.status] || 'created',
+            // Unknown codes get a sentinel rather than 'created': defaulting an
+            // unknown cancel-like code to an OPEN status would inflate on-order
+            // FIFO (the metrics CTEs whitelist known-open statuses, so a sentinel
+            // is simply ignored there).
+            status: poStatusMap[po.status] || `unknown_${po.status}`,
             notes: po.notes || '',
             long_note: po.long_note || '',
             ordered: product.qty_each,
||||
@@ -393,20 +407,20 @@ async function importPurchaseOrders(prodConnection, localConnection, incremental
|
||||
`, values);
|
||||
}
|
||||
|
||||
offset += poList.length;
|
||||
processedPOCount += poList.length;
|
||||
totalProcessed += completePOs.length;
|
||||
|
||||
|
||||
outputProgress({
|
||||
status: "running",
|
||||
operation: "Purchase orders import",
|
||||
message: `Processed ${offset} of ${totalPOs} purchase orders (${totalProcessed} line items)`,
|
||||
current: offset,
|
||||
message: `Processed ${processedPOCount} of ${totalPOs} purchase orders (${totalProcessed} line items)`,
|
||||
current: processedPOCount,
|
||||
total: totalPOs,
|
||||
elapsed: formatElapsedTime(startTime),
|
||||
remaining: estimateRemaining(startTime, offset, totalPOs),
|
||||
rate: calculateRate(startTime, offset)
|
||||
remaining: estimateRemaining(startTime, processedPOCount, totalPOs),
|
||||
rate: calculateRate(startTime, processedPOCount)
|
||||
});
|
||||
|
||||
|
||||
if (poList.length < PO_BATCH_SIZE) {
|
||||
allPOsProcessed = true;
|
||||
}
|
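The purchase-order loop now pages by `po_id > lastPoId` instead of `OFFSET`. The sketch below replays both strategies against an in-memory table in which one row stops matching the filter after the first page, as a stand-in for a concurrent update. All names here are hypothetical and the arrays only imitate the SQL.

```javascript
// In-memory stand-ins for the two query shapes; `id` plays the role of po_id.
const byId = (a, b) => a.id - b.id;
const offsetPage = (table, offset, limit) =>
  [...table].sort(byId).slice(offset, offset + limit);
const keysetPage = (table, lastId, limit) =>
  table.filter(row => row.id > lastId).sort(byId).slice(0, limit);

function readAll(makeTable, limit, mode) {
  const table = makeTable();
  const seen = [];
  let cursor = 0; // offset in 'offset' mode, last seen id in 'keyset' mode
  for (let pageNo = 0; ; pageNo++) {
    const page = mode === 'offset'
      ? offsetPage(table, cursor, limit)
      : keysetPage(table, cursor, limit);
    if (page.length === 0) break;
    seen.push(...page.map(row => row.id));
    cursor = mode === 'offset' ? cursor + page.length : page[page.length - 1].id;
    // Simulated concurrent update: after the first page, row 1 leaves the result set.
    if (pageNo === 0) table.splice(table.findIndex(row => row.id === 1), 1);
    if (page.length < limit) break;
  }
  return seen;
}

const makeTable = () => [1, 2, 3, 4, 5].map(id => ({ id }));
const viaOffset = readAll(makeTable, 2, 'offset');
const viaKeyset = readAll(makeTable, 2, 'keyset');
console.log(viaOffset, viaKeyset);
```

In the offset run, row 3 slides into a position that has already been read and is never returned; the keyset run is unaffected because its cursor is a value, not a position.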
||||
@@ -439,13 +453,14 @@ async function importPurchaseOrders(prodConnection, localConnection, incremental
|
||||
if (totalReceivings === 0) {
|
||||
console.log('No receivings to process, skipping receivings import step');
|
||||
} else {
|
||||
// Fetch and process receivings in batches
|
||||
offset = 0; // Reset offset for receivings
|
||||
// Fetch and process receivings in batches (keyset pagination, see POs above)
|
||||
let processedReceivingCount = 0;
|
||||
let lastReceivingId = 0;
|
||||
let allReceivingsProcessed = false;
|
||||
|
||||
|
||||
while (!allReceivingsProcessed) {
|
||||
const [receivingList] = await prodConnection.query(`
|
||||
SELECT
|
||||
SELECT
|
||||
r.receiving_id,
|
||||
r.supplier_id,
|
||||
r.status,
|
||||
@@ -459,6 +474,7 @@ async function importPurchaseOrders(prodConnection, localConnection, incremental
|
||||
r.date_checked
|
||||
FROM receivings r
|
||||
WHERE r.date_created >= DATE_SUB(CURRENT_DATE, INTERVAL ${yearInterval} YEAR)
|
||||
AND r.receiving_id > ?
|
||||
${incrementalUpdate ? `
|
||||
AND (
|
||||
r.date_updated > ?
|
||||
@@ -466,13 +482,14 @@ async function importPurchaseOrders(prodConnection, localConnection, incremental
|
||||
)
|
||||
` : ''}
|
||||
ORDER BY r.receiving_id
|
||||
LIMIT ${PO_BATCH_SIZE} OFFSET ${offset}
|
||||
`, incrementalUpdate ? [mysqlSyncTime, mysqlSyncTime] : []);
|
||||
|
||||
LIMIT ${PO_BATCH_SIZE}
|
||||
`, incrementalUpdate ? [lastReceivingId, mysqlSyncTime, mysqlSyncTime] : [lastReceivingId]);
|
||||
|
||||
if (receivingList.length === 0) {
|
||||
allReceivingsProcessed = true;
|
||||
break;
|
||||
}
|
||||
lastReceivingId = receivingList[receivingList.length - 1].receiving_id;
|
||||
|
||||
// Get products for these receivings
|
||||
const receivingIds = receivingList.map(r => r.receiving_id);
|
||||
@@ -545,7 +562,8 @@ async function importPurchaseOrders(prodConnection, localConnection, incremental
|
||||
received_date: validateDate(product.received_date) || validateDate(product.receiving_created_date),
|
||||
receiving_created_date: validateDate(product.receiving_created_date),
|
||||
supplier_id: receiving.supplier_id,
|
||||
status: receivingStatusMap[receiving.status] || 'created'
|
||||
// Sentinel for unknown codes — see PO status mapping note above
|
||||
status: receivingStatusMap[receiving.status] || `unknown_${receiving.status}`
|
||||
});
|
||||
}
|
||||
|
||||
@@ -600,18 +618,18 @@ async function importPurchaseOrders(prodConnection, localConnection, incremental
|
||||
`, values);
|
||||
}
|
||||
|
||||
offset += receivingList.length;
|
||||
processedReceivingCount += receivingList.length;
|
||||
totalProcessed += completeReceivings.length;
|
||||
|
||||
|
||||
outputProgress({
|
||||
status: "running",
|
||||
operation: "Purchase orders import",
|
||||
message: `Processed ${offset} of ${totalReceivings} receivings (${totalProcessed} line items total)`,
|
||||
current: offset,
|
||||
message: `Processed ${processedReceivingCount} of ${totalReceivings} receivings (${totalProcessed} line items total)`,
|
||||
current: processedReceivingCount,
|
||||
total: totalReceivings,
|
||||
elapsed: formatElapsedTime(startTime),
|
||||
remaining: estimateRemaining(startTime, offset, totalReceivings),
|
||||
rate: calculateRate(startTime, offset)
|
||||
remaining: estimateRemaining(startTime, processedReceivingCount, totalReceivings),
|
||||
rate: calculateRate(startTime, processedReceivingCount)
|
||||
});
|
||||
|
||||
if (receivingList.length < PO_BATCH_SIZE) {
|
||||
@@ -829,13 +847,14 @@ async function importPurchaseOrders(prodConnection, localConnection, incremental
|
||||
  receivingRecordsAdded = receivingsResult.rows.filter(r => r.inserted).length;
  receivingRecordsUpdated = receivingsResult.rows.filter(r => !r.inserted).length;

- // Update sync status
+ // Update sync status with the watermark captured from MySQL BEFORE the
+ // source queries ran (see sourceNow above).
  await localConnection.query(`
  INSERT INTO sync_status (table_name, last_sync_timestamp)
- VALUES ('purchase_orders', NOW())
+ VALUES ('purchase_orders', $1)
  ON CONFLICT (table_name) DO UPDATE SET
- last_sync_timestamp = NOW()
- `);
+ last_sync_timestamp = $1
+ `, [sourceNow]);

  // Clean up temporary tables
  await localConnection.query(`
|
||||
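The receivings loop above swaps `LIMIT … OFFSET` for keyset pagination on `receiving_id`. OFFSET paging over a result set that changes mid-import can skip or repeat rows, whereas "give me rows after the last id I saw" does not depend on how many rows precede the page. A minimal sketch of that loop shape, assuming an in-memory array in place of the MySQL table; `fetchPage` and `importAllByKeyset` are hypothetical names, and the real import awaits `prodConnection.query` instead of calling a synchronous helper:

```javascript
// Hypothetical stand-in for the source table; ids need not be contiguous.
const rows = [3, 7, 8, 15, 16, 23, 42].map(id => ({ receiving_id: id }));

// Illustrative equivalent of:
//   WHERE receiving_id > ? ORDER BY receiving_id LIMIT ?
function fetchPage(lastId, limit) {
  return rows
    .filter(r => r.receiving_id > lastId)
    .sort((a, b) => a.receiving_id - b.receiving_id)
    .slice(0, limit);
}

// Walk the table one page at a time, carrying the last seen id forward.
function importAllByKeyset(batchSize) {
  const seen = [];
  let lastId = 0;
  while (true) {
    const page = fetchPage(lastId, batchSize);
    if (page.length === 0) break;                 // nothing left
    seen.push(...page.map(r => r.receiving_id));
    lastId = page[page.length - 1].receiving_id;  // keyset cursor
    if (page.length < batchSize) break;           // short page means last page
  }
  return seen;
}
```

Because the cursor is a column value rather than a row count, the progress counter has to be tracked separately, which is why the diff introduces `processedReceivingCount` alongside `lastReceivingId`.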
@@ -151,7 +151,10 @@ async function importStockSnapshots(prodConnection, localConnection, incremental
|
||||
|
||||
  recordsAdded += batch.length;
  } catch (err) {
+ // Fail the step: the next incremental starts at MAX(snapshot_date), so a
+ // swallowed batch error would leave a permanent hole that is never revisited.
  console.error(`Error inserting batch at offset ${i} (date range ending ${currentDate}):`, err.message);
+ throw err;
  }
  }

@@ -165,7 +168,7 @@ async function importStockSnapshots(prodConnection, localConnection, incremental
  current: processedRows,
  total: totalRows,
  elapsed: formatElapsedTime(startTime),
- rate: calculateRate(processedRows, startTime)
+ rate: calculateRate(startTime, processedRows)
  });
  }
|
||||
|
||||
|
||||
@@ -10,7 +10,7 @@ DECLARE
|
||||
_date DATE;
|
||||
_count INT;
|
||||
_total_records INT := 0;
|
||||
_begin_date DATE := (SELECT MIN(date)::date FROM orders WHERE date >= '2020-01-01'); -- Starting point: captures all historical order data
|
||||
_begin_date DATE := (SELECT MIN((date AT TIME ZONE 'America/Chicago'))::date FROM orders WHERE date >= '2020-01-01'); -- Starting point: captures all historical order data (business days, Central time)
|
||||
_end_date DATE := CURRENT_DATE;
|
||||
BEGIN
|
||||
RAISE NOTICE 'Beginning daily snapshots rebuild from % to %. Starting at %', _begin_date, _end_date, _start_time;
|
||||
@@ -32,26 +32,34 @@ BEGIN
|
||||
p.sku,
|
||||
-- Count orders to ensure we only include products with real activity
|
||||
COUNT(o.id) as order_count,
|
||||
-- Aggregate Sales (Quantity > 0, Status not Canceled/Returned)
|
||||
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned') THEN o.quantity ELSE 0 END), 0) AS units_sold,
|
||||
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned') THEN o.price * o.quantity ELSE 0 END), 0.00) AS gross_revenue_unadjusted,
|
||||
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned') THEN o.discount ELSE 0 END), 0.00) AS discounts,
|
||||
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned') THEN
|
||||
-- Aggregate Sales (Quantity > 0, Status not Canceled/Returned/Combined)
|
||||
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned', 'combined') THEN o.quantity ELSE 0 END), 0) AS units_sold,
|
||||
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned', 'combined') THEN o.price * o.quantity ELSE 0 END), 0.00) AS gross_revenue_unadjusted,
|
||||
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned', 'combined') THEN o.discount ELSE 0 END), 0.00) AS discounts,
|
||||
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned', 'combined') THEN
|
||||
COALESCE(
|
||||
o.costeach,
|
||||
get_weighted_avg_cost(p.pid, o.date::date),
|
||||
get_weighted_avg_cost(p.pid, (o.date AT TIME ZONE 'America/Chicago')::date),
|
||||
p.cost_price
|
||||
) * o.quantity
|
||||
ELSE 0 END), 0.00) AS cogs,
|
||||
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned') THEN p.regular_price * o.quantity ELSE 0 END), 0.00) AS gross_regular_revenue,
|
||||
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned', 'combined') THEN p.regular_price * o.quantity ELSE 0 END), 0.00) AS gross_regular_revenue,
|
||||
|
||||
-- Aggregate Returns (Quantity < 0 or Status = Returned)
|
||||
COALESCE(SUM(CASE WHEN o.quantity < 0 OR COALESCE(o.status, 'pending') = 'returned' THEN ABS(o.quantity) ELSE 0 END), 0) AS units_returned,
|
||||
COALESCE(SUM(CASE WHEN o.quantity < 0 OR COALESCE(o.status, 'pending') = 'returned' THEN o.price * ABS(o.quantity) ELSE 0 END), 0.00) AS returns_revenue
|
||||
COALESCE(SUM(CASE WHEN o.quantity < 0 OR COALESCE(o.status, 'pending') = 'returned' THEN o.price * ABS(o.quantity) ELSE 0 END), 0.00) AS returns_revenue,
|
||||
-- Returns COGS: cost of returned goods offsets sales COGS
|
||||
COALESCE(SUM(CASE WHEN o.quantity < 0 OR COALESCE(o.status, 'pending') = 'returned' THEN
|
||||
COALESCE(
|
||||
o.costeach,
|
||||
get_weighted_avg_cost(p.pid, (o.date AT TIME ZONE 'America/Chicago')::date),
|
||||
p.cost_price
|
||||
) * ABS(o.quantity)
|
||||
ELSE 0 END), 0.00) AS returns_cogs
|
||||
FROM public.products p
|
||||
LEFT JOIN public.orders o
|
||||
ON p.pid = o.pid
|
||||
AND o.date::date = _date
|
||||
AND (o.date AT TIME ZONE 'America/Chicago')::date = _date -- business day (Central)
|
||||
GROUP BY p.pid, p.sku
|
||||
HAVING COUNT(o.id) > 0 -- Only include products with actual orders for this date
|
||||
),
|
||||
@@ -65,7 +73,7 @@ BEGIN
|
||||
-- Calculate received cost for this day
|
||||
SUM(r.qty_each * r.cost_each) AS cost_received
|
||||
FROM public.receivings r
|
||||
WHERE r.received_date::date = _date
|
||||
WHERE (r.received_date AT TIME ZONE 'America/Chicago')::date = _date
|
||||
GROUP BY r.pid
|
||||
HAVING COUNT(DISTINCT r.receiving_id) > 0 OR SUM(r.qty_each) > 0
|
||||
),
|
||||
@@ -120,9 +128,9 @@ BEGIN
|
||||
COALESCE(sd.discounts, 0.00),
|
||||
COALESCE(sd.returns_revenue, 0.00),
|
||||
COALESCE(sd.gross_revenue_unadjusted, 0.00) - COALESCE(sd.discounts, 0.00) - COALESCE(sd.returns_revenue, 0.00) AS net_revenue,
|
||||
COALESCE(sd.cogs, 0.00),
|
||||
COALESCE(sd.cogs, 0.00) - COALESCE(sd.returns_cogs, 0.00) AS cogs, -- net of returned goods' cost
|
||||
COALESCE(sd.gross_regular_revenue, 0.00),
|
||||
(COALESCE(sd.gross_revenue_unadjusted, 0.00) - COALESCE(sd.discounts, 0.00) - COALESCE(sd.returns_revenue, 0.00)) - COALESCE(sd.cogs, 0.00) AS profit,
|
||||
(COALESCE(sd.gross_revenue_unadjusted, 0.00) - COALESCE(sd.discounts, 0.00) - COALESCE(sd.returns_revenue, 0.00)) - (COALESCE(sd.cogs, 0.00) - COALESCE(sd.returns_cogs, 0.00)) AS profit,
|
||||
-- Receiving metrics
|
||||
COALESCE(rd.units_received, 0),
|
||||
COALESCE(rd.cost_received, 0.00),
|
||||
|
||||
@@ -123,7 +123,10 @@ BEGIN
|
||||
brand_metrics.current_stock_units IS DISTINCT FROM EXCLUDED.current_stock_units OR
|
||||
brand_metrics.sales_30d IS DISTINCT FROM EXCLUDED.sales_30d OR
|
||||
brand_metrics.revenue_30d IS DISTINCT FROM EXCLUDED.revenue_30d OR
|
||||
brand_metrics.lifetime_sales IS DISTINCT FROM EXCLUDED.lifetime_sales;
|
||||
brand_metrics.lifetime_sales IS DISTINCT FROM EXCLUDED.lifetime_sales OR
|
||||
-- Cost revisions can change profit/cogs with unchanged sales/revenue
|
||||
brand_metrics.profit_30d IS DISTINCT FROM EXCLUDED.profit_30d OR
|
||||
brand_metrics.cogs_30d IS DISTINCT FROM EXCLUDED.cogs_30d;
|
||||
|
||||
-- Update calculate_status
|
||||
INSERT INTO public.calculate_status (module_name, last_calculation_timestamp)
|
||||
|
||||
@@ -23,17 +23,19 @@ BEGIN
|
||||
SUM(pm.current_stock) AS current_stock_units,
|
||||
SUM(pm.current_stock_cost) AS current_stock_cost,
|
||||
SUM(pm.current_stock_retail) AS current_stock_retail,
|
||||
-- Sales metrics with proper filtering
|
||||
-- Sales metrics — revenue uses plain COALESCE (matching brand/vendor);
|
||||
-- a positive-only revenue filter while cogs/profit sum everything put
|
||||
-- the margin numerator and denominator on different row populations.
|
||||
SUM(CASE WHEN pm.sales_7d > 0 THEN pm.sales_7d ELSE 0 END) AS sales_7d,
|
||||
SUM(CASE WHEN pm.revenue_7d > 0 THEN pm.revenue_7d ELSE 0 END) AS revenue_7d,
|
||||
SUM(COALESCE(pm.revenue_7d, 0)) AS revenue_7d,
|
||||
SUM(CASE WHEN pm.sales_30d > 0 THEN pm.sales_30d ELSE 0 END) AS sales_30d,
|
||||
SUM(CASE WHEN pm.revenue_30d > 0 THEN pm.revenue_30d ELSE 0 END) AS revenue_30d,
|
||||
SUM(COALESCE(pm.revenue_30d, 0)) AS revenue_30d,
|
||||
SUM(COALESCE(pm.cogs_30d, 0)) AS cogs_30d,
|
||||
SUM(COALESCE(pm.profit_30d, 0)) AS profit_30d,
|
||||
SUM(CASE WHEN pm.sales_365d > 0 THEN pm.sales_365d ELSE 0 END) AS sales_365d,
|
||||
SUM(CASE WHEN pm.revenue_365d > 0 THEN pm.revenue_365d ELSE 0 END) AS revenue_365d,
|
||||
SUM(COALESCE(pm.revenue_365d, 0)) AS revenue_365d,
|
||||
SUM(CASE WHEN pm.lifetime_sales > 0 THEN pm.lifetime_sales ELSE 0 END) AS lifetime_sales,
|
||||
SUM(CASE WHEN pm.lifetime_revenue > 0 THEN pm.lifetime_revenue ELSE 0 END) AS lifetime_revenue
|
||||
SUM(COALESCE(pm.lifetime_revenue, 0)) AS lifetime_revenue
|
||||
FROM public.product_categories pc
|
||||
JOIN public.product_metrics pm ON pc.pid = pm.pid
|
||||
GROUP BY pc.cat_id
|
||||
@@ -62,15 +64,15 @@ BEGIN
|
||||
SUM(pm.current_stock_cost) AS current_stock_cost,
|
||||
SUM(pm.current_stock_retail) AS current_stock_retail,
|
||||
SUM(CASE WHEN pm.sales_7d > 0 THEN pm.sales_7d ELSE 0 END) AS sales_7d,
|
||||
SUM(CASE WHEN pm.revenue_7d > 0 THEN pm.revenue_7d ELSE 0 END) AS revenue_7d,
|
||||
SUM(COALESCE(pm.revenue_7d, 0)) AS revenue_7d,
|
||||
SUM(CASE WHEN pm.sales_30d > 0 THEN pm.sales_30d ELSE 0 END) AS sales_30d,
|
||||
SUM(CASE WHEN pm.revenue_30d > 0 THEN pm.revenue_30d ELSE 0 END) AS revenue_30d,
|
||||
SUM(COALESCE(pm.revenue_30d, 0)) AS revenue_30d,
|
||||
SUM(COALESCE(pm.cogs_30d, 0)) AS cogs_30d,
|
||||
SUM(COALESCE(pm.profit_30d, 0)) AS profit_30d,
|
||||
SUM(CASE WHEN pm.sales_365d > 0 THEN pm.sales_365d ELSE 0 END) AS sales_365d,
|
||||
SUM(CASE WHEN pm.revenue_365d > 0 THEN pm.revenue_365d ELSE 0 END) AS revenue_365d,
|
||||
SUM(COALESCE(pm.revenue_365d, 0)) AS revenue_365d,
|
||||
SUM(CASE WHEN pm.lifetime_sales > 0 THEN pm.lifetime_sales ELSE 0 END) AS lifetime_sales,
|
||||
SUM(CASE WHEN pm.lifetime_revenue > 0 THEN pm.lifetime_revenue ELSE 0 END) AS lifetime_revenue
|
||||
SUM(COALESCE(pm.lifetime_revenue, 0)) AS lifetime_revenue
|
||||
FROM CategoryProducts cp
|
||||
JOIN public.product_metrics pm ON cp.pid = pm.pid
|
||||
GROUP BY cp.ancestor_cat_id
|
||||
@@ -200,7 +202,10 @@ BEGIN
|
||||
category_metrics.revenue_30d IS DISTINCT FROM EXCLUDED.revenue_30d OR
|
||||
category_metrics.lifetime_sales IS DISTINCT FROM EXCLUDED.lifetime_sales OR
|
||||
category_metrics.direct_product_count IS DISTINCT FROM EXCLUDED.direct_product_count OR
|
||||
category_metrics.direct_sales_30d IS DISTINCT FROM EXCLUDED.direct_sales_30d;
|
||||
category_metrics.direct_sales_30d IS DISTINCT FROM EXCLUDED.direct_sales_30d OR
|
||||
-- Cost revisions can change profit/cogs with unchanged sales/revenue
|
||||
category_metrics.profit_30d IS DISTINCT FROM EXCLUDED.profit_30d OR
|
||||
category_metrics.cogs_30d IS DISTINCT FROM EXCLUDED.cogs_30d;
|
||||
|
||||
-- Update calculate_status
|
||||
INSERT INTO public.calculate_status (module_name, last_calculation_timestamp)
|
||||
|
||||
@@ -60,26 +60,31 @@ BEGIN
|
||||
  GROUP BY p.vendor
  ),
  VendorPOAggregates AS (
- -- Aggregate PO related stats including lead time calculated from POs to receivings
+ -- Lead time per PO line = days to its FIRST receiving from the same supplier
+ -- (within 180 days), then averaged per vendor. Joining each PO line to EVERY
+ -- later receiving overstated lead time and weighted it toward busy products.
+ -- Same shape as the per-product calc in update_periodic_metrics.sql.
  SELECT
- po.vendor,
- COUNT(DISTINCT po.po_id) AS po_count_365d,
- -- Calculate lead time by averaging the days between PO date and receiving date
- AVG(GREATEST(1, CASE
- WHEN r.received_date IS NOT NULL AND po.date IS NOT NULL
- THEN (r.received_date::date - po.date::date)
- ELSE NULL
- END))::int AS avg_lead_time_days_hist -- Avg lead time from HISTORICAL received POs
- FROM public.purchase_orders po
- -- Join to receivings table to find when items were received
- LEFT JOIN public.receivings r ON r.pid = po.pid AND r.supplier_id = po.supplier_id
- WHERE po.vendor IS NOT NULL AND po.vendor <> ''
- AND po.date >= CURRENT_DATE - INTERVAL '1 year' -- Look at POs created in the last year
- AND po.status = 'done' -- Only calculate lead time on completed POs
- AND r.received_date IS NOT NULL
- AND po.date IS NOT NULL
- AND r.received_date >= po.date
- GROUP BY po.vendor
+ vendor,
+ COUNT(DISTINCT po_id) AS po_count_365d,
+ ROUND(AVG(GREATEST(1, first_receive_date - po_date)))::int AS avg_lead_time_days_hist
+ FROM (
+ SELECT
+ po.vendor,
+ po.po_id,
+ po.pid,
+ po.date::date AS po_date,
+ MIN(r.received_date::date) AS first_receive_date
+ FROM public.purchase_orders po
+ JOIN public.receivings r ON r.pid = po.pid AND r.supplier_id = po.supplier_id
+ AND r.received_date >= po.date
+ AND r.received_date <= po.date + INTERVAL '180 days'
+ WHERE po.status = 'done'
+ AND po.date >= CURRENT_DATE - INTERVAL '1 year'
+ AND po.vendor IS NOT NULL AND po.vendor <> ''
+ GROUP BY po.vendor, po.po_id, po.pid, po.date
+ ) po_first_receiving
+ GROUP BY vendor
  ),
  AllVendors AS (
  -- Ensure all vendors from products table are included
|
||||
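The `VendorPOAggregates` change above computes lead time from each PO line's first matching receiving instead of every later one. A small worked sketch of the difference, assuming hypothetical in-memory records; the field names `pid`, `supplierId`, `date` and `receivedDate` and the function `avgLeadTimeDays` are illustrative, not the schema or code of the real pipeline:

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Whole days between two ISO date strings (date-only strings parse as UTC).
const daysBetween = (from, to) => Math.round((Date.parse(to) - Date.parse(from)) / DAY_MS);

// Per PO line: days to the FIRST receiving of the same product from the same
// supplier within the window (floored at 1 day), then averaged across lines.
function avgLeadTimeDays(poLines, receivings, windowDays = 180) {
  const perLine = [];
  for (const po of poLines) {
    const waits = receivings
      .filter(r => r.pid === po.pid && r.supplierId === po.supplierId)
      .map(r => daysBetween(po.date, r.receivedDate))
      .filter(d => d >= 0 && d <= windowDays);
    if (waits.length > 0) perLine.push(Math.max(1, Math.min(...waits)));
  }
  if (perLine.length === 0) return null; // no received lines for this vendor
  return Math.round(perLine.reduce((a, b) => a + b, 0) / perLine.length);
}

// Made-up data: product 1 is reordered often, so one PO line has three later receivings.
const poLines = [
  { pid: 1, supplierId: 9, date: '2026-01-05' },
  { pid: 2, supplierId: 9, date: '2026-02-01' },
];
const receivings = [
  { pid: 1, supplierId: 9, receivedDate: '2026-01-15' }, // 10 days after the PO
  { pid: 1, supplierId: 9, receivedDate: '2026-03-06' }, // 60 days
  { pid: 1, supplierId: 9, receivedDate: '2026-04-15' }, // 100 days
  { pid: 2, supplierId: 9, receivedDate: '2026-02-21' }, // 20 days
];

const leadTime = avgLeadTimeDays(poLines, receivings);
```

With this data the first-receiving approach averages 10 and 20 days. Averaging every PO-to-receiving pair instead would mix in the 60 and 100 day gaps for product 1, which is the overstatement the SQL comment describes.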
@@ -154,7 +159,11 @@ BEGIN
|
||||
vendor_metrics.on_order_units IS DISTINCT FROM EXCLUDED.on_order_units OR
|
||||
vendor_metrics.sales_30d IS DISTINCT FROM EXCLUDED.sales_30d OR
|
||||
vendor_metrics.revenue_30d IS DISTINCT FROM EXCLUDED.revenue_30d OR
|
||||
vendor_metrics.lifetime_sales IS DISTINCT FROM EXCLUDED.lifetime_sales;
|
||||
vendor_metrics.lifetime_sales IS DISTINCT FROM EXCLUDED.lifetime_sales OR
|
||||
-- Cost revisions can change profit/cogs with unchanged sales/revenue
|
||||
vendor_metrics.profit_30d IS DISTINCT FROM EXCLUDED.profit_30d OR
|
||||
vendor_metrics.cogs_30d IS DISTINCT FROM EXCLUDED.cogs_30d OR
|
||||
vendor_metrics.avg_lead_time_days IS DISTINCT FROM EXCLUDED.avg_lead_time_days;
|
||||
|
||||
-- Update calculate_status
|
||||
INSERT INTO public.calculate_status (module_name, last_calculation_timestamp)
|
||||
|
||||
+69
@@ -0,0 +1,69 @@
|
||||
+ -- Migration 003: Item-level promo discounts + business-day (America/Chicago) bucketing
+ -- (applied 2026-06-11, together with the IMPORT_METRICS_FIX_PLAN.md batch)
+ --
+ -- PROBLEM 1 - dropped item-level promo discounts (~$26K / 30 days):
+ --   orders.js applied item-level discounts from order_discount_items only when the
+ --   parent order_discounts row had discount_amount_subtotal > 0:
+ --     SUM(CASE WHEN COALESCE(md.discount_amount_subtotal, 0) > 0 THEN id.amount ELSE 0 END)
+ --   In the PHP source, item-level promo discounts (which = 2) are applied to the order
+ --   total SEPARATELY from summary_discount_subtotal, so the gate zeroed essentially all
+ --   of them (90d live check: of 10,010 type-10 promos, 8,070 had item rows but only 8 had
+ --   discount_amount_subtotal > 0). Net effect: orders.discount understated, net_revenue /
+ --   profit_30d / margin_30d overstated by ~10% of revenue, discounts_30d ~3x understated.
+ --
+ -- FIX (orders.js): fetch only order_discount_items rows with which = 2 (which = 1 rows
+ --   are prices of free promo-added items, which = 3 are usage records), sum them
+ --   unconditionally, and clamp each sale line's total discount to price * quantity.
+ --   temp_main_discounts / temp_order_discounts staging removed (unused after the fix).
+ --
+ -- PROBLEM 2 - Europe/Berlin day bucketing:
+ --   orders.date is timestamptz and the PG server timezone is Europe/Berlin, so ::date
+ --   casts shifted every order placed after ~5 PM Central onto the NEXT calendar day in
+ --   daily_product_snapshots (and skewed yesterday_sales, DOW patterns, forecast accuracy).
+ --
+ -- FIX (update_daily_snapshots.sql, backfill/rebuild_daily_snapshots.sql,
+ --   update_product_metrics.sql): every day-bucketing cast is now
+ --     (ts AT TIME ZONE 'America/Chicago')::date
+ --   Supporting expression indexes:
+ --     CREATE INDEX idx_orders_date_chicago ON orders (((date AT TIME ZONE 'America/Chicago')::date));
+ --     CREATE INDEX idx_receivings_received_chicago ON receivings (((received_date AT TIME ZONE 'America/Chicago')::date));
+ --
+ -- ALSO IN THIS BATCH (same re-import/rebuild):
+ --   * 'combined' order status (code 16) excluded from all sales aggregates, and a sweep
+ --     in orders.js marks canceled/combined source orders (canceled = true) even though
+ --     combine_orders zeroes date_placed (Fixes 4/5).
+ --   * Returns now subtract COGS (returns_cogs) in daily snapshots (Fix 8).
+ --   * return_rate_30d = returns / sales (Fix 9); gmroi_30d annualized ×12.17 (Fix 10).
+ --   * stockout/avg-stock/service-level derived from stock_snapshots presence (Fix 7).
+ --
+ -- REQUIRED ACTION (cannot be fixed by SQL alone; discount values are baked into rows):
+ --   1. Deploy updated orders.js + snapshot SQL files.
+ --   2. Pause the recurring import: touch inventory-server/.pause-auto-update
+ --   3. FULL orders re-import: INCREMENTAL_UPDATE=false node scripts/import-from-prod.js
+ --   4. Rebuild snapshots: psql -f scripts/metrics-new/backfill/rebuild_daily_snapshots.sql
+ --   5. Recalculate metrics: node scripts/calculate-metrics-new.js
+ --   6. Resume: rm inventory-server/.pause-auto-update
+ --
+ -- EXPECTED AFTER RE-IMPORT: margin_30d down ~8-10 points (real, not a data incident),
+ --   discounts_30d ~3x up, daily sales curves shifted onto correct business days.
+ --
+ -- VERIFICATION:
+ --   (a) PG SUM(discount) over a 30-day window should approximate MySQL
+ --       Σ summary_discount_subtotal (prorated) + Σ order_discount_items.amount (which=2)
+ --       over the same orders.
+ --   (b) Per-day units in daily_product_snapshots should match MySQL
+ --       SELECT date_placed_onlydate, SUM(qty_ordered) FROM order_items JOIN _order ...
+ --       WHERE order_status >= 20 GROUP BY 1 (MySQL stores Central days).
+ --   (c) Migration 002 regression check (discount double-counting) still holds:
+ SELECT
+ o.pid,
+ o.order_number,
+ o.price,
+ o.quantity,
+ o.discount,
+ (o.price * o.quantity - o.discount) as net_revenue
+ FROM orders o
+ WHERE o.pid IN (624756, 614513)
+ ORDER BY o.date DESC
+ LIMIT 10;
+ -- Expected: discount 0 (or genuine promo amount) for regular sales; net close to gross.
|
||||
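PROBLEM 2 in migration 003 comes down to which time zone decides the calendar day of an instant. Berlin is seven hours ahead of Chicago, so Berlin's midnight falls at 5 PM Central. A minimal sketch of the shift, using the standard `Intl.DateTimeFormat` API rather than PostgreSQL; the helper name `localDate` and the sample timestamp are assumptions for illustration:

```javascript
// Calendar date (YYYY-MM-DD) of an instant as seen in a given IANA time zone.
function localDate(instant, timeZone) {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    year: 'numeric',
    month: '2-digit',
    day: '2-digit',
  }).formatToParts(instant);
  const get = type => parts.find(p => p.type === type).value;
  return `${get('year')}-${get('month')}-${get('day')}`;
}

// Hypothetical order placed at 6:30 PM Central (CDT, UTC-5) on 2026-06-10.
const placedAt = new Date('2026-06-10T23:30:00Z');

// Analogous to (ts AT TIME ZONE 'America/Chicago')::date
const businessDay = localDate(placedAt, 'America/Chicago');

// Analogous to ts::date when the session time zone is Europe/Berlin
const serverDay = localDate(placedAt, 'Europe/Berlin');

console.log({ businessDay, serverDay });
```

Here `businessDay` is 2026-06-10 while `serverDay` is 2026-06-11: the same evening order lands on the next day under the bare cast, which is the behavior the explicit `AT TIME ZONE` expression removes.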
@@ -0,0 +1,9 @@
|
||||
+ -- Migration 004: Map order status codes 45 and 67 to text
+ --
+ -- Follow-up to 001_map_order_statuses.sql: the orders.js orderStatusMap lacked
+ -- codes 45 (payment_pending) and 67 (remote_send), so any such orders imported
+ -- as numeric strings '45' / '67'. orders.js now maps them; this updates any
+ -- existing rows (a full re-import also fixes them, so it is safe to run either way).
+
+ UPDATE orders SET status = 'payment_pending' WHERE status = '45';
+ UPDATE orders SET status = 'remote_send' WHERE status = '67';
|
||||
@@ -39,50 +39,68 @@ BEGIN
|
||||
-- 2. Stale detection: existing snapshots where aggregates don't match source data
|
||||
-- (catches backfilled imports that arrived after snapshot was calculated)
|
||||
-- 3. Recent recheck: last N days always reprocessed (picks up new orders, corrections)
|
||||
-- NOTE: all order/receiving timestamps are bucketed into business days using
|
||||
-- America/Chicago. The PG server timezone is Europe/Berlin, so a bare ::date
|
||||
-- cast would shift every evening order onto the next day.
|
||||
FOR _target_date IN
|
||||
SELECT d FROM (
|
||||
-- Gap fill: find dates with activity but missing snapshots
|
||||
SELECT activity_dates.d
|
||||
FROM (
|
||||
SELECT DISTINCT date::date AS d FROM public.orders
|
||||
WHERE date::date >= _backfill_start AND date::date < CURRENT_DATE - _recent_recheck_days
|
||||
SELECT DISTINCT (date AT TIME ZONE 'America/Chicago')::date AS d FROM public.orders
|
||||
WHERE (date AT TIME ZONE 'America/Chicago')::date >= _backfill_start
|
||||
AND (date AT TIME ZONE 'America/Chicago')::date < CURRENT_DATE - _recent_recheck_days
|
||||
UNION
|
||||
SELECT DISTINCT received_date::date AS d FROM public.receivings
|
||||
WHERE received_date::date >= _backfill_start AND received_date::date < CURRENT_DATE - _recent_recheck_days
|
||||
SELECT DISTINCT (received_date AT TIME ZONE 'America/Chicago')::date AS d FROM public.receivings
|
||||
WHERE (received_date AT TIME ZONE 'America/Chicago')::date >= _backfill_start
|
||||
AND (received_date AT TIME ZONE 'America/Chicago')::date < CURRENT_DATE - _recent_recheck_days
|
||||
) activity_dates
|
||||
WHERE NOT EXISTS (
|
||||
SELECT 1 FROM public.daily_product_snapshots dps WHERE dps.snapshot_date = activity_dates.d
|
||||
)
|
||||
UNION
|
||||
-- Stale detection: compare snapshot aggregates against source tables
|
||||
-- (must bucket identically to SalesData/ReceivingData or every day
|
||||
-- looks permanently stale)
|
||||
SELECT snap_agg.snapshot_date AS d
|
||||
FROM (
|
||||
SELECT snapshot_date,
|
||||
COALESCE(SUM(units_received), 0)::bigint AS snap_received,
|
||||
COALESCE(SUM(units_sold), 0)::bigint AS snap_sold
|
||||
COALESCE(SUM(units_sold), 0)::bigint AS snap_sold,
|
||||
ROUND(COALESCE(SUM(net_revenue), 0), 2) AS snap_net_revenue
|
||||
FROM public.daily_product_snapshots
|
||||
WHERE snapshot_date >= _backfill_start
|
||||
AND snapshot_date < CURRENT_DATE - _recent_recheck_days
|
||||
GROUP BY snapshot_date
|
||||
) snap_agg
|
||||
LEFT JOIN (
|
||||
SELECT received_date::date AS d, SUM(qty_each)::bigint AS actual_received
|
||||
SELECT (received_date AT TIME ZONE 'America/Chicago')::date AS d, SUM(qty_each)::bigint AS actual_received
|
||||
FROM public.receivings
|
||||
WHERE received_date::date >= _backfill_start
|
||||
AND received_date::date < CURRENT_DATE - _recent_recheck_days
|
||||
GROUP BY received_date::date
|
||||
WHERE (received_date AT TIME ZONE 'America/Chicago')::date >= _backfill_start
|
||||
AND (received_date AT TIME ZONE 'America/Chicago')::date < CURRENT_DATE - _recent_recheck_days
|
||||
GROUP BY 1
|
||||
) recv_agg ON snap_agg.snapshot_date = recv_agg.d
|
||||
LEFT JOIN (
|
||||
SELECT date::date AS d,
|
||||
SUM(CASE WHEN quantity > 0 AND COALESCE(status, 'pending') NOT IN ('canceled', 'returned')
|
||||
THEN quantity ELSE 0 END)::bigint AS actual_sold
|
||||
SELECT (date AT TIME ZONE 'America/Chicago')::date AS d,
|
||||
SUM(CASE WHEN quantity > 0 AND COALESCE(status, 'pending') NOT IN ('canceled', 'returned', 'combined')
|
||||
THEN quantity ELSE 0 END)::bigint AS actual_sold,
|
||||
-- Mirrors SalesData's net_revenue (gross - discounts - returns)
|
||||
-- so price/discount corrections older than the recheck window
|
||||
-- get repaired, not just unit-count changes.
|
||||
ROUND(
|
||||
SUM(CASE WHEN quantity > 0 AND COALESCE(status, 'pending') NOT IN ('canceled', 'returned', 'combined')
|
||||
THEN price * quantity - discount ELSE 0 END)
|
||||
- SUM(CASE WHEN quantity < 0 OR COALESCE(status, 'pending') = 'returned'
|
||||
THEN price * ABS(quantity) ELSE 0 END)
|
||||
, 2) AS actual_net_revenue
|
||||
FROM public.orders
|
||||
WHERE date::date >= _backfill_start
|
||||
AND date::date < CURRENT_DATE - _recent_recheck_days
|
||||
GROUP BY date::date
|
||||
WHERE (date AT TIME ZONE 'America/Chicago')::date >= _backfill_start
|
||||
AND (date AT TIME ZONE 'America/Chicago')::date < CURRENT_DATE - _recent_recheck_days
|
||||
GROUP BY 1
|
||||
) orders_agg ON snap_agg.snapshot_date = orders_agg.d
|
||||
WHERE snap_agg.snap_received != COALESCE(recv_agg.actual_received, 0)
|
||||
OR snap_agg.snap_sold != COALESCE(orders_agg.actual_sold, 0)
|
||||
OR snap_agg.snap_net_revenue != ROUND(COALESCE(orders_agg.actual_net_revenue, 0), 2)
|
||||
UNION
|
||||
-- Recent days: always reprocess
|
||||
SELECT d::date
|
||||
@@ -116,26 +134,36 @@ BEGIN
|
||||
p.sku,
|
||||
-- Track number of orders to ensure we have real data
|
||||
COUNT(o.id) as order_count,
|
||||
-- Aggregate Sales (Quantity > 0, Status not Canceled/Returned)
|
||||
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned') THEN o.quantity ELSE 0 END), 0) AS units_sold,
|
||||
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned') THEN o.price * o.quantity ELSE 0 END), 0.00) AS gross_revenue_unadjusted, -- Before discount
|
||||
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned') THEN o.discount ELSE 0 END), 0.00) AS discounts,
|
||||
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned') THEN
|
||||
-- Aggregate Sales (Quantity > 0, Status not Canceled/Returned/Combined)
|
||||
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned', 'combined') THEN o.quantity ELSE 0 END), 0) AS units_sold,
|
||||
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned', 'combined') THEN o.price * o.quantity ELSE 0 END), 0.00) AS gross_revenue_unadjusted, -- Before discount
|
||||
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned', 'combined') THEN o.discount ELSE 0 END), 0.00) AS discounts,
|
||||
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned', 'combined') THEN
|
||||
COALESCE(
|
||||
o.costeach, -- First use order-specific cost if available
|
||||
get_weighted_avg_cost(p.pid, o.date::date), -- Then use weighted average cost
|
||||
get_weighted_avg_cost(p.pid, (o.date AT TIME ZONE 'America/Chicago')::date), -- Then use weighted average cost
|
||||
p.cost_price -- Final fallback to current cost
|
||||
) * o.quantity
|
||||
) * o.quantity
|
||||
ELSE 0 END), 0.00) AS cogs,
|
||||
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned') THEN p.regular_price * o.quantity ELSE 0 END), 0.00) AS gross_regular_revenue, -- Use current regular price for simplicity here
|
||||
COALESCE(SUM(CASE WHEN o.quantity > 0 AND COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned', 'combined') THEN p.regular_price * o.quantity ELSE 0 END), 0.00) AS gross_regular_revenue, -- Use current regular price for simplicity here
|
||||
|
||||
-- Aggregate Returns (Quantity < 0 or Status = Returned)
|
||||
COALESCE(SUM(CASE WHEN o.quantity < 0 OR COALESCE(o.status, 'pending') = 'returned' THEN ABS(o.quantity) ELSE 0 END), 0) AS units_returned,
|
||||
COALESCE(SUM(CASE WHEN o.quantity < 0 OR COALESCE(o.status, 'pending') = 'returned' THEN o.price * ABS(o.quantity) ELSE 0 END), 0.00) AS returns_revenue
|
||||
COALESCE(SUM(CASE WHEN o.quantity < 0 OR COALESCE(o.status, 'pending') = 'returned' THEN o.price * ABS(o.quantity) ELSE 0 END), 0.00) AS returns_revenue,
|
||||
-- Returns COGS: returned goods come back into stock, so their cost
|
||||
-- offsets the sales COGS for the day (margin would otherwise be
|
||||
-- understated in return-heavy periods).
|
||||
COALESCE(SUM(CASE WHEN o.quantity < 0 OR COALESCE(o.status, 'pending') = 'returned' THEN
|
||||
COALESCE(
|
||||
o.costeach,
|
||||
get_weighted_avg_cost(p.pid, (o.date AT TIME ZONE 'America/Chicago')::date),
|
||||
p.cost_price
|
||||
) * ABS(o.quantity)
|
||||
ELSE 0 END), 0.00) AS returns_cogs
|
||||
FROM public.products p -- Start from products to include those with no orders today
|
||||
JOIN public.orders o -- Changed to INNER JOIN to only process products with orders
|
||||
ON p.pid = o.pid
|
||||
AND o.date::date = _target_date -- Cast to date to ensure compatibility regardless of original type
|
||||
AND (o.date AT TIME ZONE 'America/Chicago')::date = _target_date -- Bucket by business day (Central)
|
||||
GROUP BY p.pid, p.sku
|
||||
-- No HAVING clause here - we always want to include all orders
|
||||
),
|
||||
@@ -149,7 +177,7 @@ BEGIN
|
||||
-- Calculate the cost received (qty * cost)
|
||||
SUM(r.qty_each * r.cost_each) AS cost_received
|
||||
FROM public.receivings r
|
||||
WHERE r.received_date::date = _target_date
|
||||
WHERE (r.received_date AT TIME ZONE 'America/Chicago')::date = _target_date
|
||||
-- Optional: Filter out canceled receivings if needed
|
||||
-- AND r.status <> 'canceled'
|
||||
GROUP BY r.pid
|
||||
@@ -217,9 +245,9 @@ BEGIN
|
||||
  COALESCE(sd.discounts, 0.00),
  COALESCE(sd.returns_revenue, 0.00),
  COALESCE(sd.gross_revenue_unadjusted, 0.00) - COALESCE(sd.discounts, 0.00) - COALESCE(sd.returns_revenue, 0.00) AS net_revenue,
- COALESCE(sd.cogs, 0.00),
+ COALESCE(sd.cogs, 0.00) - COALESCE(sd.returns_cogs, 0.00) AS cogs, -- net of returned goods' cost
  COALESCE(sd.gross_regular_revenue, 0.00),
- (COALESCE(sd.gross_revenue_unadjusted, 0.00) - COALESCE(sd.discounts, 0.00) - COALESCE(sd.returns_revenue, 0.00)) - COALESCE(sd.cogs, 0.00) AS profit,
+ (COALESCE(sd.gross_revenue_unadjusted, 0.00) - COALESCE(sd.discounts, 0.00) - COALESCE(sd.returns_revenue, 0.00)) - (COALESCE(sd.cogs, 0.00) - COALESCE(sd.returns_cogs, 0.00)) AS profit,
  -- Receiving Metrics (From ReceivingData)
  COALESCE(rd.units_received, 0),
  COALESCE(rd.cost_received, 0.00),
|
||||
|
||||
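The profit change in the hunk above nets returned goods' cost out of COGS before subtracting it from net revenue. A small sketch of that arithmetic with made-up numbers; the field names mirror the SQL aliases, and `netDailyFigures` is a hypothetical function used only for illustration.

```javascript
// Hypothetical per-product daily aggregates; names mirror the SQL aliases.
function netDailyFigures({ grossRevenue, discounts, returnsRevenue, cogs, returnsCogs }) {
  const netRevenue = grossRevenue - discounts - returnsRevenue;
  const netCogs = cogs - returnsCogs;          // new: COGS net of returned goods' cost
  return {
    netRevenue,
    netCogs,
    profit: netRevenue - netCogs,              // new profit definition
    profitBefore: netRevenue - cogs,           // old definition, for comparison
  };
}

// Illustrative values only.
const day = { grossRevenue: 100, discounts: 10, returnsRevenue: 20, cogs: 40, returnsCogs: 8 };
const figures = netDailyFigures(day);
console.log(figures);
```

With these sample values the old formula reports a profit of 30 and the new one 38, the difference being exactly the 8 of cost that came back into stock.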
@@ -131,18 +131,19 @@ BEGIN
|
||||
HistoricalDates AS (
|
||||
-- Note: Calculating these MIN/MAX values hourly can be slow on large tables.
|
||||
-- Consider calculating periodically or storing on products if import can populate them.
|
||||
-- Dates are bucketed in business time (America/Chicago) to match daily snapshots.
|
||||
SELECT
|
||||
p.pid,
|
||||
MIN(o.date)::date AS date_first_sold,
|
||||
MAX(o.date)::date AS max_order_date, -- Use MAX for potential recalc of date_last_sold
|
||||
|
||||
MIN((o.date AT TIME ZONE 'America/Chicago'))::date AS date_first_sold,
|
||||
MAX((o.date AT TIME ZONE 'America/Chicago'))::date AS max_order_date, -- Use MAX for potential recalc of date_last_sold
|
||||
|
||||
-- For first received, use the new receivings table
|
||||
MIN(r.received_date)::date AS date_first_received_calc,
|
||||
|
||||
MIN((r.received_date AT TIME ZONE 'America/Chicago'))::date AS date_first_received_calc,
|
||||
|
||||
-- For last received, use the new receivings table
|
||||
MAX(r.received_date)::date AS date_last_received_calc
|
||||
MAX((r.received_date AT TIME ZONE 'America/Chicago'))::date AS date_last_received_calc
|
||||
FROM public.products p
|
||||
LEFT JOIN public.orders o ON p.pid = o.pid AND o.quantity > 0 AND o.status NOT IN ('canceled', 'returned')
|
||||
LEFT JOIN public.orders o ON p.pid = o.pid AND o.quantity > 0 AND o.status NOT IN ('canceled', 'returned', 'combined')
|
||||
LEFT JOIN public.receivings r ON p.pid = r.pid
|
||||
GROUP BY p.pid
|
||||
),
|
||||
@@ -174,17 +175,19 @@ BEGIN
|
||||
SUM(CASE WHEN snapshot_date >= _current_date - INTERVAL '29 days' AND snapshot_date <= _current_date THEN discounts ELSE 0 END) AS discounts_30d,
|
||||
SUM(CASE WHEN snapshot_date >= _current_date - INTERVAL '29 days' AND snapshot_date <= _current_date THEN gross_revenue ELSE 0 END) AS gross_revenue_30d,
|
||||
SUM(CASE WHEN snapshot_date >= _current_date - INTERVAL '29 days' AND snapshot_date <= _current_date THEN gross_regular_revenue ELSE 0 END) AS gross_regular_revenue_30d,
|
||||
SUM(CASE WHEN snapshot_date >= _current_date - INTERVAL '29 days' AND snapshot_date <= _current_date AND stockout_flag THEN 1 ELSE 0 END) AS stockout_days_30d,
|
||||
|
||||
-- NOTE: stockout days and avg stock units/cost now come from StockCoverage
|
||||
-- (stock_snapshots has full daily coverage; these activity-only snapshots
|
||||
-- only exist on days with sales/receivings, which made stockout_days ~0
|
||||
-- exactly when stockouts mattered and biased stock averages upward).
|
||||
|
||||
SUM(CASE WHEN snapshot_date >= _current_date - INTERVAL '364 days' AND snapshot_date <= _current_date THEN units_sold ELSE 0 END) AS sales_365d,
|
||||
SUM(CASE WHEN snapshot_date >= _current_date - INTERVAL '364 days' AND snapshot_date <= _current_date THEN net_revenue ELSE 0 END) AS revenue_365d,
|
||||
|
||||
SUM(CASE WHEN snapshot_date >= _current_date - INTERVAL '29 days' AND snapshot_date <= _current_date THEN units_received ELSE 0 END) AS received_qty_30d,
|
||||
SUM(CASE WHEN snapshot_date >= _current_date - INTERVAL '29 days' AND snapshot_date <= _current_date THEN cost_received ELSE 0 END) AS received_cost_30d,
|
||||
|
||||
-- Averages for stock levels - only include dates within the specified period
|
||||
AVG(CASE WHEN snapshot_date >= _current_date - INTERVAL '29 days' AND snapshot_date <= _current_date THEN eod_stock_quantity END) AS avg_stock_units_30d,
|
||||
AVG(CASE WHEN snapshot_date >= _current_date - INTERVAL '29 days' AND snapshot_date <= _current_date THEN eod_stock_cost END) AS avg_stock_cost_30d,
|
||||
-- Retail/gross stock averages stay on activity snapshots: stock_snapshots
|
||||
-- has no eod_stock_retail equivalent (cost-only source table).
|
||||
AVG(CASE WHEN snapshot_date >= _current_date - INTERVAL '29 days' AND snapshot_date <= _current_date THEN eod_stock_retail END) AS avg_stock_retail_30d,
|
||||
AVG(CASE WHEN snapshot_date >= _current_date - INTERVAL '29 days' AND snapshot_date <= _current_date THEN eod_stock_gross END) AS avg_stock_gross_30d,
|
||||
|
||||
@@ -240,16 +243,89 @@ BEGIN
|
||||
LEFT JOIN public.settings_vendor sv ON p.vendor = sv.vendor
|
||||
),
|
||||
LifetimeRevenue AS (
|
||||
-- Calculate actual revenue from orders table
|
||||
-- Calculate actual revenue from orders table. Negative-quantity rows
|
||||
-- (returns) are included so lifetime revenue nets out returns;
|
||||
-- price * quantity is already signed.
|
||||
SELECT
|
||||
o.pid,
|
||||
SUM(o.price * o.quantity - COALESCE(o.discount, 0)) AS lifetime_revenue_from_orders,
|
||||
SUM(o.quantity) AS lifetime_units_from_orders
|
||||
FROM public.orders o
|
||||
WHERE o.status NOT IN ('canceled', 'returned')
|
||||
AND o.quantity > 0
|
||||
WHERE o.status NOT IN ('canceled', 'returned', 'combined')
|
||||
GROUP BY o.pid
|
||||
),
|
||||
-- Full-coverage stock presence from stock_snapshots (MySQL snap_product_value).
|
||||
-- That source only writes rows for products WITH stock on hand, so a product
|
||||
-- missing from a day the cron ran was out of stock that day. Days before the
|
||||
-- product was created are not counted against it.
|
||||
StockCoverage AS (
|
||||
SELECT
|
||||
pid,
|
||||
eligible_days_30d,
|
||||
days_in_stock_30d,
|
||||
CASE WHEN eligible_days_30d > 0
|
||||
THEN GREATEST(0, eligible_days_30d - days_in_stock_30d)
|
||||
END AS stockout_days_30d,
|
||||
-- Absent days count as zero stock (the old activity-only average was
|
||||
-- biased toward in-stock days)
|
||||
CASE WHEN eligible_days_30d > 0
|
||||
THEN sum_qty::numeric / eligible_days_30d
|
||||
END AS avg_stock_units_30d,
|
||||
CASE WHEN eligible_days_30d > 0
|
||||
THEN sum_value::numeric / eligible_days_30d
|
||||
END AS avg_stock_cost_30d
|
||||
FROM (
|
||||
SELECT
|
||||
p.pid,
|
||||
LEAST(
|
||||
cal.covered_days,
|
||||
CASE WHEN p.created_at IS NULL THEN cal.covered_days
|
||||
ELSE GREATEST(0, (_current_date - GREATEST(p.created_at::date, _current_date - 29) + 1))
|
||||
END
|
||||
) AS eligible_days_30d,
|
||||
COALESCE(pres.days_in_stock, 0) AS days_in_stock_30d,
|
||||
COALESCE(pres.sum_qty, 0) AS sum_qty,
|
||||
COALESCE(pres.sum_value, 0) AS sum_value
|
||||
FROM public.products p
|
||||
CROSS JOIN (
|
||||
SELECT COUNT(DISTINCT snapshot_date) AS covered_days
|
||||
FROM public.stock_snapshots
|
||||
WHERE snapshot_date >= _current_date - INTERVAL '29 days'
|
||||
AND snapshot_date <= _current_date
|
||||
) cal
|
||||
LEFT JOIN (
|
||||
SELECT pid,
|
||||
COUNT(*) AS days_in_stock,
|
||||
SUM(stock_quantity) AS sum_qty,
|
||||
SUM(stock_value) AS sum_value
|
||||
FROM public.stock_snapshots
|
||||
WHERE snapshot_date >= _current_date - INTERVAL '29 days'
|
||||
AND snapshot_date <= _current_date
|
||||
GROUP BY pid
|
||||
) pres ON pres.pid = p.pid
|
||||
) base
|
||||
),
|
||||
-- Sales that happened on out-of-stock days (per the stock snapshot), for
|
||||
-- lost-sales incidents and the fill-rate heuristic. Restricted to days the
|
||||
-- stock cron actually ran so e.g. today's sales aren't misread as stockouts.
|
||||
SalesDayStock AS (
|
||||
SELECT
|
||||
dps.pid,
|
||||
SUM(dps.units_sold) AS units_sold_covered,
|
||||
COUNT(*) FILTER (WHERE dps.units_sold > 0 AND ss.pid IS NULL) AS lost_sales_incidents_30d,
|
||||
SUM(CASE WHEN ss.pid IS NULL THEN dps.units_sold ELSE 0 END) AS units_sold_on_stockout_days
|
||||
FROM public.daily_product_snapshots dps
|
||||
JOIN (
|
||||
SELECT DISTINCT snapshot_date FROM public.stock_snapshots
|
||||
WHERE snapshot_date >= _current_date - INTERVAL '29 days'
|
||||
AND snapshot_date <= _current_date
|
||||
) cal ON cal.snapshot_date = dps.snapshot_date
|
||||
LEFT JOIN public.stock_snapshots ss
|
||||
ON ss.pid = dps.pid AND ss.snapshot_date = dps.snapshot_date
|
||||
WHERE dps.snapshot_date >= _current_date - INTERVAL '29 days'
|
||||
AND dps.snapshot_date <= _current_date
|
||||
GROUP BY dps.pid
|
||||
),
|
||||
PreviousPeriodMetrics AS (
|
||||
-- Calculate metrics for previous 30-day period for growth comparison
|
||||
SELECT
|
||||
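The `StockCoverage` CTE in the hunk above treats a missing `stock_snapshots` row on a covered day as an out-of-stock day and averages stock over all eligible days. A rough sketch of that logic for a single product; `stockCoverage` and its parameter names are hypothetical, and the creation-date clamp is reduced to a single precomputed day count.

```javascript
// rows: hypothetical stock_snapshots rows for ONE product inside the 30-day window
// coveredDays: distinct snapshot dates in the window (days the stock cron ran)
// daysSinceCreatedInWindow: window days on/after the product's creation date, or null if unknown
function stockCoverage(rows, coveredDays, daysSinceCreatedInWindow = null) {
  const eligibleDays = daysSinceCreatedInWindow == null
    ? coveredDays
    : Math.min(coveredDays, Math.max(0, daysSinceCreatedInWindow));
  if (eligibleDays <= 0) {
    return { eligibleDays, stockoutDays: null, avgStockUnits: null };
  }
  const daysInStock = rows.length; // the source only writes rows for products with stock
  const sumQty = rows.reduce((sum, r) => sum + r.stockQuantity, 0);
  return {
    eligibleDays,
    stockoutDays: Math.max(0, eligibleDays - daysInStock),
    avgStockUnits: sumQty / eligibleDays, // absent days count as zero stock
  };
}

// Illustrative product: in stock on 20 of 30 covered days, 6 units each day.
const sampleRows = Array.from({ length: 20 }, () => ({ stockQuantity: 6 }));
const coverage = stockCoverage(sampleRows, 30);
console.log(coverage);
```

Averaging only over the 20 rows that exist would report 6 units; dividing by the 30 eligible days gives 4, which is the upward bias the diff's comments describe.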
@@ -302,24 +378,43 @@ BEGIN
|
||||
GROUP BY pid
|
||||
),
|
||||
ServiceLevels AS (
|
||||
-- Calculate service level and fill rate metrics
|
||||
-- Service level and fill rate built on full-coverage stock data
|
||||
-- (StockCoverage / SalesDayStock) instead of activity-only snapshots.
|
||||
SELECT
|
||||
pid,
|
||||
COUNT(*) FILTER (WHERE stockout_flag = true) AS stockout_incidents_30d,
|
||||
COUNT(*) FILTER (WHERE stockout_flag = true AND units_sold > 0) AS lost_sales_incidents_30d,
|
||||
-- Service level: percentage of days without stockouts
|
||||
(1.0 - (COUNT(*) FILTER (WHERE stockout_flag = true)::NUMERIC / NULLIF(COUNT(*), 0))) * 100 AS service_level_30d,
|
||||
-- Fill rate: units sold / (units sold + potential lost sales)
|
||||
CASE
|
||||
WHEN SUM(units_sold) > 0 THEN
|
||||
(SUM(units_sold)::NUMERIC /
|
||||
(SUM(units_sold) + SUM(CASE WHEN stockout_flag THEN units_sold * 0.2 ELSE 0 END))) * 100
|
||||
sc.pid,
|
||||
sc.stockout_days_30d AS stockout_incidents_30d,
|
||||
sds.lost_sales_incidents_30d,
|
||||
-- Service level: percentage of covered days the product was in stock
|
||||
CASE WHEN sc.eligible_days_30d > 0 THEN
|
||||
(1.0 - (sc.stockout_days_30d::NUMERIC / sc.eligible_days_30d)) * 100
|
||||
END AS service_level_30d,
|
||||
-- Fill rate: units sold / (units sold + potential lost sales).
|
||||
-- The 0.2 lost-sales factor is an arbitrary heuristic: each unit sold on
|
||||
-- an out-of-stock day is assumed to represent 20% additional missed demand.
|
||||
CASE
|
||||
WHEN COALESCE(sds.units_sold_covered, 0) > 0 THEN
|
||||
(sds.units_sold_covered::NUMERIC /
|
||||
(sds.units_sold_covered + COALESCE(sds.units_sold_on_stockout_days, 0) * 0.2)) * 100
|
||||
ELSE NULL
|
||||
END AS fill_rate_30d
|
||||
FROM public.daily_product_snapshots
|
||||
WHERE snapshot_date >= _current_date - INTERVAL '29 days'
|
||||
AND snapshot_date <= _current_date
|
||||
GROUP BY pid
|
||||
FROM StockCoverage sc
|
||||
LEFT JOIN SalesDayStock sds ON sds.pid = sc.pid
|
||||
),
|
||||
ProductVelocity AS (
|
||||
-- Single source for sales velocity so every replenishment/cover column stays
|
||||
-- consistent. NULL when the product is excluded from forecasting: excluded
|
||||
-- products now still get a product_metrics row (they used to be filtered out
|
||||
-- entirely and vanished from brand/vendor/category rollups), but their
|
||||
-- forecast-derived columns go NULL / zero.
|
||||
SELECT
|
||||
ci.pid,
|
||||
CASE WHEN COALESCE(s.exclude_forecast, FALSE) THEN NULL
|
||||
ELSE calculate_sales_velocity(sa.sales_30d::int, COALESCE(sc.stockout_days_30d, 0)::int)
|
||||
END AS daily
|
||||
FROM CurrentInfo ci
|
||||
LEFT JOIN SnapshotAggregates sa ON ci.pid = sa.pid
|
||||
LEFT JOIN StockCoverage sc ON ci.pid = sc.pid
|
||||
LEFT JOIN Settings s ON ci.pid = s.pid
|
||||
),
|
||||
SeasonalityAnalysis AS (
|
||||
-- Set-based seasonality detection (replaces per-product function calls)
|
||||
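The rewritten `ServiceLevels` CTE derives both metrics from `StockCoverage` and `SalesDayStock`. A minimal sketch of the two formulas, keeping the 0.2 lost-sales factor that the diff itself calls an arbitrary heuristic; `serviceMetrics` and its argument names are hypothetical.

```javascript
const LOST_SALES_FACTOR = 0.2; // heuristic from the SQL comment, not a measured value

function serviceMetrics({ eligibleDays, stockoutDays, unitsSoldCovered, unitsSoldOnStockoutDays }) {
  // Service level: share of covered days the product was in stock.
  const serviceLevel = eligibleDays > 0
    ? (1 - stockoutDays / eligibleDays) * 100
    : null;
  // Fill rate: units sold / (units sold + assumed missed demand).
  const fillRate = unitsSoldCovered > 0
    ? (unitsSoldCovered /
        (unitsSoldCovered + (unitsSoldOnStockoutDays || 0) * LOST_SALES_FACTOR)) * 100
    : null;
  return { serviceLevel, fillRate };
}

// Illustrative inputs: out of stock on 15 of 30 days, 48 units sold, 10 of them on stockout days.
const metricsSample = serviceMetrics({
  eligibleDays: 30,
  stockoutDays: 15,
  unitsSoldCovered: 48,
  unitsSoldOnStockoutDays: 10,
});
console.log(metricsSample);
```

Here the service level is 50% and the fill rate is roughly 96%, since the 10 units sold on stockout days are assumed to stand for 2 additional missed units.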
@@ -424,8 +519,8 @@ BEGIN
|
||||
END AS age_days,
|
||||
sa.sales_7d, sa.revenue_7d, sa.sales_14d, sa.revenue_14d, sa.sales_30d, sa.revenue_30d, sa.cogs_30d, sa.profit_30d,
|
||||
sa.returns_units_30d, sa.returns_revenue_30d, sa.discounts_30d, sa.gross_revenue_30d, sa.gross_regular_revenue_30d,
|
||||
sa.stockout_days_30d, sa.sales_365d, sa.revenue_365d,
|
||||
sa.avg_stock_units_30d, sa.avg_stock_cost_30d, sa.avg_stock_retail_30d, sa.avg_stock_gross_30d,
|
||||
sc.stockout_days_30d, sa.sales_365d, sa.revenue_365d,
|
||||
sc.avg_stock_units_30d, sc.avg_stock_cost_30d, sa.avg_stock_retail_30d, sa.avg_stock_gross_30d,
|
||||
sa.received_qty_30d, sa.received_cost_30d,
|
||||
-- Use total_sold from products table as the source of truth for lifetime sales
|
||||
-- This includes all historical data from the production database
|
||||
@@ -463,66 +558,68 @@ BEGIN
|
||||
sa.sales_30d AS avg_sales_per_month_30d, -- Using 30d sales as proxy for month
|
||||
(sa.profit_30d / NULLIF(sa.revenue_30d, 0)) * 100 AS margin_30d,
|
||||
(sa.profit_30d / NULLIF(sa.cogs_30d, 0)) * 100 AS markup_30d,
|
||||
sa.profit_30d / NULLIF(sa.avg_stock_cost_30d, 0) AS gmroi_30d,
|
||||
sa.sales_30d / NULLIF(sa.avg_stock_units_30d, 0) AS stockturn_30d,
|
||||
(sa.returns_units_30d / NULLIF(sa.sales_30d + sa.returns_units_30d, 0)) * 100 AS return_rate_30d,
|
||||
-- Annualized GMROI (30-day profit extrapolated to a year: × 365/30).
|
||||
-- Conventional benchmark for healthy retail is ≥ 2-3 on this scale.
|
||||
(sa.profit_30d / NULLIF(sc.avg_stock_cost_30d, 0)) * 12.17 AS gmroi_30d,
|
||||
sa.sales_30d / NULLIF(sc.avg_stock_units_30d, 0) AS stockturn_30d,
|
||||
-- Industry-standard definition: returns / sales (not returns / (sales+returns))
|
||||
(sa.returns_units_30d / NULLIF(sa.sales_30d, 0)) * 100 AS return_rate_30d,
|
||||
(sa.discounts_30d / NULLIF(sa.gross_revenue_30d, 0)) * 100 AS discount_rate_30d,
|
||||
(sa.stockout_days_30d / 30.0) * 100 AS stockout_rate_30d,
|
||||
(sc.stockout_days_30d::numeric / NULLIF(sc.eligible_days_30d, 0)) * 100 AS stockout_rate_30d,
|
||||
sa.gross_regular_revenue_30d - sa.gross_revenue_30d AS markdown_30d,
|
||||
((sa.gross_regular_revenue_30d - sa.gross_revenue_30d) / NULLIF(sa.gross_regular_revenue_30d, 0)) * 100 AS markdown_rate_30d,
|
||||
-- Sell-through rate: Industry standard is Units Sold / (Beginning Inventory + Units Received)
|
||||
-- Uses actual snapshot from 30 days ago as beginning stock, falls back to avg_stock_units_30d
|
||||
(sa.sales_30d / NULLIF(
|
||||
COALESCE(bs.beginning_stock_30d, sa.avg_stock_units_30d::int, 0) + sa.received_qty_30d,
|
||||
COALESCE(bs.beginning_stock_30d, sc.avg_stock_units_30d::int, 0) + sa.received_qty_30d,
|
||||
0
|
||||
)) * 100 AS sell_through_30d,
|
||||
|
||||
-- Forecasting intermediate values
|
||||
-- Use the calculate_sales_velocity function instead of repetitive calculation
|
||||
calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) AS sales_velocity_daily,
|
||||
-- Forecasting intermediate values (ProductVelocity; NULL when excluded from forecast)
|
||||
vel.daily AS sales_velocity_daily,
|
||||
s.effective_lead_time AS config_lead_time,
|
||||
s.effective_days_of_stock AS config_days_of_stock,
|
||||
s.effective_safety_stock AS config_safety_stock,
|
||||
(s.effective_lead_time + s.effective_days_of_stock) AS planning_period_days,
|
||||
|
||||
calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_lead_time AS lead_time_forecast_units,
|
||||
vel.daily * s.effective_lead_time AS lead_time_forecast_units,
|
||||
|
||||
calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_days_of_stock AS days_of_stock_forecast_units,
|
||||
vel.daily * s.effective_days_of_stock AS days_of_stock_forecast_units,
|
||||
|
||||
calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * (s.effective_lead_time + s.effective_days_of_stock) AS planning_period_forecast_units,
|
||||
vel.daily * (s.effective_lead_time + s.effective_days_of_stock) AS planning_period_forecast_units,
|
||||
|
||||
(ci.current_stock + COALESCE(ooi.on_order_qty, 0) - (calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_lead_time)) AS lead_time_closing_stock,
|
||||
(ci.current_stock + COALESCE(ooi.on_order_qty, 0) - (vel.daily * s.effective_lead_time)) AS lead_time_closing_stock,
|
||||
|
||||
((ci.current_stock + COALESCE(ooi.on_order_qty, 0) - (calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_lead_time))) - (calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_days_of_stock) AS days_of_stock_closing_stock,
|
||||
((ci.current_stock + COALESCE(ooi.on_order_qty, 0) - (vel.daily * s.effective_lead_time))) - (vel.daily * s.effective_days_of_stock) AS days_of_stock_closing_stock,
|
||||
|
||||
((calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_lead_time) + (calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_days_of_stock)) + s.effective_safety_stock - ci.current_stock - COALESCE(ooi.on_order_qty, 0) AS replenishment_needed_raw,
|
||||
((vel.daily * s.effective_lead_time) + (vel.daily * s.effective_days_of_stock)) + s.effective_safety_stock - ci.current_stock - COALESCE(ooi.on_order_qty, 0) AS replenishment_needed_raw,
|
||||
|
||||
-- Final Forecasting / Replenishment Metrics
|
||||
CEILING(GREATEST(0, (((calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_lead_time) + (calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_days_of_stock)) + s.effective_safety_stock - ci.current_stock - COALESCE(ooi.on_order_qty, 0))))::int AS replenishment_units,
|
||||
(CEILING(GREATEST(0, (((calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_lead_time) + (calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_days_of_stock)) + s.effective_safety_stock - ci.current_stock - COALESCE(ooi.on_order_qty, 0))))::int) * ci.current_effective_cost AS replenishment_cost,
|
||||
(CEILING(GREATEST(0, (((calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_lead_time) + (calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_days_of_stock)) + s.effective_safety_stock - ci.current_stock - COALESCE(ooi.on_order_qty, 0))))::int) * ci.current_price AS replenishment_retail,
|
||||
(CEILING(GREATEST(0, (((calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_lead_time) + (calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_days_of_stock)) + s.effective_safety_stock - ci.current_stock - COALESCE(ooi.on_order_qty, 0))))::int) * (ci.current_price - ci.current_effective_cost) AS replenishment_profit,
|
||||
CEILING(GREATEST(0, (((vel.daily * s.effective_lead_time) + (vel.daily * s.effective_days_of_stock)) + s.effective_safety_stock - ci.current_stock - COALESCE(ooi.on_order_qty, 0))))::int AS replenishment_units,
|
||||
(CEILING(GREATEST(0, (((vel.daily * s.effective_lead_time) + (vel.daily * s.effective_days_of_stock)) + s.effective_safety_stock - ci.current_stock - COALESCE(ooi.on_order_qty, 0))))::int) * ci.current_effective_cost AS replenishment_cost,
|
||||
(CEILING(GREATEST(0, (((vel.daily * s.effective_lead_time) + (vel.daily * s.effective_days_of_stock)) + s.effective_safety_stock - ci.current_stock - COALESCE(ooi.on_order_qty, 0))))::int) * ci.current_price AS replenishment_retail,
|
||||
(CEILING(GREATEST(0, (((vel.daily * s.effective_lead_time) + (vel.daily * s.effective_days_of_stock)) + s.effective_safety_stock - ci.current_stock - COALESCE(ooi.on_order_qty, 0))))::int) * (ci.current_price - ci.current_effective_cost) AS replenishment_profit,
|
||||
|
||||
-- To Order (Apply MOQ/UOM logic here if needed, otherwise equals replenishment)
|
||||
CEILING(GREATEST(0, (((calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_lead_time) + (calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_days_of_stock)) + s.effective_safety_stock - ci.current_stock - COALESCE(ooi.on_order_qty, 0))))::int AS to_order_units,
|
||||
CEILING(GREATEST(0, (((vel.daily * s.effective_lead_time) + (vel.daily * s.effective_days_of_stock)) + s.effective_safety_stock - ci.current_stock - COALESCE(ooi.on_order_qty, 0))))::int AS to_order_units,
|
||||
|
||||
GREATEST(0, - (ci.current_stock + COALESCE(ooi.on_order_qty, 0) - (calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_lead_time))) AS forecast_lost_sales_units,
|
||||
GREATEST(0, - (ci.current_stock + COALESCE(ooi.on_order_qty, 0) - (calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_lead_time))) * ci.current_price AS forecast_lost_revenue,
|
||||
GREATEST(0, - (ci.current_stock + COALESCE(ooi.on_order_qty, 0) - (vel.daily * s.effective_lead_time))) AS forecast_lost_sales_units,
|
||||
GREATEST(0, - (ci.current_stock + COALESCE(ooi.on_order_qty, 0) - (vel.daily * s.effective_lead_time))) * ci.current_price AS forecast_lost_revenue,
|
||||
|
||||
ci.current_stock / NULLIF(calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int), 0) AS stock_cover_in_days,
|
||||
COALESCE(ooi.on_order_qty, 0) / NULLIF(calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int), 0) AS po_cover_in_days,
|
||||
(ci.current_stock + COALESCE(ooi.on_order_qty, 0)) / NULLIF(calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int), 0) AS sells_out_in_days,
|
||||
ci.current_stock / NULLIF(vel.daily, 0) AS stock_cover_in_days,
|
||||
COALESCE(ooi.on_order_qty, 0) / NULLIF(vel.daily, 0) AS po_cover_in_days,
|
||||
(ci.current_stock + COALESCE(ooi.on_order_qty, 0)) / NULLIF(vel.daily, 0) AS sells_out_in_days,
|
||||
|
||||
-- Replenish Date: Date when stock is projected to hit safety stock, minus lead time
|
||||
CASE
|
||||
WHEN calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) > 0
|
||||
THEN _current_date + FLOOR(GREATEST(0, ci.current_stock - s.effective_safety_stock) / calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int))::int - s.effective_lead_time
|
||||
WHEN vel.daily > 0
|
||||
THEN _current_date + FLOOR(GREATEST(0, ci.current_stock - s.effective_safety_stock) / vel.daily)::int - s.effective_lead_time
|
||||
ELSE NULL
|
||||
END AS replenish_date,
|
||||
|
||||
GREATEST(0, ci.current_stock - s.effective_safety_stock - ((calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_lead_time) + (calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_days_of_stock)))::int AS overstocked_units,
|
||||
(GREATEST(0, ci.current_stock - s.effective_safety_stock - ((calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_lead_time) + (calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_days_of_stock)))) * ci.current_effective_cost AS overstocked_cost,
|
||||
(GREATEST(0, ci.current_stock - s.effective_safety_stock - ((calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_lead_time) + (calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_days_of_stock)))) * ci.current_price AS overstocked_retail,
|
||||
GREATEST(0, ci.current_stock - s.effective_safety_stock - ((vel.daily * s.effective_lead_time) + (vel.daily * s.effective_days_of_stock)))::int AS overstocked_units,
|
||||
(GREATEST(0, ci.current_stock - s.effective_safety_stock - ((vel.daily * s.effective_lead_time) + (vel.daily * s.effective_days_of_stock)))) * ci.current_effective_cost AS overstocked_cost,
|
||||
(GREATEST(0, ci.current_stock - s.effective_safety_stock - ((vel.daily * s.effective_lead_time) + (vel.daily * s.effective_days_of_stock)))) * ci.current_price AS overstocked_retail,
|
||||
|
||||
-- Old Stock Flag
|
||||
(ci.created_at::date < _current_date - INTERVAL '60 day') AND
|
||||
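Every replenishment and cover column in the hunk above now reads `vel.daily`, which is NULL for products excluded from forecasting. The sketch below mimics how that NULL flows through the expressions, relying on two PostgreSQL behaviours: arithmetic with NULL yields NULL, and `GREATEST` ignores NULL arguments. The function names are hypothetical.

```javascript
// velocity may be null (product excluded from forecasting).
function replenishmentUnits({ velocity, leadTime, daysOfStock, safetyStock, currentStock, onOrder = 0 }) {
  // NULL propagates through + - * in SQL.
  const raw = velocity == null
    ? null
    : (velocity * leadTime + velocity * daysOfStock) + safetyStock - currentStock - onOrder;
  // GREATEST(0, NULL) evaluates to 0 in PostgreSQL, so excluded products get 0 units.
  return Math.ceil(Math.max(0, raw == null ? 0 : raw));
}

function stockCoverInDays({ velocity, currentStock }) {
  // current_stock / NULLIF(vel.daily, 0): NULL when velocity is NULL or zero.
  return velocity == null || velocity === 0 ? null : currentStock / velocity;
}

// Illustrative settings only.
const base = { leadTime: 14, daysOfStock: 30, safetyStock: 5, currentStock: 20, onOrder: 10 };
const forecasted = replenishmentUnits({ ...base, velocity: 1.5 });
const excluded = replenishmentUnits({ ...base, velocity: null });
console.log({ forecasted, excluded, cover: stockCoverInDays({ velocity: null, currentStock: 20 }) });
```

This matches the diff's note that forecast-derived columns go "NULL / zero": the cover columns become NULL, while the `GREATEST(0, ...)`-wrapped quantities collapse to 0.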
@@ -542,18 +639,18 @@ BEGIN
|
||||
ELSE
|
||||
CASE
|
||||
-- Check for overstock first
|
||||
WHEN GREATEST(0, ci.current_stock - s.effective_safety_stock - ((calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_lead_time) + (calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int) * s.effective_days_of_stock))) > 0 THEN 'Overstock'
|
||||
WHEN GREATEST(0, ci.current_stock - s.effective_safety_stock - ((vel.daily * s.effective_lead_time) + (vel.daily * s.effective_days_of_stock))) > 0 THEN 'Overstock'
|
||||
|
||||
-- Check for Critical stock
|
||||
WHEN ci.current_stock <= 0 OR
|
||||
(ci.current_stock / NULLIF(calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int), 0)) <= 0 THEN 'Critical'
|
||||
(ci.current_stock / NULLIF(vel.daily, 0)) <= 0 THEN 'Critical'
|
||||
|
||||
WHEN (ci.current_stock / NULLIF(calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int), 0)) < (COALESCE(s.effective_lead_time, 30) * 0.5) THEN 'Critical'
|
||||
WHEN (ci.current_stock / NULLIF(vel.daily, 0)) < (COALESCE(s.effective_lead_time, 30) * 0.5) THEN 'Critical'
|
||||
|
||||
-- Check for reorder soon
|
||||
WHEN ((ci.current_stock + COALESCE(ooi.on_order_qty, 0)) / NULLIF(calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int), 0)) < (COALESCE(s.effective_lead_time, 30) + 7) THEN
|
||||
WHEN ((ci.current_stock + COALESCE(ooi.on_order_qty, 0)) / NULLIF(vel.daily, 0)) < (COALESCE(s.effective_lead_time, 30) + 7) THEN
|
||||
CASE
|
||||
WHEN (ci.current_stock / NULLIF(calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int), 0)) < (COALESCE(s.effective_lead_time, 30) * 0.5) THEN 'Critical'
|
||||
WHEN (ci.current_stock / NULLIF(vel.daily, 0)) < (COALESCE(s.effective_lead_time, 30) * 0.5) THEN 'Critical'
|
||||
ELSE 'Reorder Soon'
|
||||
END
|
||||
|
||||
@@ -574,7 +671,7 @@ BEGIN
|
||||
END) > 180 THEN 'At Risk'
|
||||
|
||||
-- Very high stock cover is at risk too
|
||||
WHEN (ci.current_stock / NULLIF(calculate_sales_velocity(sa.sales_30d::int, sa.stockout_days_30d::int), 0)) > 365 THEN 'At Risk'
|
||||
WHEN (ci.current_stock / NULLIF(vel.daily, 0)) > 365 THEN 'At Risk'
|
||||
|
||||
-- New products (less than 30 days old)
|
||||
WHEN (CASE
|
||||
@@ -624,7 +721,11 @@ BEGIN
|
||||
LEFT JOIN ServiceLevels sl ON ci.pid = sl.pid
|
||||
LEFT JOIN BeginningStock bs ON ci.pid = bs.pid
|
||||
LEFT JOIN SeasonalityAnalysis season ON ci.pid = season.pid
|
||||
WHERE s.exclude_forecast IS FALSE OR s.exclude_forecast IS NULL -- Exclude products explicitly marked
|
||||
LEFT JOIN StockCoverage sc ON ci.pid = sc.pid
|
||||
LEFT JOIN ProductVelocity vel ON ci.pid = vel.pid
|
||||
-- NOTE: products with exclude_forecast set still get a metrics row (so they
|
||||
-- appear in brand/vendor/category rollups); only their forecast-derived
|
||||
-- columns are NULLed via ProductVelocity.
|
||||
|
||||
ON CONFLICT (pid) DO UPDATE SET
|
||||
last_calculated = EXCLUDED.last_calculated,
|
||||
|
||||
@@ -463,7 +463,7 @@ router.get('/efficiency', async (req, res) => {
|
||||
SUM(revenue_30d) AS revenue_30d,
|
||||
CASE
|
||||
WHEN SUM(avg_stock_cost_30d) > 0
|
||||
THEN (SUM(profit_30d) / SUM(avg_stock_cost_30d)) * 12
|
||||
THEN (SUM(profit_30d) / SUM(avg_stock_cost_30d)) * 12.17
|
||||
ELSE 0
|
||||
END AS gmroi
|
||||
FROM product_metrics
|
||||
|
||||
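Both the metrics function and the `/efficiency` route now annualize 30-day GMROI with 12.17 instead of 12. A short sketch of where the factor comes from (365/30, per the SQL comment); `annualizedGmroi` is a hypothetical helper, and it returns `null` on an empty denominator like the `NULLIF` version, whereas the route above returns 0.

```javascript
const ANNUALIZATION_FACTOR = 365 / 30; // about 12.17, the literal used in the diff

function annualizedGmroi(profit30d, avgStockCost30d) {
  if (!(avgStockCost30d > 0)) return null; // mirrors NULLIF(avg_stock_cost_30d, 0)
  return (profit30d / avgStockCost30d) * ANNUALIZATION_FACTOR;
}

// Illustrative values: 300 of 30-day profit on 1,200 of average stock at cost.
const gmroiNew = annualizedGmroi(300, 1200);
const gmroiOldFactor = (300 / 1200) * 12;
console.log(gmroiNew.toFixed(2), gmroiOldFactor.toFixed(2));
```

The difference is small (about 1.4%), but using the same factor in both places keeps per-product and rolled-up GMROI on one scale.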
@@ -357,6 +357,9 @@ router.get('/forecast/metrics', async (req, res) => {
|
||||
|
||||
const active = parseInt(totals.active_products) || 1;
|
||||
const curveProducts = parseInt(totals.curve_products) || 0;
|
||||
// NOTE: despite the name, this is "share of active products forecast via
|
||||
// lifecycle curves" (curve coverage), NOT a statistical confidence. It only
|
||||
// feeds a per-day tooltip field. See FORECAST_FIX_PLAN F9 (point 4).
|
||||
const confidenceLevel = parseFloat((curveProducts / active).toFixed(2));
|
||||
|
||||
// Daily series from actual forecast
|
||||
@@ -687,14 +690,29 @@ router.get('/forecast/accuracy', async (req, res) => {
|
||||
const { rows: metrics } = await executeQuery(`
|
||||
SELECT metric_type, dimension_value, sample_size,
|
||||
total_actual_units, total_forecast_units,
|
||||
mae, wmape, bias, rmse
|
||||
mae, wmape, bias, rmse, naive_wmape, fva
|
||||
FROM forecast_accuracy
|
||||
WHERE run_id = $1
|
||||
ORDER BY metric_type, dimension_value
|
||||
`, [latestRunId]);
|
||||
|
||||
// Shared shaping for an "overall"-style aggregate row (daily or weekly grain).
|
||||
const shapeOverall = (m) => m ? {
|
||||
sampleSize: parseInt(m.sample_size),
|
||||
totalActual: parseFloat(m.total_actual_units) || 0,
|
||||
totalForecast: parseFloat(m.total_forecast_units) || 0,
|
||||
mae: m.mae != null ? parseFloat(parseFloat(m.mae).toFixed(4)) : null,
|
||||
wmape: m.wmape != null ? parseFloat((parseFloat(m.wmape) * 100).toFixed(1)) : null,
|
||||
bias: m.bias != null ? parseFloat(parseFloat(m.bias).toFixed(4)) : null,
|
||||
rmse: m.rmse != null ? parseFloat(parseFloat(m.rmse).toFixed(4)) : null,
|
||||
naiveWmape: m.naive_wmape != null ? parseFloat((parseFloat(m.naive_wmape) * 100).toFixed(1)) : null,
|
||||
fva: m.fva != null ? parseFloat(parseFloat(m.fva).toFixed(3)) : null,
|
||||
} : null;
|
||||
|
||||
// Organize into response structure
|
||||
const overall = metrics.find(m => m.metric_type === 'overall');
|
||||
const overall = metrics.find(m => m.metric_type === 'overall' && m.dimension_value === 'all');
|
||||
const overallInclDormant = metrics.find(m => m.metric_type === 'overall' && m.dimension_value === 'all_incl_dormant');
|
||||
const overallWeekly = metrics.find(m => m.metric_type === 'overall_weekly');
|
||||
const byPhase = metrics
|
||||
.filter(m => m.metric_type === 'by_phase')
|
||||
.map(m => ({
|
||||
@@ -706,6 +724,8 @@ router.get('/forecast/accuracy', async (req, res) => {
|
||||
wmape: m.wmape != null ? parseFloat((parseFloat(m.wmape) * 100).toFixed(1)) : null,
|
||||
bias: m.bias != null ? parseFloat(parseFloat(m.bias).toFixed(4)) : null,
|
||||
rmse: m.rmse != null ? parseFloat(parseFloat(m.rmse).toFixed(4)) : null,
|
||||
naiveWmape: m.naive_wmape != null ? parseFloat((parseFloat(m.naive_wmape) * 100).toFixed(1)) : null,
|
||||
fva: m.fva != null ? parseFloat(parseFloat(m.fva).toFixed(3)) : null,
|
||||
}))
|
||||
.sort((a, b) => (b.totalActual || 0) - (a.totalActual || 0));
|
||||
|
||||
@@ -763,6 +783,26 @@ router.get('/forecast/accuracy', async (req, res) => {
|
||||
sampleSize: parseInt(r.sample_size),
|
||||
}));
|
||||
|
||||
// Weekly-grain trend across runs (starts empty for old runs that predate
|
||||
// the overall_weekly metric — that's expected, no backfill). F9.
|
||||
const { rows: weeklyTrendRows } = await executeQuery(`
|
||||
SELECT fr.finished_at::date AS run_date,
|
||||
fa.wmape, fa.naive_wmape, fa.fva, fa.sample_size
|
||||
FROM forecast_accuracy fa
|
||||
JOIN forecast_runs fr ON fr.id = fa.run_id
|
||||
WHERE fa.metric_type = 'overall_weekly'
|
||||
AND fa.dimension_value = 'all'
|
||||
ORDER BY fr.finished_at
|
||||
`);
|
||||
|
||||
const accuracyTrendWeekly = weeklyTrendRows.map(r => ({
|
||||
date: r.run_date instanceof Date ? r.run_date.toISOString().split('T')[0] : r.run_date,
|
||||
wmape: r.wmape != null ? parseFloat((parseFloat(r.wmape) * 100).toFixed(1)) : null,
|
||||
naiveWmape: r.naive_wmape != null ? parseFloat((parseFloat(r.naive_wmape) * 100).toFixed(1)) : null,
|
||||
fva: r.fva != null ? parseFloat(parseFloat(r.fva).toFixed(3)) : null,
|
||||
sampleSize: parseInt(r.sample_size),
|
||||
}));
|
||||
|
||||
res.json({
|
||||
hasData: true,
|
||||
computedAt,
|
||||
@@ -775,20 +815,15 @@ router.get('/forecast/accuracy', async (req, res) => {
           ? historyInfo.latest_date.toISOString().split('T')[0]
           : historyInfo.latest_date,
       },
-      overall: overall ? {
-        sampleSize: parseInt(overall.sample_size),
-        totalActual: parseFloat(overall.total_actual_units) || 0,
-        totalForecast: parseFloat(overall.total_forecast_units) || 0,
-        mae: overall.mae != null ? parseFloat(parseFloat(overall.mae).toFixed(4)) : null,
-        wmape: overall.wmape != null ? parseFloat((parseFloat(overall.wmape) * 100).toFixed(1)) : null,
-        bias: overall.bias != null ? parseFloat(parseFloat(overall.bias).toFixed(4)) : null,
-        rmse: overall.rmse != null ? parseFloat(parseFloat(overall.rmse).toFixed(4)) : null,
-      } : null,
+      overall: shapeOverall(overall),
+      overallInclDormant: shapeOverall(overallInclDormant),
+      overallWeekly: shapeOverall(overallWeekly),
       byPhase,
       byLeadTime,
       byMethod,
       dailyTrend,
       accuracyTrend,
+      accuracyTrendWeekly,
     });
   } catch (err) {
     console.error('Error fetching forecast accuracy:', err);
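Review note: the hunk above swaps the inline `overall` object for calls to `shapeOverall`, whose definition does not appear in this part of the diff. The following is only a sketch of what such a helper might look like, assuming it keeps the field names and rounding of the removed inline code and adds the `naiveWmape` / `fva` fields using the same scaling as the `accuracyTrendWeekly` mapping. The `AccuracyRow` type and the `round` helper are hypothetical names introduced for illustration.

```typescript
// Hypothetical row shape; assumes the driver returns NUMERIC columns as strings.
interface AccuracyRow {
  sample_size: string;
  total_actual_units: string | null;
  total_forecast_units: string | null;
  mae: string | null;
  wmape: string | null;
  bias: string | null;
  rmse: string | null;
  naive_wmape?: string | null;
  fva?: string | null;
}

// Parse a nullable numeric string, optionally scale it, and round to `digits` decimals.
const round = (v: string | null | undefined, digits: number, scale = 1): number | null =>
  v != null ? parseFloat((parseFloat(v) * scale).toFixed(digits)) : null;

// Sketch of a shared shaper for the 'overall', 'overall incl. dormant' and
// 'overall weekly' rows. Returns null when the row is missing, as the old
// inline ternary did.
function shapeOverall(row: AccuracyRow | null | undefined) {
  if (!row) return null;
  return {
    sampleSize: parseInt(row.sample_size, 10),
    totalActual: parseFloat(row.total_actual_units ?? "") || 0,
    totalForecast: parseFloat(row.total_forecast_units ?? "") || 0,
    mae: round(row.mae, 4),
    wmape: round(row.wmape, 1, 100), // stored as a fraction, returned as a percentage
    bias: round(row.bias, 4),
    rmse: round(row.rmse, 4),
    naiveWmape: round(row.naive_wmape, 1, 100), // assumed to be scaled like wmape
    fva: round(row.fva, 3), // assumed to stay a fraction, as in the weekly trend
  };
}
```

Centralizing the shaping this way would keep the three `overall*` payloads field-for-field identical, which is what the optional `naiveWmape` / `fva` fields on the frontend `OverallMetrics` interface appear to rely on.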
@@ -2,7 +2,7 @@ import { useQuery } from "@tanstack/react-query"
 import { apiFetch } from '@/utils/api';
 import { BarChart, Bar, ResponsiveContainer, XAxis, YAxis, Tooltip as RechartsTooltip, Cell, LineChart, Line } from "recharts"
 import config from "@/config"
-import { Target, TrendingDown, ArrowUpDown } from "lucide-react"
+import { Target, TrendingDown, ArrowUpDown, Swords } from "lucide-react"
 import { Tooltip as UITooltip, TooltipContent, TooltipProvider, TooltipTrigger } from "@/components/ui/tooltip"
 import { PHASE_CONFIG } from "@/utils/lifecyclePhases"

@@ -14,6 +14,8 @@ interface OverallMetrics {
   wmape: number | null
   bias: number | null
   rmse: number | null
+  naiveWmape?: number | null
+  fva?: number | null
 }

 interface PhaseAccuracy {
@@ -25,6 +27,8 @@ interface PhaseAccuracy {
   wmape: number | null
   bias: number | null
   rmse: number | null
+  naiveWmape?: number | null
+  fva?: number | null
 }

 interface LeadTimeAccuracy {
@@ -51,11 +55,14 @@ interface AccuracyData {
   daysOfHistory?: number
   historyRange?: { from: string; to: string }
   overall?: OverallMetrics
+  overallInclDormant?: OverallMetrics
+  overallWeekly?: OverallMetrics
   byPhase?: PhaseAccuracy[]
   byLeadTime?: LeadTimeAccuracy[]
   byMethod?: { method: string; sampleSize: number; mae: number | null; wmape: number | null; bias: number | null }[]
   dailyTrend?: { date: string; mae: number | null; wmape: number | null; bias: number | null }[]
   accuracyTrend?: AccuracyTrendPoint[]
+  accuracyTrendWeekly?: { date: string; wmape: number | null; naiveWmape: number | null; fva: number | null; sampleSize: number }[]
 }

 function MetricSkeleton() {
@@ -74,12 +81,30 @@ function formatBias(bias: number | null): string {
 }

 function getAccuracyColor(wmape: number | null): string {
+  // Daily-grain thresholds (used for the by-phase / lead-time bars).
   if (wmape === null) return "text-muted-foreground"
   if (wmape <= 30) return "text-green-600"
   if (wmape <= 50) return "text-yellow-600"
   return "text-red-600"
 }

+function getWeeklyAccuracyColor(wmape: number | null): string {
+  // Weekly per-product grain has a much lower achievable floor than daily grain
+  // on this intermittent-demand catalog, so the headline uses its own thresholds.
+  if (wmape === null) return "text-muted-foreground"
+  if (wmape <= 60) return "text-green-600"
+  if (wmape <= 90) return "text-yellow-600"
+  return "text-red-600"
+}
+
+function formatSignedPct(ratio: number | null, digits = 0): string {
+  // ratio is a fraction (0.7 => +70%); null-safe.
+  if (ratio === null || ratio === undefined) return "N/A"
+  const pct = ratio * 100
+  const sign = pct > 0 ? "+" : ""
+  return `${sign}${pct.toFixed(digits)}%`
+}
+
 export function ForecastAccuracy() {
   const { data, error, isLoading } = useQuery<AccuracyData>({
     queryKey: ["forecast-accuracy"],
@@ -133,6 +158,24 @@ export function ForecastAccuracy() {
     sampleSize: lt.sampleSize,
   }))

+  // Headline prefers the weekly-grain WMAPE (informative); falls back to the
+  // daily-grain number until enough complete weeks of history exist.
+  const weeklyWmape = data?.overallWeekly?.wmape ?? null
+  const usingWeekly = weeklyWmape !== null
+  const headlineWmape = usingWeekly ? weeklyWmape : (data?.overall?.wmape ?? null)
+  const headlineColor = usingWeekly
+    ? getWeeklyAccuracyColor(headlineWmape)
+    : getAccuracyColor(headlineWmape)
+  // Net forecast-vs-actual ratio (e.g. +70% = over-forecasting), from the
+  // daily 'all' totals — far more legible than bias in raw units.
+  const totalFc = data?.overall?.totalForecast ?? 0
+  const totalAct = data?.overall?.totalActual ?? 0
+  const fcVsAct = totalAct > 0 ? (totalFc / totalAct - 1) : null
+  // Value over the naive baseline; prefer weekly grain to match the headline.
+  const naiveSource = data?.overallWeekly ?? data?.overall
+  const naiveWmape = naiveSource?.naiveWmape ?? null
+  const fva = naiveSource?.fva ?? null
+
   return (
     <div>
       <h3 className="text-lg font-medium mb-3">Forecast Accuracy</h3>
@@ -148,10 +191,24 @@ export function ForecastAccuracy() {
           <div className="flex items-baseline justify-between">
             <div className="flex items-center gap-2">
               <Target className="h-4 w-4 text-muted-foreground" />
-              <p className="text-sm font-medium text-muted-foreground">WMAPE</p>
+              <p className="text-sm font-medium text-muted-foreground">
+                WMAPE <span className="text-[10px] opacity-70">({usingWeekly ? "weekly" : "daily"})</span>
+              </p>
             </div>
-            <p className={`text-lg font-bold ${getAccuracyColor(data?.overall?.wmape ?? null)}`}>
-              {formatWmape(data?.overall?.wmape ?? null)}
+            <p className={`text-lg font-bold ${headlineColor}`}>
+              {formatWmape(headlineWmape)}
             </p>
           </div>
+          <div className="flex items-baseline justify-between">
+            <div className="flex items-center gap-2">
+              <ArrowUpDown className="h-4 w-4 text-muted-foreground" />
+              <p className="text-sm font-medium text-muted-foreground">Forecast vs actual</p>
+            </div>
+            <p className="text-lg font-bold">
+              {formatSignedPct(fcVsAct)}
+              <span className="text-xs font-normal text-muted-foreground ml-1">
+                {(fcVsAct ?? 0) > 0 ? "over" : (fcVsAct ?? 0) < 0 ? "under" : ""}
+              </span>
+            </p>
+          </div>
           <div className="flex items-baseline justify-between">
@@ -160,20 +217,24 @@ export function ForecastAccuracy() {
               <p className="text-sm font-medium text-muted-foreground">MAE</p>
             </div>
             <p className="text-lg font-bold">
-              {data?.overall?.mae !== null ? data?.overall?.mae?.toFixed(2) : "N/A"}
+              {data?.overall?.mae != null ? data?.overall?.mae?.toFixed(2) : "N/A"}
               <span className="text-xs font-normal text-muted-foreground ml-1">units</span>
             </p>
           </div>
           <div className="flex items-baseline justify-between">
             <div className="flex items-center gap-2">
-              <ArrowUpDown className="h-4 w-4 text-muted-foreground" />
-              <p className="text-sm font-medium text-muted-foreground">Bias</p>
+              <Swords className="h-4 w-4 text-muted-foreground" />
+              <p className="text-sm font-medium text-muted-foreground">vs naive</p>
             </div>
             <p className="text-lg font-bold">
-              {formatBias(data?.overall?.bias ?? null)}
-              <span className="text-xs font-normal text-muted-foreground ml-1">
-                {(data?.overall?.bias ?? 0) > 0 ? "over" : (data?.overall?.bias ?? 0) < 0 ? "under" : ""}
+              <span className={fva != null ? (fva > 0 ? "text-green-600" : "text-red-600") : "text-muted-foreground"}>
+                {fva != null ? `${formatSignedPct(fva)} FVA` : "N/A"}
               </span>
+              {naiveWmape != null && (
+                <span className="text-xs font-normal text-muted-foreground ml-1">
+                  naive {formatWmape(naiveWmape)}
+                </span>
+              )}
             </p>
           </div>
         </div>
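Review note: the headline selection and the "Forecast vs actual" figure are plain arithmetic, so they can be checked outside React. The sketch below reuses `formatSignedPct` as it appears in the diff and restates the component's inline logic as two pure functions. `pickHeadline` and `forecastVsActual` are hypothetical names that do not exist in the commit; the totals in the usage lines are the aggregate figures quoted in the plan's diagnosis table (227,690 forecast vs 133,861 actual units).

```typescript
// Same body as formatSignedPct in the diff; ratio is a fraction (0.7 => "+70%").
function formatSignedPct(ratio: number | null | undefined, digits = 0): string {
  if (ratio === null || ratio === undefined) return "N/A";
  const pct = ratio * 100;
  const sign = pct > 0 ? "+" : "";
  return `${sign}${pct.toFixed(digits)}%`;
}

interface HeadlineInput {
  wmape: number | null;
}

// Hypothetical helper: prefer the weekly-grain WMAPE, fall back to daily grain
// while no weekly metric exists yet (mirrors usingWeekly / headlineWmape).
function pickHeadline(weekly?: HeadlineInput, daily?: HeadlineInput) {
  const weeklyWmape = weekly?.wmape ?? null;
  const usingWeekly = weeklyWmape !== null;
  return {
    grain: usingWeekly ? "weekly" : "daily",
    wmape: usingWeekly ? weeklyWmape : (daily?.wmape ?? null),
  };
}

// Hypothetical helper: net over/under-forecast as a fraction of actual units,
// or null when there are no actuals to divide by (mirrors fcVsAct).
function forecastVsActual(totalForecast: number, totalActual: number): number | null {
  return totalActual > 0 ? totalForecast / totalActual - 1 : null;
}

// Usage with the totals from the diagnosis table.
const ratio = forecastVsActual(227690, 133861);
console.log(formatSignedPct(ratio)); // "+70%"
console.log(pickHeadline(undefined, { wmape: 202 }).grain); // "daily"
```

One behavior worth knowing when reading the card: with `digits = 0`, a small negative ratio such as -0.004 renders as "-0%", because `toFixed` keeps the sign of a value that rounds to zero.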