Forecast Accuracy Fix Plan
Written: 2026-06-10, from a code + live-data review of the forecasting pipeline. Goal: eliminate the systematic ~1.7–2x over-forecast bias, recover demand the model currently ignores, and fix the accuracy measurement so improvements are visible and long-lead forecasts are validated.
Read this whole document before starting. Fixes are grouped into phases; each phase is independently deployable and has its own validation step. Line numbers are as of 2026-06-10 — re-locate by function name if the file has drifted.
1. Diagnosis summary (measured 2026-06-10)
The dashboard headline is 202% WMAPE. Decomposition of that number, all measured against forecast_accuracy run 129 and ad-hoc queries:
| Finding | Evidence |
|---|---|
| Daily-grain WMAPE has a ~190% floor for this catalog | Avg demand ≈ 0.11 units/product/day. A perfect rate forecast of intermittent demand scores ≈ 2e^−λ ≈ 190%. A trivial trailing-30d-average naive forecast scores 204% on the same products/days; the engine scores 221% (slightly worse than naive). |
| Same forecasts at 21-day-per-product grain: 109%; bias-corrected: 75% | Half the headline is metric grain, most of the rest is bias. |
| Aggregate over-forecast +70% (227,690 forecast vs 133,861 actual units) | Portfolio daily ratio is 1.5–2.5x on most days. |
| Decay phase 2.47x over (fc 51,675 / act 20,915) | Root cause F1: velocity inflated 4.07x (measured: 1.353 vs true 0.332 units/day) by averaging over sparse snapshot rows. |
| Preorder phase 2.15x over (fc 67,212 / act 31,189) | Root cause F4: launch curve applied at age=0 starting today, ignoring that the product hasn't arrived. |
| Mature phase 1.69x over (fc 57,857 / act 34,313) | Root causes F2 (history edge truncation) + F3 (seasonal double-count). |
| Dormant products sold 16,180 units (~11% of demand) against zero forecasts | Root cause F5; also excluded from the headline metric, so invisible. |
| All 879,800 accuracy samples are in the 1–7d lead bucket | Root cause F7: archiving design only ever saves yesterday's slice. 30–90d forecasts (what purchasing uses) are never validated. |
| Launch phase is healthy: WMAPE 100%, bias −6%, beats naive | The lifecycle-curve concept works; its calibration inputs are broken. Don't redesign it. |
Key data fact underlying several fixes: daily_product_snapshots is activity-based and sparse — only ~500–1,800 of ~38K products have a row on a given day. Verified: every pid-day with an order DOES have a snapshot row and units match (5,234/5,234 pid-days, 8,980 vs 8,984 units over 7 days). So missing row = zero sales, and any query that aggregates over only the rows that exist is averaging over sold-days.
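The daily-grain floor in the table rests on a standard identity: for Poisson demand with rate λ < 1, a forecast equal to the true rate has expected absolute error 2λe^−λ, so its WMAPE is 2e^−λ. The sketch below checks that closed form by simulation. It assumes independent Poisson daily demand (the plan does not state a demand model), and the function names are illustrative. A single rate of 0.11 gives about 179% and a rate of 0.05 about 190%; the catalog-level floor depends on how rates are spread across products, which this sketch does not model.

```python
import math
import random


def perfect_forecast_wmape(lam):
    """Expected daily WMAPE when the forecast equals the true Poisson rate.

    Closed form E|X - lam| / lam = 2 * exp(-lam), valid only for 0 < lam < 1.
    """
    if not 0.0 < lam < 1.0:
        raise ValueError("closed form used here only holds for 0 < lam < 1")
    return 2.0 * math.exp(-lam)


def simulated_wmape(lam, days, seed=0):
    """Monte Carlo check: draw Poisson demand and score a flat forecast of lam."""
    rng = random.Random(seed)
    limit = math.exp(-lam)
    abs_err = 0.0
    total_units = 0
    for _ in range(days):
        # Knuth's multiplication method for one Poisson draw (fine for small rates).
        k, p = 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        abs_err += abs(k - lam)
        total_units += k
    return abs_err / total_units


for rate in (0.05, 0.11):
    print(rate, round(perfect_forecast_wmape(rate), 3), round(simulated_wmape(rate, 100_000), 3))
```

The point of the comparison is that even a perfect rate forecast scores far above 100% at this grain, so daily WMAPE mostly measures intermittency rather than forecast quality.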
2. Environment & operational notes
- Files: engine is `inventory-server/scripts/forecast/forecast_engine.py`; orchestrator `run_forecast.js` in the same dir; consumer endpoints in `inventory-server/src/routes/dashboard.js` (`/forecast/metrics` ~line 308, `/forecast/accuracy` ~line 647); overview UI in `inventory/src/components/overview/ForecastMetrics.tsx` and `ForecastAccuracy.tsx`.
- Local `inventory-server/` is NFS-mounted to `/var/www/inventory/` on the netcup server. Edits made locally appear on the server immediately; no copy step. Do NOT run bulk `grep`/`find`/`node --check` over `inventory-server/` locally (the mount hangs); `ssh netcup` and run them there.
- Avoid the glob tool for search in this repo; use bash (`grep`/`rg` via ssh for server-side trees).
- Scheduling: the engine runs daily at 09:30:01 server time (runs table is conclusive), but the cron entry is NOT in matt's crontab, `/etc/cron.d`, or pm2. Likely root's crontab (`sudo crontab -l` to confirm). You do not need to touch the schedule for these fixes; just know a run fires at 09:30 daily and occasionally skips days (e.g. 2026-06-07/08).
- Manual test runs: `ssh netcup`, then `cd /var/www/inventory/scripts/forecast && node run_forecast.js`. Takes ~3.5–4 min. Safe to run any time: the engine TRUNCATEs and rebuilds `product_forecasts`, archives prior past-dated rows, and records a new `forecast_runs` row. Python deps live in the server venv (`venv/`); `run_forecast.js` handles env + venv automatically.
- DB access for validation: `ssh netcup`, then `PGPASSWORD=6D3GUkxuFgi2UghwgnUd psql -h localhost -U inventory_readonly -d inventory_db`. The engine itself connects with the write user via env vars loaded from `/var/www/inventory/.env`. Schema changes should be made idempotently inside the engine code (the file already uses `CREATE TABLE IF NOT EXISTS` / `CREATE INDEX IF NOT EXISTS`; use `ALTER TABLE ... ADD COLUMN IF NOT EXISTS` the same way) so no manual migration is needed.
- Python gotchas already handled in this file (don't regress): numpy types must go through the registered psycopg2 adapters; `pd.Series.combine_first()` keeps zeros over real data, so use `reindex(..., fill_value=0.0)`.
- Engine runtime budget: currently ~212–227s. Phases 1–2 shouldn't move it meaningfully; Phase 3's extra archiving adds one INSERT…SELECT. If runtime balloons past ~6 min, investigate before shipping.
- `--backfill` mode (`backfill_accuracy_data`) is an in-sample backtest using the old formulas. Do not run it anymore; there is enough real out-of-sample history. Updating it to match the new logic is optional/low priority (F11).
Phase 1 — Bias bugs in the engine (no schema changes)
F1. Decay velocity: stop averaging over sparse snapshot rows
Where: forecast_engine.py, batch_load_product_data(), the decay query (~lines 697–710).
Problem: AVG(COALESCE(dps.units_sold, 0)) runs over only the snapshot rows that exist — mostly sold-days. Measured inflation on the current 975 decay products: 4.07x (1.353 vs 0.332 true units/day). This feeds compute_scale_factor() for the decay phase and is the single largest bias source.
Fix: divide the sum by calendar days in the window, clipped to the product's age (decay products are 14–60 days old, so a 20-day-old product's window is 20 days, not 30):
```sql
-- Sum over the rows that exist, then divide by calendar days in the window
-- (window clipped to the product's age, never below 1 day).
SELECT dps.pid,
       SUM(COALESCE(dps.units_sold, 0))::float
         / GREATEST(LEAST(30, (CURRENT_DATE - pm.date_first_received::date)), 1) AS avg_daily
FROM daily_product_snapshots dps
JOIN product_metrics pm ON pm.pid = dps.pid
WHERE dps.pid = ANY(%s)
  AND dps.snapshot_date >= CURRENT_DATE - INTERVAL '30 days'
  AND dps.snapshot_date >= pm.date_first_received::date
GROUP BY dps.pid, pm.date_first_received
```
No Python-side changes needed; data['decay_velocity'] keeps the same shape. Products with zero snapshot rows in the window still get no entry → existing scale = 1.0 fallback applies (acceptable: decay classification requires sales_velocity_daily > 0, so truly dead products don't reach this path).
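To make the effect of the denominator concrete, here is a minimal Python sketch of the two averages for one product. It mirrors the SQL above under the assumption that a missing snapshot row means zero sales; `velocity_estimates`, its arguments, and the sample data are illustrative, not engine code.

```python
from datetime import date, timedelta


def velocity_estimates(units_by_day, first_received, today, window_days=30):
    """Return (sold_day_avg, calendar_day_avg) for one product.

    units_by_day maps snapshot_date -> units_sold and only contains days
    that have a snapshot row (sparse, like daily_product_snapshots).
    """
    window_start = max(today - timedelta(days=window_days), first_received)
    rows = [u for d, u in units_by_day.items() if d >= window_start]
    total = float(sum(rows))
    # Old behaviour: average over the rows that exist, i.e. mostly sold-days.
    sold_day_avg = total / len(rows) if rows else 0.0
    # Fixed behaviour: divide by calendar days, clipped to product age, min 1.
    calendar_days = max(min(window_days, (today - first_received).days), 1)
    return sold_day_avg, total / calendar_days


today = date(2026, 6, 10)
received = today - timedelta(days=20)  # hypothetical 20-day-old decay product
sparse = {today - timedelta(days=n): u for n, u in [(2, 2), (7, 1), (11, 3), (18, 2)]}
old, new = velocity_estimates(sparse, received, today)
print(old, new)  # 2.0 0.4
```

With four snapshot rows in a 20-day life, the row-based average is 5x the calendar-day rate, the same mechanism as the measured 4.07x inflation.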
F2. Mature history: reindex over the full calendar window
Where: forecast_engine.py, forecast_mature() (~lines 833–836).
Problem: hist.set_index('snapshot_date').resample('D').sum() only spans first-snapshot → last-snapshot. Interior gaps correctly become zeros, but leading and trailing quiet periods are absent, so the Holt level is fitted on the product's busy span. A marginal mature product whose activity clusters in 2 of the last 8 weeks gets a level ~4x too high.
Fix: replace the resample with an explicit reindex over the full EXP_SMOOTHING_WINDOW ending yesterday:
```python
# Dense daily series over the full smoothing window, ending yesterday.
hist = history_df.copy()
hist['snapshot_date'] = pd.to_datetime(hist['snapshot_date'])
hist = hist.set_index('snapshot_date')['units_sold']
full_index = pd.date_range(
    end=pd.Timestamp(date.today() - timedelta(days=1)),
    periods=EXP_SMOOTHING_WINDOW, freq='D')
# Days without a snapshot row (leading, interior, or trailing) become 0.
series = hist.reindex(full_index, fill_value=0.0).values.astype(float)
```
Notes: (pid, snapshot_date) is unique in daily_product_snapshots, so no duplicate-index risk. observed_mean and the cap recompute over the full window automatically (intended — the cap gets correspondingly tighter). Mature products are by definition >60 days old, so the 60-day window never predates first receipt. Do NOT use combine_first (see gotchas above).
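The ~4x figure in the problem statement can be reproduced with plain arithmetic. The sketch below uses the simple mean as a stand-in for the Holt level (an assumption: the real level is smoothed, not averaged) and hypothetical data for a product whose sales cluster in two of the last eight weeks. `span_vs_window_mean` is an illustrative name.

```python
from datetime import date, timedelta


def span_vs_window_mean(units_by_day, window_end, window_days):
    """Mean daily units over the first->last snapshot span vs the full window."""
    window_start = window_end - timedelta(days=window_days - 1)
    in_window = {d: u for d, u in units_by_day.items()
                 if window_start <= d <= window_end}
    if not in_window:
        return 0.0, 0.0
    total = float(sum(in_window.values()))
    # Span covered by a resample between the first and last existing rows.
    span_days = (max(in_window) - min(in_window)).days + 1
    # Full calendar window, as produced by an explicit reindex.
    return total / span_days, total / window_days


yesterday = date(2026, 6, 9)
cluster_start = yesterday - timedelta(days=34)
# One unit per day for 14 consecutive days, nothing before or after.
units = {cluster_start + timedelta(days=i): 1 for i in range(14)}
span_mean, window_mean = span_vs_window_mean(units, yesterday, window_days=56)
print(span_mean, window_mean)  # 1.0 0.25
```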
F3. Stop double-applying the monthly seasonal index
Where: forecast_engine.py, generate_all_forecasts() — the seasonal_multipliers pre-compute (~lines 959–961) and application (~line 1050).
Problem: every per-product calibration (decay velocity, mature Holt level, launch first-week scale, preorder rate, slow-mover velocity) is fitted on raw recent actuals, which already embed the current month's seasonal level. The forecast then multiplies by the absolute monthly index of the target date. Example from the live indices (forecast_runs.phase_counts for run 129): May = 1.224 (sale month), June = 0.982. Early-June forecasts were calibrated on May-sale-inflated velocities and barely discounted — a structural ~25% over-forecast at that transition, and it'll be worse around November (1.316).
Fix: apply the seasonal index relative to the calibration period. Compute a calibration index as the average monthly index over the trailing 30 calendar days (robust at month boundaries), then divide:
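```python
today = date.today()
trailing = [today - timedelta(days=i) for i in range(1, 31)]
# Average monthly index over the calibration window (trailing 30 calendar days).
calibration_index = float(np.mean([monthly_indices.get(d.month, 1.0) for d in trailing]))
# Seasonal level of each target date relative to the calibration period.
seasonal_multipliers = [
    monthly_indices.get(d.month, 1.0) / max(calibration_index, 0.1)
    for d in forecast_dates
]
```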
Leave the DOW multipliers absolute — every calibration is a multi-week average and therefore DOW-neutral, so reshaping by absolute DOW indices is correct.
Optional sub-fix (same area, low priority): the monthly indices are computed from a single trailing 365-day window, so each month appears once and YoY growth contaminates "seasonality". A cheap improvement is widening SEASONAL_LOOKBACK_DAYS to 730 and averaging the two observations of each month. Do this only after the main fixes are validated.
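As a worked instance of the relative index in F3, the sketch below plugs in the May and June indices quoted above for a run on 2026-06-10. It is a standalone restatement of the fix without numpy; `relative_seasonal_multipliers` is an illustrative name, and the two-month dictionary is only the subset of indices given in this plan.

```python
from datetime import date, timedelta


def relative_seasonal_multipliers(monthly_indices, today, forecast_dates, lookback_days=30):
    """Return (calibration_index, multipliers) relative to the trailing window."""
    trailing = [today - timedelta(days=i) for i in range(1, lookback_days + 1)]
    calibration_index = sum(monthly_indices.get(d.month, 1.0) for d in trailing) / len(trailing)
    denom = max(calibration_index, 0.1)  # guard against a degenerate index
    multipliers = [monthly_indices.get(d.month, 1.0) / denom for d in forecast_dates]
    return calibration_index, multipliers


indices = {5: 1.224, 6: 0.982}          # May and June values quoted in F3
run_day = date(2026, 6, 10)
targets = [run_day + timedelta(days=i) for i in range(3)]
calib, mults = relative_seasonal_multipliers(indices, run_day, targets)
# Trailing window = 21 May days + 9 June days.
print(round(calib, 4), round(mults[0], 4))  # 1.1514 0.8529
```

Under the absolute scheme the June multiplier would be 0.982; relative to a calibration window that is mostly May, it becomes about 0.85, which removes most of the sale-month carry-over.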
Phase 1 validation
Deploy (edit locally; NFS propagates), run the engine manually once, wait for 3–5 daily cycles, then:
```sql
-- Portfolio ratio per day (target: drifts from ~2.0 toward 0.8–1.3)
WITH ranked AS (
  SELECT pfh.pid, pfh.forecast_date, pfh.forecast_units, pfh.lifecycle_phase,
         -- rn = 1 is the most recent run's forecast for each pid/date
         ROW_NUMBER() OVER (PARTITION BY pfh.pid, pfh.forecast_date ORDER BY fr.started_at DESC) rn
  FROM product_forecasts_history pfh
  JOIN forecast_runs fr ON fr.id = pfh.run_id
  WHERE pfh.forecast_date >= CURRENT_DATE - 7
    AND pfh.forecast_date < CURRENT_DATE)  -- realized days only
SELECT r.forecast_date, round(SUM(r.forecast_units),0) AS fc,
       SUM(COALESCE(dps.units_sold,0)) AS act,
       round(SUM(r.forecast_units)/NULLIF(SUM(COALESCE(dps.units_sold,0)),0),2) AS ratio
FROM ranked r
-- Missing snapshot row = zero sales, hence the LEFT JOIN + COALESCE
LEFT JOIN daily_product_snapshots dps ON dps.pid = r.pid AND dps.snapshot_date = r.forecast_date
WHERE r.rn = 1 AND r.lifecycle_phase != 'dormant'
GROUP BY 1 ORDER BY 1;
```
Also check forecast_accuracy by_phase rows for the newest run: decay bias should fall from +0.35 toward ~0, mature from +0.17 toward ~0. (Accuracy lags ~1 day behind each fix since it evaluates yesterday's forecasts.)
Phase 2 — Demand the model currently ignores or mistimes
F4. Preorder: forecast the preorder rate until arrival, launch curve after
Where: forecast_engine.py — batch_load_product_data() (add arrival dates), generate_all_forecasts() preorder branch (~lines 1005–1009), and forecast_from_curve() (or a small wrapper).
Problem: preorder products run the launch curve from age=0 starting today, i.e. full first-week launch sales while the product is still weeks from arriving. Actual preorder-period sales are a much slower trickle.
Fix:
- Batch-load each preorder product's expected arrival from `purchase_orders` (line-item grain: it has `pid` and `expected_date` directly). Open statuses verified against live data: `created`, `ordered`, `electronically_sent`, `receiving_started` (~705 open line items currently have a future `expected_date`):
```sql
SELECT pid, MIN(expected_date) AS expected_arrival
FROM purchase_orders
WHERE pid = ANY(%s)
  AND status IN ('created', 'ordered', 'electronically_sent', 'receiving_started')
  AND expected_date IS NOT NULL
  AND expected_date >= CURRENT_DATE
GROUP BY pid
```
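Fallbacks, in order: (a) an open PO with a past expected_date → assume arrival in 7 days; (b) no PO at all → arrival in 14 days (and log a counter of how many hit this default). The query above returns only future dates, so telling (a) from (b) needs a second lookup for open POs whose expected_date has already passed.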
- In the preorder branch, build the daily array piecewise. Let `days_until_arrival = (expected_arrival - today).days`:
  - Days `0 .. days_until_arrival-1`: flat observed preorder daily rate = `preorder_sales[pid] / max(preorder_days[pid], 1)` (both already batch-loaded), clamped to ≤ the curve's scaled week-0 daily value.
  - Days `days_until_arrival .. horizon`: `forecast_from_curve(curve_info, scale, age_days=0, ...)` shifted so the curve's day 0 lands on the arrival date (i.e. pass `horizon_days - days_until_arrival` and offset into the output array).
  - Keep the existing `compute_scale_factor('preorder', ...)` for the post-arrival curve; the pre-arrival segment doesn't use it.
This is consistent with how the reference curves were built: historical preorder units were recorded on their order dates (pre-arrival), so week-0 of the fitted curves reflects post-receipt orders, not the backlog.
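The arrival fallbacks and the piecewise array from F4 can be sketched as follows. This is a sketch under stated assumptions, not the engine's code: `resolve_arrival` and `build_preorder_forecast` are hypothetical helpers, `has_overdue_open_po` is assumed to come from a separate lookup, and `post_arrival_curve` stands in for whatever `forecast_from_curve` returns (assumed here to be a list of daily values starting at curve day 0).

```python
from datetime import date, timedelta


def resolve_arrival(today, future_expected, has_overdue_open_po):
    """Return (arrival_date, used_default) following the F4 fallback order."""
    if future_expected is not None:
        return future_expected, False
    if has_overdue_open_po:
        return today + timedelta(days=7), False   # fallback (a)
    return today + timedelta(days=14), True       # fallback (b): count these


def build_preorder_forecast(preorder_rate, week0_daily, post_arrival_curve,
                            days_until_arrival, horizon_days):
    """Flat preorder rate until arrival, launch curve from the arrival day on."""
    pre_days = min(max(days_until_arrival, 0), horizon_days)
    # Pre-arrival trickle, never above the curve's scaled week-0 daily value.
    pre_rate = min(preorder_rate, week0_daily)
    forecast = [pre_rate] * pre_days
    # Curve day 0 lands on the arrival date; only the part inside the horizon is used.
    forecast.extend(post_arrival_curve[: horizon_days - pre_days])
    if len(forecast) != horizon_days:
        raise ValueError("post_arrival_curve is shorter than the remaining horizon")
    return forecast


today = date(2026, 6, 10)
arrival, used_default = resolve_arrival(today, None, has_overdue_open_po=True)
days_until = (arrival - today).days  # 7
daily = build_preorder_forecast(0.5, 3.0, [3.0, 3.0, 2.5, 2.0], days_until, horizon_days=10)
print(daily)
```

Returning `used_default` separately keeps the "how many hit the 14-day default" counter out of the forecasting function itself.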
F5. Dormant products: small positive rate instead of hard zero, and count them
Where: forecast_engine.py — generate_all_forecasts() dormant branch (~lines 1040–1042), batch_load_product_data(), and compute_accuracy().
Problem: all ~28K dormant products are forecast at exactly 0, yet they sold 16,180 units in the eval window (~11% of all demand) — restocks, promos, long-tail. Worse, dormant is excluded from the headline accuracy filter, so this miss is invisible.
Fix (cheap version, do this now):
- Batch-load a trailing-180-day order rate for dormant products (11,362 of them have ≥1 sale in 180d — verified):
```sql
SELECT o.pid, SUM(o.quantity) / 180.0 AS rate
FROM orders o
WHERE o.pid = ANY(%s)
  AND o.canceled IS DISTINCT FROM TRUE
  AND o.date >= CURRENT_DATE - INTERVAL '180 days'
GROUP BY o.pid
```
- Dormant branch: if the product has a rate > 0, forecast it flat with `method = 'velocity'`; else keep zeros with `method = 'zero'`. Apply the same DOW/seasonal multipliers as everything else (automatic: they're applied after the branch).
- In `compute_accuracy()`, add a second overall row: `metric_type='overall', dimension_value='all_incl_dormant'` with no dormant filter (keep the existing `'all'` row unchanged for trend continuity). One extra entry in the `dimensions`/`filter_clauses` dicts.
Upgrade path (optional, Phase 4): replace flat rates for slow_mover + dormant-with-sales with TSB (Teunter–Syntetos–Babai), the standard intermittent-demand method with obsolescence handling. Per product over a daily series d_t (build it from snapshots the F2 way — full calendar reindex):
```text
if d_t > 0:  p_t = p_{t-1} + β·(1 − p_{t-1});   z_t = z_{t-1} + α·(d_t − z_{t-1})
else:        p_t = p_{t-1}·(1 − β);             z_t = z_{t-1}
forecast = p_T · z_T   (flat across horizon)
```
Start with α=0.1, β=0.05; initialize p = (nonzero days / total days) and z = mean of nonzero demands. Scope: slow_mover (~6K) + dormant with 180d sales (~11K); series built from up to 180 days of snapshots (rows are sparse, so the volume stays manageable). Only do this after Phase 3 measurement exists to prove it beats the flat rates.
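The TSB recursion above translates almost line for line into Python. A minimal sketch, assuming the input is a dense daily series (zeros included) and using the initialization and starting parameters given above; `tsb_forecast` is an illustrative name, not an existing engine function.

```python
def tsb_forecast(demand, alpha=0.1, beta=0.05):
    """Flat daily forecast p_T * z_T from the TSB recursion.

    demand: dense daily series (zeros included), oldest first.
    p tracks the probability of a non-zero day, z the size of non-zero demand.
    """
    nonzero = [d for d in demand if d > 0]
    if not nonzero:
        return 0.0
    p = len(nonzero) / len(demand)      # initial demand probability
    z = sum(nonzero) / len(nonzero)     # initial non-zero demand size
    for d in demand:
        if d > 0:
            p += beta * (1.0 - p)
            z += alpha * (d - z)
        else:
            p *= (1.0 - beta)           # obsolescence: probability decays on zero days
            # z is left unchanged on zero days
    return p * z


recently_active = [1] * 10 + [0] * 10
long_quiet = [1] * 10 + [0] * 50
print(tsb_forecast(recently_active) > tsb_forecast(long_quiet))  # True
```

The comparison at the end shows the obsolescence handling: the same ten sales followed by a longer quiet stretch produce a smaller forecast, which a flat 180-day rate only partly captures.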
Phase 2 validation
After 3–5 cycles: preorder by_phase bias should drop from +0.85 toward < +0.3. The new all_incl_dormant row should appear, and the gap between its total_actual_units and the 'all' row's (i.e. dormant demand) should be largely covered by forecast rather than missed entirely (dormant bias rising from −1.36 toward ~−0.3 or better).
Phase 3 — Fix the measurement (schema + engine + API + UI)
Without this phase you cannot see whether Phases 1–2 worked except by ad-hoc SQL, the lead-time chart stays a single bucket forever, and the dashboard keeps displaying a number with a 190% floor in red.
F7. Archive long-lead forecasts so 15/30/60/90d accuracy exists
Where: forecast_engine.py — archive_forecasts() (~lines 1086–1154), compute_accuracy() CTE (~lines 1201–1228).
Problem: the current design archives only past-dated rows of the previous run before truncation. With daily runs, that's only ever the 1-day-ahead slice — all 879,800 accuracy samples sit in the '1-7d' bucket and the longer buckets in the UI chart can never populate. Purchasing decisions ride on 30–60d forecasts that are never validated.
Fix:
- Keep the existing past-date archiving exactly as is (it provides dense short-lead coverage).
- After `generate_all_forecasts()` completes, additionally archive a sampled set of future leads from the new run, non-dormant only, attributed to the current run id (correct attribution, unlike the past-date path which attributes to the previous run):
```sql
INSERT INTO product_forecasts_history
  (run_id, pid, forecast_date, forecast_units, forecast_revenue,
   lifecycle_phase, forecast_method, confidence_lower, confidence_upper, generated_at)
SELECT %(run_id)s, pid, forecast_date, forecast_units, forecast_revenue,
       lifecycle_phase, forecast_method, confidence_lower, confidence_upper, generated_at
FROM product_forecasts
WHERE lifecycle_phase != 'dormant'
  AND forecast_date - CURRENT_DATE IN (7, 14, 30, 60, 89)
ON CONFLICT (run_id, pid, forecast_date) DO NOTHING
```
Volume: ~10K non-dormant products × 5 leads ≈ 50K rows/day; the existing 90-day prune (forecast_date < CURRENT_DATE - 90) bounds steady state at a few million rows. Note future-dated rows survive until their date passes + 90 days — that's intended.
- CRITICAL companion change in `compute_accuracy()`: the accuracy CTE must now exclude not-yet-realized rows, or future-dated archives get scored against actual=0:
```sql
FROM product_forecasts_history pfh
JOIN forecast_runs fr ON fr.id = pfh.run_id
WHERE pfh.forecast_date < CURRENT_DATE   -- ADD THIS
```
- Dedup semantics change. Today's `ROW_NUMBER() OVER (PARTITION BY pid, forecast_date ORDER BY started_at DESC)` keeps only the latest (= shortest-lead) row per pid/date, which would silently discard all the new long-lead rows. Restructure:
  - Compute `lead_days = forecast_date - started_at::date` and the lead bucket inside `ranked_history`.
  - For `by_lead_time`: dedup `PARTITION BY pid, forecast_date, lead_bucket` (one sample per pid/date/bucket, latest run wins within a bucket).
  - For everything else (`overall`, `by_phase`, `by_method`, `daily`, and the new weekly metric below): restrict to `lead_days BETWEEN 0 AND 6` and keep the existing per-(pid, date) dedup. This preserves the current meaning of the headline metrics (short-lead) while the lead-time table becomes real.
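The two dedup modes described above can be sketched in Python to check the intended semantics before writing the SQL. The bucket edges are an ASSUMPTION: the plan names the bucket labels but not their boundaries, and the edges below were chosen so that a 7-day lead lands in '8-14d', in line with the Phase 3 validation timings. `lead_bucket`, `dedup_history`, and the row shape are illustrative.

```python
from datetime import date

# ASSUMED bucket edges on lead_days (inclusive); not specified in the plan.
LEAD_BUCKETS = [(0, 6, "1-7d"), (7, 13, "8-14d"), (14, 29, "15-30d"),
                (30, 59, "31-60d"), (60, 89, "61-90d")]


def lead_bucket(lead_days):
    for low, high, label in LEAD_BUCKETS:
        if low <= lead_days <= high:
            return label
    return None


def dedup_history(rows, today, per_bucket):
    """rows: dicts with pid, forecast_date, run_date (run start date).

    per_bucket=True  -> one row per (pid, forecast_date, bucket), for by_lead_time.
    per_bucket=False -> one short-lead row per (pid, forecast_date), for headline metrics.
    The latest run wins within each key.
    """
    kept = {}
    for row in rows:
        if row["forecast_date"] >= today:
            continue  # not yet realized: would be scored against actual = 0
        lead = (row["forecast_date"] - row["run_date"]).days
        bucket = lead_bucket(lead)
        if bucket is None:
            continue
        if per_bucket:
            key = (row["pid"], row["forecast_date"], bucket)
        elif 0 <= lead <= 6:
            key = (row["pid"], row["forecast_date"])
        else:
            continue
        if key not in kept or row["run_date"] > kept[key]["run_date"]:
            kept[key] = row
    return list(kept.values())
```

Called with `per_bucket=True`, a pid/date that was archived at leads 30, 14, 7, 2, and 1 yields four rows (the two short leads collapse into one); with `per_bucket=False` it yields only the lead-1 row, which is today's headline behaviour.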
F8. Track a naive baseline (forecast value-added)
Where: archive_forecasts() (both INSERT paths), compute_accuracy(), forecast_accuracy schema, /forecast/accuracy endpoint.
Problem: the engine currently loses to a trailing-average naive forecast (221% vs 204% daily WMAPE) and nothing on the dashboard would ever reveal that. Every accuracy improvement should be judged as value-over-naive.
Fix:
- Schema (idempotent, in the ensure blocks): `ALTER TABLE product_forecasts_history ADD COLUMN IF NOT EXISTS naive_units NUMERIC(10,2);` and `ALTER TABLE forecast_accuracy ADD COLUMN IF NOT EXISTS naive_wmape NUMERIC(10,4), ADD COLUMN IF NOT EXISTS fva NUMERIC(10,4);`
- Populate `naive_units` during both archive INSERTs via a join. Naive = flat trailing-28-day average daily units as of archive time (28 days = DOW-balanced; information available at generation; same value at every lead, which is exactly what a naive baseline means):
```sql
LEFT JOIN (
  SELECT o.pid, SUM(o.quantity) / 28.0 AS naive_daily
  FROM orders o
  WHERE o.canceled IS DISTINCT FROM TRUE
    AND o.date >= CURRENT_DATE - INTERVAL '28 days' AND o.date < CURRENT_DATE
  GROUP BY o.pid
) nv ON nv.pid = pf.pid
-- select COALESCE(nv.naive_daily, 0) AS naive_units
```
- In `compute_accuracy()`, add to each dimension's aggregate: `SUM(ABS(naive_units - actual_units)) / NULLIF(SUM(actual_units),0) AS naive_wmape` and store `fva = 1 - wmape / naive_wmape` (NULL-safe). Rows archived before this change have `naive_units` NULL; treat NULL as excluded (`FILTER (WHERE naive_units IS NOT NULL)` on the naive sums) rather than as zero.
- Endpoint: include `naiveWmape` and `fva` in the `overall` (and per-phase) payload of `/dashboard/forecast/accuracy` in `dashboard.js`.
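The NULL handling in the FVA calculation is easy to get wrong, so here is a minimal Python sketch of the intended arithmetic. It assumes rows arrive as `(forecast_units, naive_units, actual_units)` tuples with `None` for un-populated `naive_units`; the helper names are illustrative.

```python
def wmape(forecasts, actuals):
    """SUM(|f - a|) / SUM(a); None when there is no actual demand (NULLIF)."""
    total_actual = sum(actuals)
    if total_actual == 0:
        return None
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / total_actual


def fva(engine_wmape, naive_wmape):
    """Forecast value-added: 1 - wmape / naive_wmape, None if undefined."""
    if engine_wmape is None or not naive_wmape:
        return None
    return 1.0 - engine_wmape / naive_wmape


def accuracy_with_fva(rows):
    """rows: (forecast_units, naive_units or None, actual_units)."""
    engine = wmape([r[0] for r in rows], [r[2] for r in rows])
    # Rows archived before the naive column existed are excluded, not zeroed.
    scored = [r for r in rows if r[1] is not None]
    naive = wmape([r[1] for r in scored], [r[2] for r in scored])
    return engine, naive, fva(engine, naive)


# Baseline from the diagnosis: engine 221% vs naive 204% daily WMAPE.
print(round(fva(2.21, 2.04), 4))  # -0.0833
```

With the diagnosis numbers the current FVA is about −8%, i.e. the engine subtracts value relative to the naive forecast; a positive FVA is the target.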
F9. Weekly-grain headline metric + bias as a percentage
Where: compute_accuracy(), /forecast/accuracy endpoint, ForecastAccuracy.tsx.
Problem: daily-grain WMAPE on this catalog has a ~190% floor — as a headline it's noise. The informative numbers are (a) weekly-per-product WMAPE (currently ~109%, target ~70–85% post-fix) and (b) aggregate bias, which the UI currently renders as +0.108 units — indistinguishable from zero while the reality is +70%.
Fix:
- New metric in `compute_accuracy()`: `metric_type='overall_weekly', dimension_value='all'`. Definition: using the short-lead deduped rows (lead ≤ 6, non-dormant), aggregate per `(pid, date_trunc('week', forecast_date))` keeping only complete weeks (`COUNT(*) = 7`), then `WMAPE = SUM(ABS(fc_week − act_week)) / SUM(act_week)`, excluding pid-weeks where both are 0. Store sample_size = number of pid-weeks. Compute `naive_wmape`/`fva` the same way from `naive_units`.
- Endpoint: expose as `overallWeekly`; also add a weekly variant to the `accuracyTrend` query (`metric_type='overall_weekly'`). The trend will start empty (old runs lack the row). That's fine; don't backfill.
- `ForecastAccuracy.tsx`:
  - Headline WMAPE → `overallWeekly.wmape`, labeled "WMAPE (weekly)". Keep daily WMAPE available in a tooltip if desired.
  - Color thresholds for weekly grain: green ≤ 60, yellow ≤ 90, red above (tunable; document that they're calibrated for intermittent retail demand).
  - Replace the bias row: show `(totalForecast / totalActual − 1)` as a signed percentage labeled "Forecast vs actual" (both totals already arrive in `overall`). Keep MAE.
  - Add a "vs naive" line: naive weekly WMAPE and FVA. FVA > 0 = engine adds value.
  - The lead-time chart needs no code change: buckets will populate as F7 rows mature (7d lead evaluable after 7 days, 30d after 30, etc.).
- `confidenceLevel` in `/forecast/metrics` (`dashboard.js` ~line 360) is "share of products forecast via lifecycle curves", not confidence. It only feeds a per-day tooltip field: rename the JSON field to `curveCoverage` and update the one consumer in `ForecastMetrics.tsx`, or leave it and add a comment; low priority.
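The weekly metric and the percentage bias from F9 can be prototyped outside SQL to pin down the edge cases (incomplete weeks, all-zero pid-weeks). A minimal sketch, assuming input rows are already short-lead deduped and non-dormant, and that weeks start on Monday as PostgreSQL's `date_trunc('week', ...)` does; `weekly_wmape` and `bias_pct` are illustrative names.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_wmape(rows):
    """rows: (pid, forecast_date, forecast_units, actual_units).

    Returns (wmape or None, sample_size) over complete pid-weeks only.
    """
    weeks = defaultdict(lambda: [0, 0.0, 0.0])  # day count, forecast sum, actual sum
    for pid, day, fc, act in rows:
        monday = day - timedelta(days=day.weekday())
        bucket = weeks[(pid, monday)]
        bucket[0] += 1
        bucket[1] += fc
        bucket[2] += act
    abs_err = total_act = 0.0
    sample_size = 0
    for days, fc_week, act_week in weeks.values():
        if days != 7:
            continue  # incomplete week
        if fc_week == 0 and act_week == 0:
            continue  # nothing forecast, nothing sold
        abs_err += abs(fc_week - act_week)
        total_act += act_week
        sample_size += 1
    return (abs_err / total_act if total_act else None), sample_size


def bias_pct(total_forecast, total_actual):
    """Signed 'forecast vs actual' ratio: +0.70 means 70% over-forecast."""
    if not total_actual:
        return None
    return total_forecast / total_actual - 1.0


# Aggregate totals from the diagnosis table.
print(round(bias_pct(227_690, 133_861), 3))  # 0.701
```

Expressed this way, the bias reads as +70% rather than the +0.108 units the UI shows today.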
Phase 3 validation
- Next run after deploy: `forecast_accuracy` contains `overall_weekly` and `fva` values; `/dashboard/forecast/accuracy` returns them; the overview popover renders weekly WMAPE, bias %, and the naive comparison.
- After 7/14/30 days: `by_lead_time` rows appear for '8-14d', '15-30d', '31-60d' buckets respectively (61-90d after ~60 days).
- Confirm engine runtime still < ~5 min and `product_forecasts_history` growth ≈ 50–70K rows/day.
Phase 4 — Optional / after the above is proven
- F6. TSB for slow movers + dormant (spec in F5). Gate on Phase 3 measurement: ship only if weekly FVA improves on those phases.
- F10. Confidence-margin source: `load_accuracy_margins()` feeds daily-grain per-phase WMAPE (clamped to 1.0) into the intervals, so every interval is ±100%, which is uninformative. Once `overall_weekly` exists, add per-phase weekly rows (`by_phase_weekly`) and source margins from those instead.
- F11. Update or delete `backfill_accuracy_data()` (it encodes the old formulas). Until then, just don't run `--backfill`.
- F12. `compute_dow_indices()` weights by revenue but the multipliers are applied to units: switch `SUM(o.price * o.quantity)` to `SUM(o.quantity)`. Tiny effect.
- F13. Longer term: for reorder decisions the right target is P(lead-time demand > stock), not a point forecast. Evaluate quantile (pinball) loss at lead-time horizons using the existing confidence-interval columns. Design separately.
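For reference on F13, pinball loss for a quantile level q is max(q·(y − f), (q − 1)·(y − f)), where y is realized demand and f is the forecast quantile. A minimal sketch; which quantile the existing confidence_upper/confidence_lower columns correspond to is not stated in this plan, so the q = 0.9 used below is purely an assumption for illustration.

```python
def pinball_loss(actual, quantile_forecast, q):
    """Pinball (quantile) loss for one observation at quantile level q."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must be strictly between 0 and 1")
    diff = actual - quantile_forecast
    # Under-forecasts are weighted by q, over-forecasts by (1 - q).
    return max(q * diff, (q - 1.0) * diff)


def mean_pinball_loss(actuals, quantile_forecasts, q):
    """Average pinball loss over paired observations."""
    if not actuals or len(actuals) != len(quantile_forecasts):
        raise ValueError("need two equal-length, non-empty sequences")
    losses = [pinball_loss(a, f, q) for a, f in zip(actuals, quantile_forecasts)]
    return sum(losses) / len(losses)


# Hypothetical lead-time demand vs an assumed 90th-percentile forecast of 8 units.
print(mean_pinball_loss([10, 5], [8, 8], q=0.9))
```

At q = 0.9 a stockout-side miss of 2 units costs far more than an overstock-side miss of 3, which matches the asymmetry of a reorder decision better than WMAPE does.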
4. Success criteria
- Rolling-14-day portfolio forecast/actual ratio within 0.8–1.25 (currently 1.5–2.5).
- Weekly-grain WMAPE ≤ 90% and FVA > 0 (engine beats naive) sustained for 2+ weeks.
- Decay/preorder/mature per-phase bias within ±0.1 units/day (currently +0.35 / +0.85 / +0.17).
- `all_incl_dormant` actuals covered: dormant bias better than −0.4 (currently −1.36, i.e. 100% miss).
- Lead-time buckets through 31–60d populated with ≥10K samples each within ~6 weeks.
- Launch phase stays healthy (bias within ±0.15, WMAPE not degraded); this is the regression guard for F3/F4 changes.
5. Re-measurement appendix
The naive-vs-engine comparison used in the diagnosis (rerun any time; adjust dates):
```sql
WITH ranked AS (
  SELECT pfh.pid, pfh.forecast_date, pfh.forecast_units, pfh.lifecycle_phase,
         ROW_NUMBER() OVER (PARTITION BY pfh.pid, pfh.forecast_date ORDER BY fr.started_at DESC) rn
  FROM product_forecasts_history pfh
  JOIN forecast_runs fr ON fr.id = pfh.run_id
  WHERE pfh.forecast_date BETWEEN CURRENT_DATE - 9 AND CURRENT_DATE - 1),
-- Latest forecast per pid/date, non-dormant only
eng AS (SELECT * FROM ranked WHERE rn = 1 AND lifecycle_phase != 'dormant'),
-- Naive baseline: flat average of the 30 days before the evaluation window
naive AS (
  SELECT o.pid, SUM(o.quantity)/30.0 AS naive_daily FROM orders o
  WHERE o.canceled IS DISTINCT FROM TRUE
    AND o.date >= CURRENT_DATE - 39 AND o.date < CURRENT_DATE - 9
  GROUP BY o.pid)
SELECT e.lifecycle_phase, COUNT(*) AS n, SUM(COALESCE(dps.units_sold,0)) AS actual,
       round(SUM(e.forecast_units),0) AS engine_fc, round(SUM(COALESCE(nv.naive_daily,0)),0) AS naive_fc,
       round(SUM(ABS(e.forecast_units - COALESCE(dps.units_sold,0)))/NULLIF(SUM(COALESCE(dps.units_sold,0)),0),2) AS engine_wmape,
       round(SUM(ABS(COALESCE(nv.naive_daily,0) - COALESCE(dps.units_sold,0)))/NULLIF(SUM(COALESCE(dps.units_sold,0)),0),2) AS naive_wmape
FROM eng e
LEFT JOIN naive nv ON nv.pid = e.pid
LEFT JOIN daily_product_snapshots dps ON dps.pid = e.pid AND dps.snapshot_date = e.forecast_date
-- ROLLUP adds a NULL-phase row with the portfolio totals
GROUP BY ROLLUP(e.lifecycle_phase) ORDER BY 1;
```
Baseline numbers to beat (June 1–9, 2026): engine 221% / naive 204% daily WMAPE; engine_fc/actual = 1.82; per-phase table in §1.