# Metrics Calculation Pipeline Audit **Date:** 2026-02-07 **Scope:** All 6 SQL calculation scripts, custom DB functions, import pipeline, and live data verification ## Overview The metrics pipeline in `inventory-server/scripts/calculate-metrics-new.js` runs 6 SQL scripts sequentially: 1. `update_daily_snapshots.sql` — Aggregates daily per-product sales/receiving data 2. `update_product_metrics.sql` — Calculates the main product_metrics table (KPIs, forecasting, status) 3. `update_periodic_metrics.sql` — ABC classification, average lead time 4. `calculate_brand_metrics.sql` — Brand-level aggregated metrics 5. `calculate_vendor_metrics.sql` — Vendor-level aggregated metrics 6. `calculate_category_metrics.sql` — Category-level metrics with hierarchy rollups ### Database Scale | Table | Row Count | |---|---| | products | 681,912 | | orders | 2,883,982 | | purchase_orders | 256,809 | | receivings | 313,036 | | daily_product_snapshots | 678,312 (601 distinct dates, since 2024-06-01) | | product_metrics | 681,912 | | brand_metrics | 1,789 | | vendor_metrics | 281 | | category_metrics | 610 | --- ## Issues Found ### ISSUE 1: [HIGH] Order status filter is non-functional — numeric codes vs text comparison **Files:** `update_daily_snapshots.sql` lines 86-101, `update_product_metrics.sql` lines 89, 178-183 **Confirmed by data:** All order statuses are numeric strings ('100', '50', '55', etc.) **Status mappings from:** `docs/prod_registry.class.php` **Description:** The SQL filters `COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned')` and `o.status NOT IN ('canceled', 'returned')` are used throughout the pipeline to exclude canceled/returned orders. However, the import pipeline stores order statuses as their **raw numeric codes** from the production MySQL database (e.g., '100', '50', '55', '90', '92'). There are **zero text status values** in the orders table. 
This means these filters **never exclude any rows** — every comparison is `'100' NOT IN ('canceled', 'returned')` which is always true. **Actual status distribution (with confirmed meanings):** | Status | Meaning | Count | Negative Qty | Assessment | |---|---|---|---|---| | 100 | shipped | 2,862,792 | 3,352 | Completed — correct to include | | 50 | awaiting_products | 11,109 | 0 | In-progress — not yet shipped | | 55 | shipping_later | 5,689 | 0 | In-progress — not yet shipped | | 56 | shipping_together | 2,863 | 0 | In-progress — not yet shipped | | 90 | awaiting_shipment | 38 | 0 | Near-complete — not yet shipped | | 92 | awaiting_pickup | 71 | 0 | Near-complete — awaiting customer | | 95 | shipped_confirmed | 5 | 0 | Completed — correct to include | | 15 | cancelled | 1 | 0 | Should be excluded | **Full status reference (from prod_registry.class.php):** - 0=created, 10=unfinished, **15=cancelled**, 16=combined, 20=placed, 22=placed_incomplete - 30=cancelled_old (historical), 40=awaiting_payment, 50=awaiting_products - 55=shipping_later, 56=shipping_together, 60=ready, 61=flagged - 62=fix_before_pick, 65=manual_picking, 70=in_pt, 80=picked - 90=awaiting_shipment, 91=remote_wait, **92=awaiting_pickup**, 93=fix_before_ship - **95=shipped_confirmed**, **100=shipped** **Severity revised to HIGH (from CRITICAL):** Now that we know the actual meanings, no cancelled/refunded orders are being miscounted (only 1 cancelled order exists, status=15). The real concern is twofold: 1. **The text-based filter is dead code** — it can never match any row. Either map statuses to text during import (like POs do) or change SQL to use numeric comparisons. 2. **~19,775 unfulfilled orders** (statuses 50/55/56/90/92) are counted as completed sales. These are orders in various stages of fulfillment that haven't shipped yet. While most will eventually ship, counting them now inflates current-period metrics. 
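The numeric-filter option could look like the following sketch (the `::int` cast and the `>= 20` cutoff are illustrative and should be checked against the status list above):

```sql
-- Replace the dead text filter
--   COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned')
-- with numeric codes from prod_registry.class.php
-- (15 = cancelled, 30 = cancelled_old):
WHERE o.status::int >= 20      -- excludes created/unfinished/cancelled
  AND o.status::int <> 30      -- excludes historical cancellations
-- Stricter, shipped-only variant for completed-sales metrics:
-- WHERE o.status IN ('95', '100')   -- shipped_confirmed, shipped
```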
At 0.69% of total orders, the financial impact is modest but the filter should work correctly on principle. **Note:** PO statuses ARE properly mapped to text ('canceled', 'done', etc.) in the import pipeline. Only order statuses are numeric. --- ### ISSUE 2: [CRITICAL] Daily Snapshots use current stock instead of historical EOD stock **File:** `update_daily_snapshots.sql`, lines 126-135, 173 **Confirmed by data:** Top product (pid 666925) shows `eod_stock_quantity = 0` for ALL dates even though it sold 28 units on Jan 28 (clearly had stock then) **Description:** The `CurrentStock` CTE reads `stock_quantity` directly from the `products` table at query execution time. When the script processes historical dates (today minus 1-4 days), it writes **today's stock** as if it were the end-of-day stock for those past dates. **Cascading impact on product_metrics:** - `avg_stock_units_30d` / `avg_stock_cost_30d` — Wrong averages - `stockout_days_30d` — Undercounts (only based on current stock state, not historical) - `stockout_rate_30d`, `service_level_30d`, `fill_rate_30d` — All derived from wrong stockout data - `gmroi_30d` — Wrong denominator (avg stock cost) - `stockturn_30d` — Wrong denominator (avg stock units) - `sell_through_30d` — Affected by stock level inaccuracy --- ### ISSUE 3: [CRITICAL] Snapshot coverage is 0.17% — most products have no snapshot data **Confirmed by data:** 678,312 snapshot rows across 601 dates = ~1,128 products/day out of 681,912 total **Description:** The daily snapshots script only creates rows for products with sales or receiving activity on that date (`ProductsWithActivity` CTE, line 136). 
This means: - 91.1% of products (621,221) have NULL `sales_30d` — they had no orders in the last 30 days so no snapshot rows exist - `AVG(eod_stock_quantity)` averages only across days with activity, not 30 days - `stockout_days_30d` only counts stockout days where there was ALSO some activity - A product out of stock with zero sales gets zero stockout_days even though it was stocked out This is by design (to avoid creating 681K rows/day) but means stock-related metrics are systematically biased. --- ### ISSUE 4: [HIGH] `costeach` fallback to 50% of price in import pipeline **File:** `inventory-server/scripts/import/orders.js` (line ~573) **Description:** When the MySQL `order_costs` table has no record for an order item, `costeach` defaults to `price * 0.5`. There is **no flag** in the PostgreSQL data to distinguish actual costs from estimated ones. **Data impact:** 385,545 products (56.5%) have `current_cost_price = 0` AND `current_landing_cost_price = 0`. For these products, the COGS calculation in daily_snapshots falls through the chain: 1. `o.costeach` — May be the 50% estimate from import 2. `get_weighted_avg_cost()` — Returns NULL if no receivings exist 3. `p.landing_cost_price` — Always NULL (hardcoded in import) 4. `p.cost_price` — 0 for 56.5% of products Only 27 products have zero COGS with positive sales, meaning the `costeach` field is doing its job for products that sell, but the 50% fallback means margins for those products are estimates, not actuals. --- ### ISSUE 5: [HIGH] `landing_cost_price` is always NULL **File:** `inventory-server/scripts/import/products.js` (line ~175) **Description:** The import explicitly sets `landing_cost_price = NULL` for all products. The daily_snapshots COGS calculation uses it as a fallback: `COALESCE(o.costeach, get_weighted_avg_cost(...), p.landing_cost_price, p.cost_price)`. Since it's always NULL, this fallback step is useless and the chain jumps straight to `cost_price`. 
The `product_metrics` field `current_landing_cost_price` is populated as `COALESCE(p.landing_cost_price, p.cost_price, 0.00)`, so it equals `cost_price` for all products. Any UI showing "landing cost" is actually just showing `cost_price`. --- ### ISSUE 6: [HIGH] Vendor lead time is drastically wrong — missing supplier_id join **File:** `calculate_vendor_metrics.sql`, lines 62-82 **Confirmed by data:** Vendor-level lead times are 2-10x higher than product-level lead times **Description:** The vendor metrics lead time joins POs to receivings only by `pid`: ```sql LEFT JOIN public.receivings r ON r.pid = po.pid ``` But the periodic metrics lead time correctly matches supplier: ```sql JOIN public.receivings r ON r.pid = po.pid AND r.supplier_id = po.supplier_id ``` Without supplier matching, a PO for product X from Vendor A can match a receiving of product X from Vendor B, creating inflated/wrong lead times. **Measured discrepancies:** | Vendor | Vendor Metrics Lead Time | Avg Product Lead Time | |---|---|---| | doodlebug design inc. | 66 days | 14 days | | Notions | 55 days | 4 days | | Simple Stories | 59 days | 27 days | | Ranger Industries | 31 days | 5 days | --- ### ISSUE 7: [MEDIUM] Net revenue does not subtract returns **File:** `update_daily_snapshots.sql`, line 184 **Description:** `net_revenue = gross_revenue - discounts`. Standard accounting: `net_revenue = gross_revenue - discounts - returns`. The `returns_revenue` is calculated separately but not deducted. **Data impact:** There are 3,352 orders with negative quantities (returns), totaling -5,499 units. These returns are tracked in `returns_revenue` but not reflected in `net_revenue`, which means all downstream revenue-based metrics are slightly overstated. 
---

### ISSUE 8: [MEDIUM] Lifetime revenue subquery references wrong table columns

**File:** `update_product_metrics.sql`, lines 323-329

**Description:** The lifetime revenue estimation fallback queries:

```sql
SELECT revenue_7d / NULLIF(sales_7d, 0)
FROM daily_product_snapshots
WHERE pid = ci.pid AND sales_7d > 0
```

But `daily_product_snapshots` does NOT have `revenue_7d` or `sales_7d` columns — those exist in `product_metrics`. Because this is a correlated subquery, PostgreSQL resolves the unknown names against the outer query's columns rather than raising an error, so the expression silently computes a value other than the intended per-unit price, or returns NULL. The effect is that the estimation always falls back to `current_price * total_sold`.

---

### ISSUE 9: [MEDIUM] Brand/Vendor metrics COGS filter inflates margins

**Files:** `calculate_brand_metrics.sql` line 31, `calculate_vendor_metrics.sql` line 32

**Description:** `SUM(CASE WHEN pm.cogs_30d > 0 THEN pm.cogs_30d ELSE 0 END)` excludes products with zero COGS. But if a product has sales revenue and zero COGS (missing cost data), the brand/vendor totals will include the revenue but not the COGS, artificially inflating the margin.

**Data context:** Brand metrics revenue matches product_metrics aggregation exactly for sales counts, but shows small discrepancies in revenue (e.g., Stamperia: $7,613.98 brand vs $7,611.11 actual). These tiny diffs come from the `> 0` filtering excluding products with negative revenue.

---

### ISSUE 10: [MEDIUM] Extreme margin values from $0.01 price orders

**Confirmed by data:** 73 products with margin > 100%, 119 with margin < -100%

**Examples:**

| Product | Revenue | COGS | Margin |
|---|---|---|---|
| Flower Gift Box Die (pid 624756) | $0.02 | $29.98 | -149,800% |
| Special Flowers Stamp Set (pid 614513) | $0.01 | $11.97 | -119,632% |

These are products with extremely low prices (likely samples, promos, or data errors) where the order price was $0.01. The margin calculation is mathematically correct but these outliers skew any aggregate margin statistics.
--- ### ISSUE 11: [MEDIUM] Sell-through rate has edge cases yielding negative/extreme values **File:** `update_product_metrics.sql`, lines 358-361 **Confirmed by data:** 30 products with negative sell-through, 10 with sell-through > 200% **Description:** Beginning inventory is approximated as `current_stock + sales - received + returns`. When inventory adjustments, shrinkage, or manual corrections occur, this approximation breaks. Edge cases: - Products with many manual stock adjustments → negative denominator → negative sell-through - Products with beginning stock near zero but decent sales → sell-through > 100% --- ### ISSUE 12: [MEDIUM] `total_sold` uses different status filter than orders import **Import pipeline confirmed:** - Orders import: `order_status >= 15` (includes processing/pending orders) - `total_sold` in products: `order_status >= 20` (more restrictive) This means `lifetime_sales` (from `total_sold`) is systematically lower than what you'd calculate by summing the orders table. The discrepancy is confirmed: | Product | total_sold | orders sum | Gap | |---|---|---|---| | pid 31286 | 13,786 | 4,241 | 9,545 | | pid 44309 | 11,978 | 3,119 | 8,859 | The large gaps are because the orders table only has data from the import start date (~2024), while `total_sold` includes all-time sales from MySQL. This is expected behavior, not a bug, but it means the `lifetime_revenue_quality` flag is important — most products show 'estimated' quality. 
--- ### ISSUE 13: [MEDIUM] Category rollup may double-count products in multiple hierarchy levels **File:** `calculate_category_metrics.sql`, lines 42-66 **Description:** The `RolledUpMetrics` CTE uses: ```sql dcm.cat_id = ch.cat_id OR dcm.cat_id = ANY(SELECT cat_id FROM category_hierarchy WHERE ch.cat_id = ANY(ancestor_ids)) ``` If products are assigned to categories at multiple levels in the same branch (e.g., both "Paper Crafts" and "Scrapbook Paper" which is a child of "Paper Crafts"), those products' metrics would be counted twice in the parent's rollup. --- ### ISSUE 14: [LOW] `exclude_forecast` removes products from metrics entirely **File:** `update_product_metrics.sql`, line 509 **Description:** `WHERE s.exclude_forecast IS FALSE OR s.exclude_forecast IS NULL` is on the main INSERT's WHERE clause. Products with `exclude_forecast = TRUE` won't appear in `product_metrics` at all, rather than just having forecast fields nulled. Currently all 681,912 products are in product_metrics so this appears to not affect any products yet. --- ### ISSUE 15: [LOW] Daily snapshots only look back 5 days **File:** `update_daily_snapshots.sql`, line 14 — `_process_days INT := 5` If import data arrives late (>5 days), those days will never get snapshots populated. There is a separate `backfill/rebuild_daily_snapshots.sql` for historical rebuilds. --- ### ISSUE 16: [INFO] Timezone risk in order date import **File:** `inventory-server/scripts/import/orders.js` MySQL `DATETIME` values are timezone-naive. The import uses `new Date(order.date)` which interprets them using the import server's local timezone. The SSH config specifies `timezone: '-05:00'` for MySQL (always EST). If the import server is in a different timezone, orders near midnight could land on the wrong date in the daily snapshots calculation. 
--- ## Custom Functions Review ### `calculate_sales_velocity(sales_30d, stockout_days_30d)` - Divides `sales_30d` by effective selling days: `GREATEST(30 - stockout_days, CASE WHEN sales > 0 THEN 14 ELSE 30 END)` - The 14-day floor prevents extreme velocity for products mostly out of stock - **Sound approach** — the only concern is that stockout_days is unreliable (Issues 2, 3) ### `get_weighted_avg_cost(pid, date)` - Weighted average of last 10 receivings by cost*qty/qty - Returns NULL if no receivings — sound fallback behavior - **Correct implementation** ### `safe_divide(numerator, denominator)` - Returns NULL on divide-by-zero — **correct** ### `std_numeric(value, precision)` - Rounds to precision digits — **correct** ### `classify_demand_pattern(avg_demand, cv)` - Uses coefficient of variation thresholds: ≤0.2 = stable, ≤0.5 = variable, low-volume+high-CV = sporadic, else lumpy - **Reasonable classification**, though only based on 30-day window ### `detect_seasonal_pattern(pid)` - CROSS JOIN LATERAL (runs per product) — **expensive**: queries `daily_product_snapshots` twice per product - Compares current month average to yearly average — very simplistic - **Functional but could be a performance bottleneck** with 681K products ### `category_hierarchy` (materialized view) - Recursive CTE building tree from categories — **correct implementation** - Refreshed concurrently before category metrics calculation — **good practice** --- ## Data Health Summary | Metric | Count | % of Total | |---|---|---| | Products with zero cost_price | 385,545 | 56.5% | | Products with NULL sales_30d | 621,221 | 91.1% | | Products with no lifetime_sales | 321,321 | 47.1% | | Products with zero COGS but positive sales | 27 | <0.01% | | Products with margin > 100% | 73 | <0.01% | | Products with margin < -100% | 119 | <0.01% | | Products with negative sell-through | 30 | <0.01% | | Products with NULL status | 0 | 0% | | Duplicate daily snapshots (same pid+date) | 0 | 0% | | Net revenue 
formula mismatches | 0 | 0% | ### ABC Classification Distribution (replenishable products only) | Class | Products | Revenue % | |---|---|---| | A | 7,727 | 80.72% | | B | 12,048 | 15.10% | | C | 113,647 | 4.18% | ABC distribution looks healthy — A ≈ 80%, A+B ≈ 96%. ### Brand Metrics Consistency Product counts and sales_30d match exactly between `brand_metrics` and direct aggregation from `product_metrics`. Revenue shows sub-dollar discrepancies due to the `> 0` filter excluding products with negative revenue. **Consistent within expected tolerance.** --- ## Priority Recommendations ### Must Fix (Correctness Issues) 1. **Issue 1: Fix order status handling** — The text-based filter (`NOT IN ('canceled', 'returned')`) is dead code against numeric statuses. Two options: (a) map numeric statuses to text during import (like POs already do), or (b) change SQL to filter on numeric codes (e.g., `o.status::int >= 20` to exclude cancelled/unfinished, or `o.status IN ('100', '95')` for shipped-only). The ~19.7K unfulfilled orders (0.69%) are a minor financial impact but the filter should be functional. 2. **Issue 6: Add supplier_id join to vendor lead time** — One-line fix in `calculate_vendor_metrics.sql` 3. **Issue 8: Fix lifetime revenue subquery** — Use correct column names from `daily_product_snapshots` (e.g., `net_revenue / NULLIF(units_sold, 0)`) ### Should Fix (Data Quality) 4. **Issue 2/3: Snapshot coverage** — Consider creating snapshot rows for all in-stock products, not just those with activity. Or at minimum, calculate stockout metrics by comparing snapshot existence to product existence. 5. **Issue 5: Populate landing_cost_price** — If available in the source system, import it. Otherwise remove references to avoid confusion. 6. **Issue 7: Subtract returns from net_revenue** — `net_revenue = gross_revenue - discounts - returns_revenue` 7. **Issue 9: Remove > 0 filter on COGS** — Use `SUM(pm.cogs_30d)` instead of conditional sums ### Nice to Fix (Edge Cases) 8. 
**Issue 4: Flag estimated costs** — Add a `costeach_estimated BOOLEAN` to orders during import
9. **Issue 10: Cap or flag extreme margins** — Exclude $0.01-price orders from margin calculations
10. **Issue 11: Clamp sell-through** — `GREATEST(0, LEAST(sell_through_30d, 200))` or flag outliers
11. **Issue 13: Verify category assignment policy** — Check if products are assigned to leaf categories only
12. **Issue 13: Category rollup query** — Verify no double-counting with actual data
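Recommendation 10 could flag rather than overwrite, so the raw value stays auditable; a sketch (the `sell_through_outlier` column is illustrative, not existing schema):

```sql
-- Mark rows whose beginning-inventory approximation broke (Issue 11)...
ALTER TABLE product_metrics
  ADD COLUMN IF NOT EXISTS sell_through_outlier BOOLEAN;

UPDATE product_metrics
SET sell_through_outlier = sell_through_30d NOT BETWEEN 0 AND 200
WHERE sell_through_30d IS NOT NULL;

-- ...and clamp only at read time:
-- GREATEST(0, LEAST(sell_through_30d, 200)) AS sell_through_30d_display
```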