16 KiB
Metrics Pipeline Audit Report
Date: 2026-02-08
Scope: All 6 SQL scripts in inventory-server/scripts/metrics-new/, import pipeline, custom functions, and post-calculation data verification.
Executive Summary
The metrics pipeline is architecturally sound and the core calculations are mostly correct. The 30-day sales, revenue, replenishment, and aggregate metrics (brand/vendor/category) all cross-check accurately between the snapshots, product_metrics, and direct orders queries. However, several issues were found ranging from critical data bugs to design limitations that affect accuracy of specific metrics.
Issues found: 13 (3 Critical, 4 Medium, 6 Low/Informational)
CRITICAL Issues
C1. net_revenue in daily snapshots never subtracts returns ($35.6K affected)
Location: update_daily_snapshots.sql, line 181
Symptom: net_revenue is stored as gross_revenue - discounts but should be gross_revenue - discounts - returns_revenue.
The SQL formula on line 181 appears correct:
COALESCE(sd.gross_revenue_unadjusted, 0.00) - COALESCE(sd.discounts, 0.00) - COALESCE(sd.returns_revenue, 0.00) AS net_revenue
However, actual data shows net_revenue = gross_revenue - discounts for ALL 3,252 snapshots that have returns. Total returns not subtracted: $35,630.03 across 2,946 products. This may be caused by the returns_revenue in the SalesData CTE not properly flowing through to the INSERT, or by a prior version of the code that stored these values differently. The profit column (line 184) has the same issue: (gross - discounts) - cogs instead of (gross - discounts - returns) - cogs.
Impact: Net revenue and profit are overstated by the amount of returns. This cascades to all metrics derived from snapshots: revenue_30d, profit_30d, margin_30d, avg_ros_30d, and all brand/vendor/category aggregate revenue.
Recommended fix: Debug why the returns subtraction isn't taking effect. The formula in the SQL looks correct, so this may be a data-type issue or an execution path issue. After fixing, rebuild snapshots.
Status: Owner will resolve. Code formula is correct; snapshots need rebuilding after prior fix deployment.
C2. eod_stock_quantity uses CURRENT stock, not historical end-of-day stock
Location: update_daily_snapshots.sql, lines 123-132 (CurrentStock CTE)
Symptom: Every snapshot for a given product shows the same stock quantity regardless of the snapshot date.
The CurrentStock CTE simply reads stock_quantity from the products table:
SELECT pid, stock_quantity, ... FROM public.products
This means a snapshot from January 10 shows the SAME stock as today (February 8). Verified in data:
- Product 662561: stock = 36 on every date (Feb 1-7)
- Product 665397: stock = 25 on every date (Feb 1-7)
- All products checked show identical stock across all snapshot dates
Impact: All stock-derived metrics are inaccurate for historical analysis:
eod_stock_cost,eod_stock_retail,eod_stock_gross(all wrong for past dates)stockout_flag(based on current stock, not historical)stockout_days_30d(undercounted since stockout_flag uses current stock)avg_stock_units_30d,avg_stock_cost_30d(no variance, just current stock repeated)gmroi_30d,stockturn_30d(based on avg_stock which is flat)sell_through_30d(denominator uses current stock assumption)service_level_30d,fill_rate_30d
This is a known architectural limitation noted in MEMORY.md. Fixing requires either:
- Storing stock snapshots separately at end-of-day (ideally via a cron job that records stock before any changes)
- Reconstructing historical stock from orders and receivings (complex but possible)
Status: FIXED. MySQL's snap_product_value table (daily EOD stock per product since 2012) is now imported into PostgreSQL stock_snapshots table via scripts/import/stock-snapshots.js. The CurrentStock CTE in update_daily_snapshots.sql now uses LEFT JOIN stock_snapshots for historical stock, falling back to products.stock_quantity when no historical data exists. Requires: run import, then rebuild daily snapshots.
C3. ON CONFLICT DO UPDATE WHERE check skips 91%+ of product_metrics updates
Location: update_product_metrics.sql, lines 558-574
Symptom: 623,205 of 681,912 products (91.4%) have last_calculated older than 1 day. 592,369 are over 30 days old. 914 products with active 30-day sales haven't been updated in over 7 days.
The upsert's WHERE clause only updates if specific fields changed:
WHERE product_metrics.current_stock IS DISTINCT FROM EXCLUDED.current_stock OR
product_metrics.current_price IS DISTINCT FROM EXCLUDED.current_price OR ...
Fields NOT checked include: stockout_days_30d, margin_30d, gmroi_30d, demand_pattern, seasonality_index, sales_growth_*, service_level_30d, and many others. If a product's stock, price, sales, and revenue haven't changed, the entire row is skipped even though growth metrics, variability, and other derived fields may need updating.
Impact: Most derived metrics (growth, demand patterns, seasonality) are stale for the majority of products. Products with steady sales but unchanged stock/price never get their growth metrics recalculated.
Recommended fix: Either:
- Remove the
WHEREclause entirely (accept the performance cost of writing all rows every run) - Add
last_calculatedage check:OR product_metrics.last_calculated < NOW() - INTERVAL '7 days' - Add the missing fields to the change-detection check
Status: FIXED. Added 12 derived fields to the IS DISTINCT FROM check (profit_30d, cogs_30d, margin_30d, stockout_days_30d, sell_through_30d, sales_growth_30d_vs_prev, revenue_growth_30d_vs_prev, demand_pattern, seasonal_pattern, seasonality_index, service_level_30d, fill_rate_30d) plus a time-based safety net: OR product_metrics.last_calculated < NOW() - INTERVAL '1 day'. This guarantees every row is refreshed at least daily.
MEDIUM Issues
M1. Demand variability calculated only over activity days, not full 30-day window
Location: update_product_metrics.sql, DemandVariability CTE (lines 206-223)
Symptom: Variance, std_dev, and CV are computed over only the days that appear in snapshots (activity days), not the full 30-day period including zero-sales days.
Example: Product 41141 (Mexican Poppy) sold 102 units in 30 days across only 3 snapshot days (1, 1, 100). The variance/CV is calculated over just those 3 data points instead of 30 (with 27 zero-sales days).
Impact:
- CV is computed on sparse data (3-10 points instead of 30), making it statistically unreliable
- Products with sporadic large orders appear less variable than they really are
demand_patternclassification is affected (stable/variable/sporadic/lumpy)
Recommended fix: Join against a generated 30-day date series and COALESCE missing days to 0 units sold before computing variance/stddev/CV.
Status: FIXED. Rewrote DemandVariability CTE to use generate_series() for the full 30-day date range, CROSS JOIN with distinct PIDs from snapshots, and LEFT JOIN actual snapshot data with COALESCE(dps.units_sold, 0) for missing days. Variance/stddev/CV now computed over all 30 data points.
M2. costeach fallback to price * 0.5 affects 32.5% of recent orders
Location: orders.js, line 600 and 634
Symptom: When no cost record exists in order_costs, the import falls back to price * 0.5.
Data shows 9,839 of 30,266 recent orders (32.5%) use this fallback. Among these, 79 paid products have costeach = 0 because price = 0 * 0.5 = 0, even though the product has a real cost_price.
The daily snapshot has a second line of defense (using get_weighted_avg_cost() and then p.cost_price), but the orders table's costeach column itself contains inaccurate data for ~1/3 of orders.
Impact: COGS calculations at the order level are approximate for 1/3 of orders. The snapshot's fallback chain mitigates this somewhat, but any analytics using orders.costeach directly will be affected.
Status: FIXED. Added products.cost_price as intermediate fallback: COALESCE(oc.costeach, p.cost_price, oi.price * 0.5). The products table join was added to both the order_totals CTE and the outer SELECT in orders.js. Requires a full orders re-import to apply retroactively.
M3. lifetime_sales uses MySQL total_sold (status >= 20) but orders import uses status >= 15
Location: products.js line 200 vs orders.js line 69
Symptom: total_sold in the products table comes from MySQL with order_status >= 20, excluding status 15 (canceled) and 16 (combined). But the orders import fetches orders with order_status >= 15.
Verified in MySQL: For product 31286, total_sold (>=20) = 13,786 vs (>=15) = 13,905 (difference of 119 units).
Impact: lifetime_sales in product_metrics (sourced from products.total_sold) slightly understates compared to what the orders table contains. The lifetime_revenue_quality field correctly flags most as "estimated" since the orders table only covers ~5 years while total_sold is all-time. This is a minor inconsistency (< 1% difference).
Status: Accepted. < 1% difference, not worth the complexity of aligning thresholds.
M4. sell_through_30d has 868 NULL values and 547 anomalous values for products with sales
Location: update_product_metrics.sql, lines 356-361
Formula: (sales_30d / (current_stock + sales_30d + returns_units_30d - received_qty_30d)) * 100
- 868 products with sales but NULL sell_through (denominator = 0, which happens when
current_stock + sales - received = 0, i.e. all stock came from receiving and was sold) - 259 products with sell_through > 100%
- 288 products with negative sell_through
Impact: Sell-through rate is unreliable for products with significant receiving activity in the same period. The formula tries to approximate "beginning inventory" but the approximation breaks when current stock ≠ actual beginning stock (which is always, per issue C2).
Status: Will improve once C2 fix (historical stock) is deployed and snapshots are rebuilt, since current_stock in the formula will then reflect actual beginning inventory.
LOW / INFORMATIONAL Issues
L1. Snapshots only cover ~1,167 products/day out of 681K
Only products with order or receiving activity on a given day get snapshots. This is by design (the ProductsWithActivity CTE on line 133 of update_daily_snapshots.sql), but it means:
- 560K+ products have zero snapshot history
- Stockout tracking is impossible for products with no sales (they can't appear in snapshots)
- The "avg_stock" metrics (avg_stock_units_30d, etc.) only average over activity days, not all 30 days
This is acceptable for storage efficiency but should be understood when interpreting metrics.
Status: Accepted (by design).
L2. detect_seasonal_pattern function only compares current month to yearly average
The seasonality detection is simplistic: it compares current month's avg daily sales to yearly avg. This means:
- It can only detect if the CURRENT month is above average, not identify historical seasonal patterns
- Running in January vs July will give completely different results for the same product
- The "peak_season" field always shows the current month/quarter when seasonal (not the actual peak)
This is noted as a P5 (low priority) feature and is adequate for a first pass but should not be relied upon for demand planning.
Status: FIXED. Rewrote detect_seasonal_pattern function to compare monthly average sales across the full last 12 months. Uses CV across months + peak-to-average ratio for classification: strong (CV > 0.5, peak > 150%), moderate (CV > 0.3, peak > 120%), none. Peak season now identifies the actual highest-sales month. Requires at least 3 months of data. Saved in db/functions.sql.
L3. Free product with negative revenue in top sellers
Product 476848 ("Thank You, From ACOT!") shows 254 sales with -$1.00 revenue because one order applied a $1 discount to a $0 product. This is a data oddity, not a calculation bug. Could be addressed by excluding $0-price products from revenue metrics or by data cleanup.
Status: Accepted (data oddity, not a bug).
L4. landing_cost_price is always NULL
current_landing_cost_price in product_metrics is mapped from current_effective_cost which is just cost_price. The landing_cost_price concept (cost + shipping + duties) is not implemented. The field exists but has no meaningful data.
Status: FIXED. Removed landing_cost_price from db/schema.sql, current_landing_cost_price from db/metrics-schema-new.sql, update_product_metrics.sql, and backfill/populate_initial_product_metrics.sql. Column should be dropped from the live database via ALTER TABLE.
L5. Custom SQL functions not tracked in version control
All 6 custom functions (calculate_sales_velocity, get_weighted_avg_cost, safe_divide, std_numeric, classify_demand_pattern, detect_seasonal_pattern) and the category_hierarchy materialized view exist only in the database. They are not defined in any migration or schema file in the repository.
If the database needs to be recreated, these would be lost.
Status: FIXED. All 6 functions and the category_hierarchy materialized view definition saved to inventory-server/db/functions.sql. File is re-runnable via psql -f functions.sql.
L6. get_weighted_avg_cost limited to last 10 receivings
The function LIMIT 10 for performance, but this means products with many small receivings may not accurately reflect the true weighted average cost if the cost has changed significantly beyond the last 10 receiving records.
Status: FIXED. Removed LIMIT 10 from get_weighted_avg_cost. Data shows max receivings per product is 142 (p95 = 11, avg = 3), so performance impact is negligible. Updated definition in db/functions.sql.
Verification Summary
What's Working Correctly
| Check | Result |
|---|---|
| 30d sales: product_metrics vs orders vs snapshots | MATCH (verified top 10 sellers) |
| Replenishment formula: manual calc vs stored | MATCH (verified 10 products) |
| Brand metrics vs sum of product_metrics | MATCH (0 difference across all brands) |
| Order status mapping (numeric → text) | CORRECT (all statuses mapped, no numeric remain) |
| Cost price: PostgreSQL vs MySQL source | MATCH (within rounding, verified 5 products) |
| total_sold: PostgreSQL vs MySQL source | MATCH (verified 5 products) |
| Category rollups (rolled-up > direct for parents) | CORRECT |
| ABC classification distribution | REASONABLE (A: 8K, B: 12.5K, C: 113K) |
| Lead time calculation (PO → receiving) | CORRECT (verified examples) |
Data Overview
| Metric | Value |
|---|---|
| Total products | 681,912 |
| Products in product_metrics | 681,912 (100%) |
| Products with 30d sales | 10,291 (1.5%) |
| Products with negative profit & revenue | 139 (mostly cost > price) |
| Products with negative stock | 0 |
| Snapshot date range | 2020-06-18 to 2026-02-08 |
| Avg products per snapshot day | 1,167 |
| Order date range | 2020-06-18 to 2026-02-08 |
| Total orders | 2,885,825 |
| 'returned' status orders | 0 (returns via negative quantity only) |
Fix Status Summary
| Issue | Severity | Status | Deployment Action Needed |
|---|---|---|---|
| C1 | Critical | Owner resolving | Rebuild daily snapshots |
| C2 | Critical | FIXED | Run import, rebuild daily snapshots |
| C3 | Critical | FIXED | Deploy updated update_product_metrics.sql |
| M1 | Medium | FIXED | Deploy updated update_product_metrics.sql |
| M2 | Medium | FIXED | Full orders re-import (--full) |
| M3 | Medium | Accepted | None |
| M4 | Medium | Pending C2 | Will improve after C2 deployment |
| L1 | Low | Accepted | None |
| L2 | Low | FIXED | Deploy db/functions.sql to database |
| L3 | Low | Accepted | None |
| L4 | Low | FIXED | ALTER TABLE to drop columns |
| L5 | Low | FIXED | None (file committed) |
| L6 | Low | FIXED | Deploy db/functions.sql to database |
Deployment Steps
- Deploy
db/functions.sqlto PostgreSQL:psql -d inventory_db -f db/functions.sql(L2, L6) - Run import (includes stock snapshots first load) (C2, M2)
- Drop stale columns:
ALTER TABLE products DROP COLUMN IF EXISTS landing_cost_price; ALTER TABLE product_metrics DROP COLUMN IF EXISTS current_landing_cost_price;(L4) - Rebuild daily snapshots (C1, C2)
- Re-run metrics calculation (C3, M1 take effect automatically)