Restore accidentally removed files, a few additional import/calculation fixes

This commit is contained in:
2026-02-09 10:19:35 -05:00
parent 6aefc1b40d
commit 38b12c188f
209 changed files with 69925 additions and 412 deletions

docs/METRICS_AUDIT.md Normal file

@@ -0,0 +1,346 @@
# Metrics Calculation Pipeline Audit
**Date:** 2026-02-07
**Scope:** All 6 SQL calculation scripts, custom DB functions, import pipeline, and live data verification
## Overview
The metrics pipeline in `inventory-server/scripts/calculate-metrics-new.js` runs 6 SQL scripts sequentially:
1. `update_daily_snapshots.sql` — Aggregates daily per-product sales/receiving data
2. `update_product_metrics.sql` — Calculates the main product_metrics table (KPIs, forecasting, status)
3. `update_periodic_metrics.sql` — ABC classification, average lead time
4. `calculate_brand_metrics.sql` — Brand-level aggregated metrics
5. `calculate_vendor_metrics.sql` — Vendor-level aggregated metrics
6. `calculate_category_metrics.sql` — Category-level metrics with hierarchy rollups
### Database Scale
| Table | Row Count |
|---|---|
| products | 681,912 |
| orders | 2,883,982 |
| purchase_orders | 256,809 |
| receivings | 313,036 |
| daily_product_snapshots | 678,312 (601 distinct dates, since 2024-06-01) |
| product_metrics | 681,912 |
| brand_metrics | 1,789 |
| vendor_metrics | 281 |
| category_metrics | 610 |
---
## Issues Found
### ISSUE 1: [HIGH] Order status filter is non-functional — numeric codes vs text comparison
**Files:** `update_daily_snapshots.sql` lines 86-101, `update_product_metrics.sql` lines 89, 178-183
**Confirmed by data:** All order statuses are numeric strings ('100', '50', '55', etc.)
**Status mappings from:** `docs/prod_registry.class.php`
**Description:** The SQL filters `COALESCE(o.status, 'pending') NOT IN ('canceled', 'returned')` and `o.status NOT IN ('canceled', 'returned')` are used throughout the pipeline to exclude canceled/returned orders. However, the import pipeline stores order statuses as their **raw numeric codes** from the production MySQL database (e.g., '100', '50', '55', '90', '92'). There are **zero text status values** in the orders table.
This means these filters **never exclude any rows** — every comparison is `'100' NOT IN ('canceled', 'returned')` which is always true.
**Actual status distribution (with confirmed meanings):**
| Status | Meaning | Count | Negative Qty | Assessment |
|---|---|---|---|---|
| 100 | shipped | 2,862,792 | 3,352 | Completed — correct to include |
| 50 | awaiting_products | 11,109 | 0 | In-progress — not yet shipped |
| 55 | shipping_later | 5,689 | 0 | In-progress — not yet shipped |
| 56 | shipping_together | 2,863 | 0 | In-progress — not yet shipped |
| 90 | awaiting_shipment | 38 | 0 | Near-complete — not yet shipped |
| 92 | awaiting_pickup | 71 | 0 | Near-complete — awaiting customer |
| 95 | shipped_confirmed | 5 | 0 | Completed — correct to include |
| 15 | cancelled | 1 | 0 | Should be excluded |
**Full status reference (from prod_registry.class.php):**
- 0=created, 10=unfinished, **15=cancelled**, 16=combined, 20=placed, 22=placed_incomplete
- 30=cancelled_old (historical), 40=awaiting_payment, 50=awaiting_products
- 55=shipping_later, 56=shipping_together, 60=ready, 61=flagged
- 62=fix_before_pick, 65=manual_picking, 70=in_pt, 80=picked
- 90=awaiting_shipment, 91=remote_wait, **92=awaiting_pickup**, 93=fix_before_ship
- **95=shipped_confirmed**, **100=shipped**
**Severity revised to HIGH (from CRITICAL):** Now that we know the actual meanings, no cancelled/refunded orders are being miscounted (only 1 cancelled order exists, status=15). The real concern is twofold:
1. **The text-based filter is dead code** — it can never match any row. Either map statuses to text during import (like POs do) or change SQL to use numeric comparisons.
2. **~19,775 unfulfilled orders** (statuses 50/55/56/90/92) are counted as completed sales. These are orders in various stages of fulfillment that haven't shipped yet. While most will eventually ship, counting them now inflates current-period metrics. At 0.69% of total orders, the financial impact is modest but the filter should work correctly on principle.
**Note:** PO statuses ARE properly mapped to text ('canceled', 'done', etc.) in the import pipeline. Only order statuses are numeric.
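For option (b), a minimal sketch of a numeric filter (table name from this audit; `o.quantity` is an assumed column name, and the exact status cutoff is a policy decision):

```sql
-- Illustrative only; orders.status holds the raw numeric code as text.
-- Shipped-only variant: statuses 95 (shipped_confirmed) and 100 (shipped).
SELECT o.pid, SUM(o.quantity) AS units_sold
FROM public.orders o
WHERE o.status IN ('95', '100')
GROUP BY o.pid;

-- Broader variant: keep in-progress orders but drop cancelled (15),
-- combined (16), and historical cancelled (30).
-- WHERE o.status::int >= 20 AND o.status::int <> 30
```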
---
### ISSUE 2: [CRITICAL] Daily Snapshots use current stock instead of historical EOD stock
**File:** `update_daily_snapshots.sql`, lines 126-135, 173
**Confirmed by data:** Top product (pid 666925) shows `eod_stock_quantity = 0` for ALL dates even though it sold 28 units on Jan 28 (clearly had stock then)
**Description:** The `CurrentStock` CTE reads `stock_quantity` directly from the `products` table at query execution time. When the script processes historical dates (today minus 1-4 days), it writes **today's stock** as if it were the end-of-day stock for those past dates.
**Cascading impact on product_metrics:**
- `avg_stock_units_30d` / `avg_stock_cost_30d` — Wrong averages
- `stockout_days_30d` — Undercounts (only based on current stock state, not historical)
- `stockout_rate_30d`, `service_level_30d`, `fill_rate_30d` — All derived from wrong stockout data
- `gmroi_30d` — Wrong denominator (avg stock cost)
- `stockturn_30d` — Wrong denominator (avg stock units)
- `sell_through_30d` — Affected by stock level inaccuracy
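A sketch of one possible repair, assuming a historical `stock_snapshots` table keyed by `(pid, snapshot_date)` is available (the second audit in this commit describes importing one from MySQL's `snap_product_value`); the column names for that table are assumptions:

```sql
-- Sketch: prefer the recorded end-of-day stock for the processed date,
-- falling back to current stock only when no history exists.
SELECT p.pid,
       d.snapshot_date,
       COALESCE(ss.stock_quantity, p.stock_quantity) AS eod_stock_quantity
FROM public.products p
CROSS JOIN (
    SELECT CURRENT_DATE - g AS snapshot_date
    FROM generate_series(1, 5) AS g          -- the 5-day processing window
) d
LEFT JOIN public.stock_snapshots ss
       ON ss.pid = p.pid AND ss.snapshot_date = d.snapshot_date;
```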
---
### ISSUE 3: [CRITICAL] Snapshot coverage is 0.17% — most products have no snapshot data
**Confirmed by data:** 678,312 snapshot rows across 601 dates = ~1,128 products/day out of 681,912 total
**Description:** The daily snapshots script only creates rows for products with sales or receiving activity on that date (`ProductsWithActivity` CTE, line 136). This means:
- 91.1% of products (621,221) have NULL `sales_30d` — they had no orders in the last 30 days so no snapshot rows exist
- `AVG(eod_stock_quantity)` averages only across days with activity, not 30 days
- `stockout_days_30d` only counts stockout days where there was ALSO some activity
- A product out of stock with zero sales gets zero stockout_days even though it was stocked out
This is by design (to avoid creating 681K rows/day) but means stock-related metrics are systematically biased.
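One way to remove the activity-day bias is to evaluate the full 30-day window per product with a generated date series. A sketch, assuming per-day stock in the snapshots is trustworthy (it is not until Issue 2 is fixed) and treating missing days as stocked out, which may or may not match the desired policy:

```sql
SELECT p.pid,
       -- a missing snapshot day counts as a zero-stock day here
       COUNT(*) FILTER (WHERE COALESCE(s.eod_stock_quantity, 0) = 0)
           AS stockout_days_30d,
       AVG(COALESCE(s.eod_stock_quantity, 0)) AS avg_stock_units_30d
FROM public.products p
CROSS JOIN generate_series(CURRENT_DATE - 29, CURRENT_DATE,
                           '1 day'::interval) AS d(day)
LEFT JOIN public.daily_product_snapshots s
       ON s.pid = p.pid AND s.snapshot_date = d.day::date
GROUP BY p.pid;
```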
---
### ISSUE 4: [HIGH] `costeach` fallback to 50% of price in import pipeline
**File:** `inventory-server/scripts/import/orders.js` (line ~573)
**Description:** When the MySQL `order_costs` table has no record for an order item, `costeach` defaults to `price * 0.5`. There is **no flag** in the PostgreSQL data to distinguish actual costs from estimated ones.
**Data impact:** 385,545 products (56.5%) have `current_cost_price = 0` AND `current_landing_cost_price = 0`. For these products, the COGS calculation in daily_snapshots falls through the chain:
1. `o.costeach` — May be the 50% estimate from import
2. `get_weighted_avg_cost()` — Returns NULL if no receivings exist
3. `p.landing_cost_price` — Always NULL (hardcoded in import)
4. `p.cost_price` — 0 for 56.5% of products
Only 27 products have zero COGS with positive sales, meaning the `costeach` field is doing its job for products that sell, but the 50% fallback means margins for those products are estimates, not actuals.
---
### ISSUE 5: [HIGH] `landing_cost_price` is always NULL
**File:** `inventory-server/scripts/import/products.js` (line ~175)
**Description:** The import explicitly sets `landing_cost_price = NULL` for all products. The daily_snapshots COGS calculation uses it as a fallback: `COALESCE(o.costeach, get_weighted_avg_cost(...), p.landing_cost_price, p.cost_price)`. Since it's always NULL, this fallback step is useless and the chain jumps straight to `cost_price`.
The `product_metrics` field `current_landing_cost_price` is populated as `COALESCE(p.landing_cost_price, p.cost_price, 0.00)`, so it equals `cost_price` for all products. Any UI showing "landing cost" is actually just showing `cost_price`.
---
### ISSUE 6: [HIGH] Vendor lead time is drastically wrong — missing supplier_id join
**File:** `calculate_vendor_metrics.sql`, lines 62-82
**Confirmed by data:** Vendor-level lead times are 2-10x higher than product-level lead times
**Description:** The vendor metrics lead time joins POs to receivings only by `pid`:
```sql
LEFT JOIN public.receivings r ON r.pid = po.pid
```
But the periodic metrics lead time correctly matches supplier:
```sql
JOIN public.receivings r ON r.pid = po.pid AND r.supplier_id = po.supplier_id
```
Without supplier matching, a PO for product X from Vendor A can match a receiving of product X from Vendor B, creating inflated/wrong lead times.
**Measured discrepancies:**
| Vendor | Vendor Metrics Lead Time | Avg Product Lead Time |
|---|---|---|
| doodlebug design inc. | 66 days | 14 days |
| Notions | 55 days | 4 days |
| Simple Stories | 59 days | 27 days |
| Ranger Industries | 31 days | 5 days |
---
### ISSUE 7: [MEDIUM] Net revenue does not subtract returns
**File:** `update_daily_snapshots.sql`, line 184
**Description:** `net_revenue = gross_revenue - discounts`. Standard accounting: `net_revenue = gross_revenue - discounts - returns`. The `returns_revenue` is calculated separately but not deducted.
**Data impact:** There are 3,352 orders with negative quantities (returns), totaling -5,499 units. These returns are tracked in `returns_revenue` but not reflected in `net_revenue`, which means all downstream revenue-based metrics are slightly overstated.
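The fix is a one-term change to the snapshot INSERT's select list; a sketch using this audit's column names (the `sd` alias for the sales-data CTE is an assumption):

```sql
-- Corrected net revenue: subtract returns alongside discounts.
COALESCE(sd.gross_revenue, 0.00)
  - COALESCE(sd.discounts, 0.00)
  - COALESCE(sd.returns_revenue, 0.00) AS net_revenue
```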
---
### ISSUE 8: [MEDIUM] Lifetime revenue subquery references wrong table columns
**File:** `update_product_metrics.sql`, lines 323-329
**Description:** The lifetime revenue estimation fallback queries:
```sql
SELECT revenue_7d / NULLIF(sales_7d, 0)
FROM daily_product_snapshots
WHERE pid = ci.pid AND sales_7d > 0
```
But `daily_product_snapshots` does NOT have `revenue_7d` or `sales_7d` columns — those exist in `product_metrics`. This subquery either errors silently or returns NULL. The effect is that the estimation always falls back to `current_price * total_sold`.
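A sketch of the corrected fallback, assuming the snapshot table's per-day columns are `net_revenue` and `units_sold` (the names this audit uses elsewhere); aggregating across days gives a steadier average unit price than a single-row ratio:

```sql
SELECT SUM(dps.net_revenue) / NULLIF(SUM(dps.units_sold), 0)
FROM public.daily_product_snapshots dps
WHERE dps.pid = ci.pid AND dps.units_sold > 0;
```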
---
### ISSUE 9: [MEDIUM] Brand/Vendor metrics COGS filter inflates margins
**Files:** `calculate_brand_metrics.sql` line 31, `calculate_vendor_metrics.sql` line 32
**Description:** `SUM(CASE WHEN pm.cogs_30d > 0 THEN pm.cogs_30d ELSE 0 END)` excludes products with zero COGS. But if a product has sales revenue and zero COGS (missing cost data), the brand/vendor totals will include the revenue but not the COGS, artificially inflating the margin.
**Data context:** Brand metrics revenue matches product_metrics aggregation exactly for sales counts, but shows small discrepancies in revenue (e.g., Stamperia: $7,613.98 brand vs $7,611.11 actual). These tiny diffs come from the `> 0` filtering excluding products with negative revenue.
---
### ISSUE 10: [MEDIUM] Extreme margin values from $0.01 price orders
**Confirmed by data:** 73 products with margin > 100%, 119 with margin < -100%
**Examples:**
| Product | Revenue | COGS | Margin |
|---|---|---|---|
| Flower Gift Box Die (pid 624756) | $0.02 | $29.98 | -149,800% |
| Special Flowers Stamp Set (pid 614513) | $0.01 | $11.97 | -119,632% |
These are products with extremely low prices (likely samples, promos, or data errors) where the order price was $0.01. The margin calculation is mathematically correct but these outliers skew any aggregate margin statistics.
---
### ISSUE 11: [MEDIUM] Sell-through rate has edge cases yielding negative/extreme values
**File:** `update_product_metrics.sql`, lines 358-361
**Confirmed by data:** 30 products with negative sell-through, 10 with sell-through > 200%
**Description:** Beginning inventory is approximated as `current_stock + sales - received + returns`. When inventory adjustments, shrinkage, or manual corrections occur, this approximation breaks. Edge cases:
- Products with many manual stock adjustments → negative denominator → negative sell-through
- Products with beginning stock near zero but decent sales → sell-through > 100%
---
### ISSUE 12: [MEDIUM] `total_sold` uses different status filter than orders import
**Import pipeline confirmed:**
- Orders import: `order_status >= 15` (includes processing/pending orders)
- `total_sold` in products: `order_status >= 20` (more restrictive)
This means `lifetime_sales` (from `total_sold`) is systematically lower than what you'd calculate by summing the orders table. The discrepancy is confirmed:
| Product | total_sold | orders sum | Gap |
|---|---|---|---|
| pid 31286 | 13,786 | 4,241 | 9,545 |
| pid 44309 | 11,978 | 3,119 | 8,859 |
The large gaps are because the orders table only has data from the import start date (~2024), while `total_sold` includes all-time sales from MySQL. This is expected behavior, not a bug, but it means the `lifetime_revenue_quality` flag is important — most products show 'estimated' quality.
---
### ISSUE 13: [MEDIUM] Category rollup may double-count products in multiple hierarchy levels
**File:** `calculate_category_metrics.sql`, lines 42-66
**Description:** The `RolledUpMetrics` CTE uses:
```sql
dcm.cat_id = ch.cat_id OR dcm.cat_id = ANY(SELECT cat_id FROM category_hierarchy WHERE ch.cat_id = ANY(ancestor_ids))
```
If products are assigned to categories at multiple levels in the same branch (e.g., both "Paper Crafts" and "Scrapbook Paper" which is a child of "Paper Crafts"), those products' metrics would be counted twice in the parent's rollup.
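If multi-level assignment is legitimate, the rollup can be made double-count-proof by deduplicating products per branch before aggregating. A sketch under the assumption of a `product_categories` mapping table (hypothetical name):

```sql
WITH branch_products AS (
    -- each (ancestor category, product) pair exactly once
    SELECT DISTINCT ch.cat_id, pc.pid
    FROM category_hierarchy ch
    JOIN public.product_categories pc
      ON pc.cat_id = ch.cat_id
      OR pc.cat_id IN (SELECT c.cat_id
                       FROM category_hierarchy c
                       WHERE ch.cat_id = ANY(c.ancestor_ids))
)
SELECT bp.cat_id, SUM(pm.revenue_30d) AS rolled_up_revenue_30d
FROM branch_products bp
JOIN public.product_metrics pm ON pm.pid = bp.pid
GROUP BY bp.cat_id;
```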
---
### ISSUE 14: [LOW] `exclude_forecast` removes products from metrics entirely
**File:** `update_product_metrics.sql`, line 509
**Description:** `WHERE s.exclude_forecast IS FALSE OR s.exclude_forecast IS NULL` is on the main INSERT's WHERE clause. Products with `exclude_forecast = TRUE` won't appear in `product_metrics` at all, rather than just having forecast fields nulled. Currently all 681,912 products are in product_metrics so this appears to not affect any products yet.
---
### ISSUE 15: [LOW] Daily snapshots only look back 5 days
**File:** `update_daily_snapshots.sql`, line 14 — `_process_days INT := 5`
If import data arrives late (>5 days), those days will never get snapshots populated. There is a separate `backfill/rebuild_daily_snapshots.sql` for historical rebuilds.
---
### ISSUE 16: [INFO] Timezone risk in order date import
**File:** `inventory-server/scripts/import/orders.js`
MySQL `DATETIME` values are timezone-naive. The import uses `new Date(order.date)` which interprets them using the import server's local timezone. The SSH config specifies `timezone: '-05:00'` for MySQL (always EST). If the import server is in a different timezone, orders near midnight could land on the wrong date in the daily snapshots calculation.
---
## Custom Functions Review
### `calculate_sales_velocity(sales_30d, stockout_days_30d)`
- Divides `sales_30d` by effective selling days: `GREATEST(30 - stockout_days, CASE WHEN sales > 0 THEN 14 ELSE 30 END)`
- The 14-day floor prevents extreme velocity for products mostly out of stock
- **Sound approach** — the only concern is that stockout_days is unreliable (Issues 2, 3)
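For reference, the described logic reconstructed as SQL (a sketch built from this audit's description, not the deployed definition):

```sql
CREATE OR REPLACE FUNCTION calculate_sales_velocity(
    sales_30d NUMERIC,
    stockout_days_30d INTEGER
) RETURNS NUMERIC AS $$
    -- effective selling days: 30 minus stockout days, floored at 14
    -- for products that sold anything (prevents extreme velocities).
    SELECT sales_30d / GREATEST(
        30 - COALESCE(stockout_days_30d, 0),
        CASE WHEN sales_30d > 0 THEN 14 ELSE 30 END
    );
$$ LANGUAGE sql IMMUTABLE;
```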
### `get_weighted_avg_cost(pid, date)`
- Quantity-weighted average cost of the last 10 receivings: Σ(cost × qty) / Σ(qty)
- Returns NULL if no receivings — sound fallback behavior
- **Correct implementation**
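A reconstruction of the described behavior (signature, and the `receivings` column names `cost`, `quantity`, `received_date`, are assumptions):

```sql
CREATE OR REPLACE FUNCTION get_weighted_avg_cost(p_pid BIGINT, p_date DATE)
RETURNS NUMERIC AS $$
    -- quantity-weighted average over the 10 most recent receivings;
    -- returns NULL when the product has none.
    SELECT SUM(cost * quantity) / NULLIF(SUM(quantity), 0)
    FROM (
        SELECT r.cost, r.quantity
        FROM public.receivings r
        WHERE r.pid = p_pid AND r.received_date <= p_date
        ORDER BY r.received_date DESC
        LIMIT 10
    ) recent;
$$ LANGUAGE sql STABLE;
```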
### `safe_divide(numerator, denominator)`
- Returns NULL on divide-by-zero — **correct**
### `std_numeric(value, precision)`
- Rounds to precision digits — **correct**
### `classify_demand_pattern(avg_demand, cv)`
- Uses coefficient of variation thresholds: ≤0.2 = stable, ≤0.5 = variable, low-volume+high-CV = sporadic, else lumpy
- **Reasonable classification**, though only based on 30-day window
### `detect_seasonal_pattern(pid)`
- CROSS JOIN LATERAL (runs per product) — **expensive**: queries `daily_product_snapshots` twice per product
- Compares current month average to yearly average — very simplistic
- **Functional but could be a performance bottleneck** with 681K products
### `category_hierarchy` (materialized view)
- Recursive CTE building tree from categories — **correct implementation**
- Refreshed concurrently before category metrics calculation — **good practice**
---
## Data Health Summary
| Metric | Count | % of Total |
|---|---|---|
| Products with zero cost_price | 385,545 | 56.5% |
| Products with NULL sales_30d | 621,221 | 91.1% |
| Products with no lifetime_sales | 321,321 | 47.1% |
| Products with zero COGS but positive sales | 27 | <0.01% |
| Products with margin > 100% | 73 | <0.01% |
| Products with margin < -100% | 119 | <0.01% |
| Products with negative sell-through | 30 | <0.01% |
| Products with NULL status | 0 | 0% |
| Duplicate daily snapshots (same pid+date) | 0 | 0% |
| Net revenue formula mismatches | 0 | 0% |
### ABC Classification Distribution (replenishable products only)
| Class | Products | Revenue % |
|---|---|---|
| A | 7,727 | 80.72% |
| B | 12,048 | 15.10% |
| C | 113,647 | 4.18% |
ABC distribution looks healthy — A ≈ 80%, A+B ≈ 96%.
### Brand Metrics Consistency
Product counts and sales_30d match exactly between `brand_metrics` and direct aggregation from `product_metrics`. Revenue shows sub-dollar discrepancies due to the `> 0` filter excluding products with negative revenue. **Consistent within expected tolerance.**
---
## Priority Recommendations
### Must Fix (Correctness Issues)
1. **Issue 1: Fix order status handling** — The text-based filter (`NOT IN ('canceled', 'returned')`) is dead code against numeric statuses. Two options: (a) map numeric statuses to text during import (like POs already do), or (b) change SQL to filter on numeric codes (e.g., `o.status::int >= 20` to exclude cancelled/unfinished, or `o.status IN ('100', '95')` for shipped-only). The ~19.7K unfulfilled orders (0.69%) are a minor financial impact but the filter should be functional.
2. **Issue 6: Add supplier_id join to vendor lead time** — One-line fix in `calculate_vendor_metrics.sql`
3. **Issue 8: Fix lifetime revenue subquery** — Use correct column names from `daily_product_snapshots` (e.g., `net_revenue / NULLIF(units_sold, 0)`)
### Should Fix (Data Quality)
4. **Issue 2/3: Snapshot coverage** — Consider creating snapshot rows for all in-stock products, not just those with activity. Or at minimum, calculate stockout metrics by comparing snapshot existence to product existence.
5. **Issue 5: Populate landing_cost_price** — If available in the source system, import it. Otherwise remove references to avoid confusion.
6. **Issue 7: Subtract returns from net_revenue** — `net_revenue = gross_revenue - discounts - returns_revenue`
7. **Issue 9: Remove > 0 filter on COGS** — Use `SUM(pm.cogs_30d)` instead of conditional sums
### Nice to Fix (Edge Cases)
8. **Issue 4: Flag estimated costs** — Add a `costeach_estimated BOOLEAN` to orders during import
9. **Issue 10: Cap or flag extreme margins** — Exclude $0.01-price orders from margin calculations
10. **Issue 11: Clamp sell-through** — `GREATEST(0, LEAST(sell_through_30d, 200))` or flag outliers
11. **Issue 13: Verify category assignment policy** — Check if products are assigned to leaf categories only
12. **Issue 13: Category rollup query** — Verify no double-counting with actual data

docs/METRICS_AUDIT2.md Normal file

@@ -0,0 +1,276 @@
# Metrics Pipeline Audit Report
**Date:** 2026-02-08
**Scope:** All 6 SQL scripts in `inventory-server/scripts/metrics-new/`, import pipeline, custom functions, and post-calculation data verification.
---
## Executive Summary
The metrics pipeline is architecturally sound and the core calculations are mostly correct. The 30-day sales, revenue, replenishment, and aggregate metrics (brand/vendor/category) all cross-check accurately between the snapshots, product_metrics, and direct orders queries. However, several issues were found ranging from **critical data bugs** to **design limitations** that affect accuracy of specific metrics.
**Issues found: 13** (3 Critical, 4 Medium, 6 Low/Informational)
---
## CRITICAL Issues
### C1. `net_revenue` in daily snapshots never subtracts returns ($35.6K affected)
**Location:** `update_daily_snapshots.sql`, line 181
**Symptom:** `net_revenue` is stored as `gross_revenue - discounts` but should be `gross_revenue - discounts - returns_revenue`.
The SQL formula on line 181 appears correct:
```sql
COALESCE(sd.gross_revenue_unadjusted, 0.00) - COALESCE(sd.discounts, 0.00) - COALESCE(sd.returns_revenue, 0.00) AS net_revenue
```
However, actual data shows `net_revenue = gross_revenue - discounts` for ALL 3,252 snapshots that have returns. Total returns not subtracted: **$35,630.03** across 2,946 products. This may be caused by the `returns_revenue` in the SalesData CTE not properly flowing through to the INSERT, or by a prior version of the code that stored these values differently. The profit column (line 184) has the same issue: `(gross - discounts) - cogs` instead of `(gross - discounts - returns) - cogs`.
**Impact:** Net revenue and profit are overstated by the amount of returns. This cascades to all metrics derived from snapshots: `revenue_30d`, `profit_30d`, `margin_30d`, `avg_ros_30d`, and all brand/vendor/category aggregate revenue.
**Recommended fix:** Debug why the returns subtraction isn't taking effect. The formula in the SQL looks correct, so this may be a data-type issue or an execution path issue. After fixing, rebuild snapshots.
**Status:** Owner will resolve. Code formula is correct; snapshots need rebuilding after prior fix deployment.
---
### C2. `eod_stock_quantity` uses CURRENT stock, not historical end-of-day stock
**Location:** `update_daily_snapshots.sql`, lines 123-132 (CurrentStock CTE)
**Symptom:** Every snapshot for a given product shows the same stock quantity regardless of the snapshot date.
The `CurrentStock` CTE simply reads `stock_quantity` from the `products` table:
```sql
SELECT pid, stock_quantity, ... FROM public.products
```
This means a snapshot from January 10 shows the SAME stock as today (February 8). Verified in data:
- Product 662561: stock = 36 on every date (Feb 1-7)
- Product 665397: stock = 25 on every date (Feb 1-7)
- All products checked show identical stock across all snapshot dates
**Impact:** All stock-derived metrics are inaccurate for historical analysis:
- `eod_stock_cost`, `eod_stock_retail`, `eod_stock_gross` (all wrong for past dates)
- `stockout_flag` (based on current stock, not historical)
- `stockout_days_30d` (undercounted since stockout_flag uses current stock)
- `avg_stock_units_30d`, `avg_stock_cost_30d` (no variance, just current stock repeated)
- `gmroi_30d`, `stockturn_30d` (based on avg_stock which is flat)
- `sell_through_30d` (denominator uses current stock assumption)
- `service_level_30d`, `fill_rate_30d`
**This is a known architectural limitation** noted in MEMORY.md. Fixing requires either:
1. Storing stock snapshots separately at end-of-day (ideally via a cron job that records stock before any changes)
2. Reconstructing historical stock from orders and receivings (complex but possible)
**Status: FIXED.** MySQL's `snap_product_value` table (daily EOD stock per product since 2012) is now imported into PostgreSQL `stock_snapshots` table via `scripts/import/stock-snapshots.js`. The `CurrentStock` CTE in `update_daily_snapshots.sql` now uses `LEFT JOIN stock_snapshots` for historical stock, falling back to `products.stock_quantity` when no historical data exists. Requires: run import, then rebuild daily snapshots.
---
### C3. `ON CONFLICT DO UPDATE WHERE` check skips 91%+ of product_metrics updates
**Location:** `update_product_metrics.sql`, lines 558-574
**Symptom:** 623,205 of 681,912 products (91.4%) have `last_calculated` older than 1 day. 592,369 are over 30 days old. 914 products with active 30-day sales haven't been updated in over 7 days.
The upsert's `WHERE` clause only updates if specific fields changed:
```sql
WHERE product_metrics.current_stock IS DISTINCT FROM EXCLUDED.current_stock OR
product_metrics.current_price IS DISTINCT FROM EXCLUDED.current_price OR ...
```
Fields NOT checked include: `stockout_days_30d`, `margin_30d`, `gmroi_30d`, `demand_pattern`, `seasonality_index`, `sales_growth_*`, `service_level_30d`, and many others. If a product's stock, price, sales, and revenue haven't changed, the entire row is skipped even though growth metrics, variability, and other derived fields may need updating.
**Impact:** Most derived metrics (growth, demand patterns, seasonality) are stale for the majority of products. Products with steady sales but unchanged stock/price never get their growth metrics recalculated.
**Recommended fix:** Either:
1. Remove the `WHERE` clause entirely (accept the performance cost of writing all rows every run)
2. Add `last_calculated` age check: `OR product_metrics.last_calculated < NOW() - INTERVAL '7 days'`
3. Add the missing fields to the change-detection check
**Status: FIXED.** Added 12 derived fields to the `IS DISTINCT FROM` check (`profit_30d`, `cogs_30d`, `margin_30d`, `stockout_days_30d`, `sell_through_30d`, `sales_growth_30d_vs_prev`, `revenue_growth_30d_vs_prev`, `demand_pattern`, `seasonal_pattern`, `seasonality_index`, `service_level_30d`, `fill_rate_30d`) plus a time-based safety net: `OR product_metrics.last_calculated < NOW() - INTERVAL '1 day'`. This guarantees every row is refreshed at least daily.
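The shape of the deployed check, sketched as a fragment with an abbreviated field list (the real clause compares all 12 listed fields; the assignment list is elided):

```sql
ON CONFLICT (pid) DO UPDATE SET last_calculated = NOW() /* , ... */
WHERE product_metrics.current_stock IS DISTINCT FROM EXCLUDED.current_stock
   OR product_metrics.margin_30d IS DISTINCT FROM EXCLUDED.margin_30d
   -- ...the remaining IS DISTINCT FROM checks...
   OR product_metrics.last_calculated < NOW() - INTERVAL '1 day';
```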
---
## MEDIUM Issues
### M1. Demand variability calculated only over activity days, not full 30-day window
**Location:** `update_product_metrics.sql`, DemandVariability CTE (lines 206-223)
**Symptom:** Variance, std_dev, and CV are computed over only the days that appear in snapshots (activity days), not the full 30-day period including zero-sales days.
Example: Product 41141 (Mexican Poppy) sold 102 units in 30 days across only 3 snapshot days (1, 1, 100). The variance/CV is calculated over just those 3 data points instead of 30 (with 27 zero-sales days).
**Impact:**
- CV is computed on sparse data (3-10 points instead of 30), making it statistically unreliable
- Products with sporadic large orders appear less variable than they really are
- `demand_pattern` classification is affected (stable/variable/sporadic/lumpy)
**Recommended fix:** Join against a generated 30-day date series and COALESCE missing days to 0 units sold before computing variance/stddev/CV.
**Status: FIXED.** Rewrote `DemandVariability` CTE to use `generate_series()` for the full 30-day date range, `CROSS JOIN` with distinct PIDs from snapshots, and `LEFT JOIN` actual snapshot data with `COALESCE(dps.units_sold, 0)` for missing days. Variance/stddev/CV now computed over all 30 data points.
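Sketched as a fragment of the script's WITH chain (names assumed from this audit); missing days are zero-filled before the variability statistics are computed:

```sql
DemandVariability AS (
    SELECT pids.pid,
           STDDEV_SAMP(COALESCE(dps.units_sold, 0))
               / NULLIF(AVG(COALESCE(dps.units_sold, 0)), 0) AS cv_30d
    FROM (SELECT DISTINCT pid
          FROM public.daily_product_snapshots
          WHERE snapshot_date >= CURRENT_DATE - 30) pids
    CROSS JOIN generate_series(CURRENT_DATE - 29, CURRENT_DATE,
                               '1 day'::interval) AS d(day)
    LEFT JOIN public.daily_product_snapshots dps
           ON dps.pid = pids.pid AND dps.snapshot_date = d.day::date
    GROUP BY pids.pid
)
```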
---
### M2. `costeach` fallback to `price * 0.5` affects 32.5% of recent orders
**Location:** `orders.js`, line 600 and 634
**Symptom:** When no cost record exists in `order_costs`, the import falls back to `price * 0.5`.
Data shows 9,839 of 30,266 recent orders (32.5%) use this fallback. Among these, 79 products have `costeach = 0` because the item price is 0 (`0 * 0.5 = 0`), even though the product has a real `cost_price`.
The daily snapshot has a second line of defense (using `get_weighted_avg_cost()` and then `p.cost_price`), but the orders table's `costeach` column itself contains inaccurate data for ~1/3 of orders.
**Impact:** COGS calculations at the order level are approximate for 1/3 of orders. The snapshot's fallback chain mitigates this somewhat, but any analytics using `orders.costeach` directly will be affected.
**Status: FIXED.** Added `products.cost_price` as intermediate fallback: `COALESCE(oc.costeach, p.cost_price, oi.price * 0.5)`. The products table join was added to both the `order_totals` CTE and the outer SELECT in `orders.js`. Requires a full orders re-import to apply retroactively.
---
### M3. `lifetime_sales` uses MySQL `total_sold` (status >= 20) but orders import uses status >= 15
**Location:** `products.js` line 200 vs `orders.js` line 69
**Symptom:** `total_sold` in the products table comes from MySQL with `order_status >= 20`, excluding status 15 (canceled) and 16 (combined). But the orders import fetches orders with `order_status >= 15`.
Verified in MySQL: For product 31286, `total_sold` (>=20) = 13,786 vs (>=15) = 13,905 (difference of 119 units).
**Impact:** `lifetime_sales` in product_metrics (sourced from `products.total_sold`) slightly understates compared to what the orders table contains. The `lifetime_revenue_quality` field correctly flags most as "estimated" since the orders table only covers ~5 years while `total_sold` is all-time. This is a minor inconsistency (< 1% difference).
**Status:** Accepted. < 1% difference, not worth the complexity of aligning thresholds.
---
### M4. `sell_through_30d` has 868 NULL values and 547 anomalous values for products with sales
**Location:** `update_product_metrics.sql`, lines 356-361
**Formula:** `(sales_30d / (current_stock + sales_30d + returns_units_30d - received_qty_30d)) * 100`
- 868 products with sales but NULL sell_through (denominator = 0, which happens when `current_stock + sales - received = 0`, i.e. all stock came from receiving and was sold)
- 259 products with sell_through > 100%
- 288 products with negative sell_through
**Impact:** Sell-through rate is unreliable for products with significant receiving activity in the same period. The formula tries to approximate "beginning inventory" but the approximation breaks when current stock ≠ actual beginning stock (which is always, per issue C2).
**Status:** Will improve once C2 fix (historical stock) is deployed and snapshots are rebuilt, since `current_stock` in the formula will then reflect actual beginning inventory.
---
## LOW / INFORMATIONAL Issues
### L1. Snapshots only cover ~1,167 products/day out of 681K
Only products with order or receiving activity on a given day get snapshots. This is by design (the `ProductsWithActivity` CTE on line 133 of `update_daily_snapshots.sql`), but it means:
- 560K+ products have zero snapshot history
- Stockout tracking is impossible for products with no sales (they can't appear in snapshots)
- The "avg_stock" metrics (avg_stock_units_30d, etc.) only average over activity days, not all 30 days
This is acceptable for storage efficiency but should be understood when interpreting metrics.
**Status:** Accepted (by design).
---
### L2. `detect_seasonal_pattern` function only compares current month to yearly average
The seasonality detection is simplistic: it compares current month's avg daily sales to yearly avg. This means:
- It can only detect if the CURRENT month is above average, not identify historical seasonal patterns
- Running in January vs July will give completely different results for the same product
- The "peak_season" field always shows the current month/quarter when seasonal (not the actual peak)
This is noted as a P5 (low priority) feature and is adequate for a first pass but should not be relied upon for demand planning.
**Status: FIXED.** Rewrote `detect_seasonal_pattern` function to compare monthly average sales across the full last 12 months. Uses CV across months + peak-to-average ratio for classification: `strong` (CV > 0.5, peak > 150%), `moderate` (CV > 0.3, peak > 120%), `none`. Peak season now identifies the actual highest-sales month. Requires at least 3 months of data. Saved in `db/functions.sql`.
---
### L3. Free product with negative revenue in top sellers
Product 476848 ("Thank You, From ACOT!") shows 254 sales with -$1.00 revenue because one order applied a $1 discount to a $0 product. This is a data oddity, not a calculation bug. Could be addressed by excluding $0-price products from revenue metrics or by data cleanup.
**Status:** Accepted (data oddity, not a bug).
---
### L4. `landing_cost_price` is always NULL
`current_landing_cost_price` in product_metrics is mapped from `current_effective_cost` which is just `cost_price`. The `landing_cost_price` concept (cost + shipping + duties) is not implemented. The field exists but has no meaningful data.
**Status: FIXED.** Removed `landing_cost_price` from `db/schema.sql`, `current_landing_cost_price` from `db/metrics-schema-new.sql`, `update_product_metrics.sql`, and `backfill/populate_initial_product_metrics.sql`. Column should be dropped from the live database via `ALTER TABLE`.
---
### L5. Custom SQL functions not tracked in version control
All 6 custom functions (`calculate_sales_velocity`, `get_weighted_avg_cost`, `safe_divide`, `std_numeric`, `classify_demand_pattern`, `detect_seasonal_pattern`) and the `category_hierarchy` materialized view exist only in the database. They are not defined in any migration or schema file in the repository.
If the database needs to be recreated, these would be lost.
**Status: FIXED.** All 6 functions and the `category_hierarchy` materialized view definition saved to `inventory-server/db/functions.sql`. File is re-runnable via `psql -f functions.sql`.
---
### L6. `get_weighted_avg_cost` limited to last 10 receivings
The function uses `LIMIT 10` for performance, but this means products with many small receivings may not accurately reflect the true weighted average cost if the cost has changed significantly beyond the last 10 receiving records.
**Status: FIXED.** Removed `LIMIT 10` from `get_weighted_avg_cost`. Data shows max receivings per product is 142 (p95 = 11, avg = 3), so performance impact is negligible. Updated definition in `db/functions.sql`.
---
## Verification Summary
### What's Working Correctly
| Check | Result |
|-------|--------|
| 30d sales: product_metrics vs orders vs snapshots | **MATCH** (verified top 10 sellers) |
| Replenishment formula: manual calc vs stored | **MATCH** (verified 10 products) |
| Brand metrics vs sum of product_metrics | **MATCH** (0 difference across all brands) |
| Order status mapping (numeric → text) | **CORRECT** (all statuses mapped, no numeric remain) |
| Cost price: PostgreSQL vs MySQL source | **MATCH** (within rounding, verified 5 products) |
| total_sold: PostgreSQL vs MySQL source | **MATCH** (verified 5 products) |
| Category rollups (rolled-up > direct for parents) | **CORRECT** |
| ABC classification distribution | **REASONABLE** (A: 8K, B: 12.5K, C: 113K) |
| Lead time calculation (PO → receiving) | **CORRECT** (verified examples) |
### Data Overview
| Metric | Value |
|--------|-------|
| Total products | 681,912 |
| Products in product_metrics | 681,912 (100%) |
| Products with 30d sales | 10,291 (1.5%) |
| Products with negative profit & revenue | 139 (mostly cost > price) |
| Products with negative stock | 0 |
| Snapshot date range | 2020-06-18 to 2026-02-08 |
| Avg products per snapshot day | 1,167 |
| Order date range | 2020-06-18 to 2026-02-08 |
| Total orders | 2,885,825 |
| 'returned' status orders | 0 (returns via negative quantity only) |
---
## Fix Status Summary
| Issue | Severity | Status | Deployment Action Needed |
|-------|----------|--------|--------------------------|
| C1 | Critical | Owner resolving | Rebuild daily snapshots |
| C2 | Critical | **FIXED** | Run import, rebuild daily snapshots |
| C3 | Critical | **FIXED** | Deploy updated `update_product_metrics.sql` |
| M1 | Medium | **FIXED** | Deploy updated `update_product_metrics.sql` |
| M2 | Medium | **FIXED** | Full orders re-import (`--full`) |
| M3 | Medium | Accepted | None |
| M4 | Medium | Pending C2 | Will improve after C2 deployment |
| L1 | Low | Accepted | None |
| L2 | Low | **FIXED** | Deploy `db/functions.sql` to database |
| L3 | Low | Accepted | None |
| L4 | Low | **FIXED** | `ALTER TABLE` to drop columns |
| L5 | Low | **FIXED** | None (file committed) |
| L6 | Low | **FIXED** | Deploy `db/functions.sql` to database |
### Deployment Steps
1. Deploy `db/functions.sql` to PostgreSQL: `psql -d inventory_db -f db/functions.sql` (L2, L6)
2. Run import (includes stock snapshots first load) (C2, M2)
3. Drop stale columns: `ALTER TABLE products DROP COLUMN IF EXISTS landing_cost_price; ALTER TABLE product_metrics DROP COLUMN IF EXISTS current_landing_cost_price;` (L4)
4. Rebuild daily snapshots (C1, C2)
5. Re-run metrics calculation (C3, M1 take effect automatically)