Fix o3 issues on calculate-metrics script

2025-02-11 15:46:43 -05:00
parent d90e9b51dc
commit d7bf79dec9
2 changed files with 229 additions and 7 deletions
--- a/docs/calculate-issues.md
+++ b/docs/calculate-issues.md
@@ -0,0 +1,187 @@
+1. **Missing Updates for Reorder Point and Safety Stock**
+- **Problem:** In the **product_metrics** table (used by the inventory health view), the fields **reorder_point** and **safety_stock** are never updated in the product metrics calculations. Although a helper function (`calculateReorderQuantities`) exists and computes these values, the update query in the `calculateProductMetrics` function does not assign any values to these columns.
+- **Effect:** The inventory health view relies on these fields (using COALESCE to default them to 0), which means that stock might never be classified as "Reorder" or "Healthy" based on the proper reorder point or safety stock calculations.
+- **Example:** Even if a product's base metrics would require a reorder (for example, if its days of inventory are low), the view always shows a value of 0 for reorder_point and safety_stock.
+- **Fix:** Update the product metrics query (or add a subsequent update) so that **pm.reorder_point** and **pm.safety_stock** are calculated (for instance, by integrating the logic from `calculateReorderQuantities`) and stored in the table.
+
+2. **Overwritten Module Exports When Combining Scripts** [RESOLVED - calculate-metrics.js]
+- **Problem:** The code provided shows two distinct exports. The main metrics calculation module exports `calculateMetrics` (along with cancel and getProgress helpers), but later in the same concatenated file the module exports are overwritten.
+- **Effect:** If these two code sections end up in a single module file, the export for the main calculation will be lost. This would break any code that calls the overall metrics calculation.
+- **Example:** An external caller expecting to run `calculateMetrics` would instead receive the `calculateProductMetrics` function.
+- **Fix:** Make sure each script resides in its own module file. Verify that the module boundaries and exports are not accidentally merged or overwritten when deployed.
+
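A minimal sketch of the overwrite described in item 2, using a plain object in place of Node's real `module` so it can run on its own. The function bodies and `fakeModule` are hypothetical stand-ins, not code from the repository.

```javascript
// Hypothetical stand-ins for the real functions.
async function calculateMetrics() { return 'metrics'; }
function cancelCalculation() { return true; }
function calculateProductMetrics() { return 'product'; }

// Plain object that mimics Node's `module` for demonstration purposes.
const fakeModule = { exports: {} };

// First script: export the main entry point and attach a helper to it.
fakeModule.exports = calculateMetrics;
fakeModule.exports.cancelCalculation = cancelCalculation;

// Second script, concatenated into the same file: reassigning
// `exports` silently discards everything that was set above.
fakeModule.exports = calculateProductMetrics;

// What a caller would now receive from require().
const overwritten = fakeModule.exports;

// A single object export keeps every exported name explicit.
const safeExports = { calculateMetrics, cancelCalculation };
```

Switching from a function export to an object export, as in the last line, also changes the call site: a caller would destructure `calculateMetrics` from the required module instead of invoking the required value directly.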
+3. **Potential Formula Issue in EOQ Calculation (Reorder Qty)**
+- **Problem:** The helper function `calculateReorderQuantities` uses an EOQ formula with a holding cost expressed as a percentage (0.25) rather than a per‐unit cost.
+- **Effect:** If the intent was to use the traditional EOQ formula (which expects a holding cost per unit rather than a percentage), this could lead to an incorrect reorder quantity.
+- **Example:** For a given annual demand and fixed order cost, the computed reorder quantity might be higher or lower than expected.
+- **Fix:** Double-check the EOQ formula. If the intention is to compute based on a percentage, then document that clearly; otherwise, adjust the formula to use the proper holding cost value.
+
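To make the concern in item 3 concrete, here is a sketch of the textbook EOQ formula, Q = sqrt(2DS / H), where H is the annual holding cost per unit. The function names, the sample numbers, and the idea that the 0.25 rate should be multiplied by unit cost are assumptions for illustration; the actual `calculateReorderQuantities` logic is not shown in this document.

```javascript
// Textbook EOQ: sqrt(2 * D * S / H).
// D = annual demand (units), S = fixed cost per order,
// H = holding cost per unit per year.
function eoq(annualDemand, orderCost, holdingCostPerUnit) {
    if (annualDemand <= 0 || orderCost <= 0 || holdingCostPerUnit <= 0) {
        return 0; // formula is undefined for non-positive inputs
    }
    return Math.sqrt((2 * annualDemand * orderCost) / holdingCostPerUnit);
}

// Assumption: a holding *rate* (e.g. 0.25) is turned into a per-unit
// holding cost by multiplying it by the unit cost.
function holdingCostFromRate(unitCost, holdingRate = 0.25) {
    return unitCost * holdingRate;
}

// Sample numbers (hypothetical): 1000 units/year, 50 per order, unit cost 10.
const withPerUnitCost = eoq(1000, 50, holdingCostFromRate(10)); // H = 2.5
const rateUsedAsCost = eoq(1000, 50, 0.25); // rate mistaken for per-unit cost
```

With these sample inputs the first call yields 200 units, while passing the bare rate as H yields a much larger quantity, which is the kind of discrepancy the item warns about.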
+4. **Potential Overlap or Redundancy in GMROI Calculation**
+- **Problem:** In the time aggregates function, GMROI is calculated in two steps. The initial INSERT query computes GMROI as
+
+	`CASE WHEN s.inventory_value > 0 THEN (s.total_revenue - s.total_cost) / s.inventory_value ELSE 0 END`
+
+	and then a subsequent UPDATE query recalculates it as an annualized value using gross profit and active days.
+
+
+- **Effect:** Overwriting a computed value may be intentional to refine the metric, but if not coordinated it can cause confusion or unexpected output in the `product_time_aggregates` table.
+- **Example:** A product's GMROI might first appear as a simple ratio but then be updated to a scaled value based on the number of active days, which could lead to inconsistent reporting if not documented.
+- **Fix:** Confirm that the two-step process is intended. If only the annualized GMROI is desired, consolidate the calculation into one query or clearly document why both steps are needed.
+
+*This observation complements the earlier note about duplicate or overwritten calculations in the previous script. In both cases, it's important to verify that updates (or recalculations) are intentional rather than an oversight.*
+
+5. **Handling of Products Without Orders or Purchase Data**
+- **Problem:** In the INSERT query of the time aggregates function, the UNION covers two cases: one for products with order data (from `monthly_sales`) and one for products that have entries in `monthly_stock` but no matching order data.
+- **Effect:** If a product has neither orders nor purchase orders, it won't get an entry in `product_time_aggregates`. Depending on business rules, this might be acceptable or might mean missing data.
+- **Example:** A product that's new or rarely ordered might not appear in the time aggregates view, potentially affecting downstream calculations.
+- **Fix:** If you need every product to have an aggregate record (even with zeros), add an additional query or logic to ensure that products without any matching records in both CTEs are inserted with default values.
+
+6. **Redundant Recalculation of Vendor Metrics**
+- **Problem:** As in the earlier scripts, cumulative metrics (like **total_revenue** and **total_cost**) are calculated in multiple query steps without validation or optimization. In the vendor metrics script, total revenue and margin are computed within a `WITH` clause whose results are then reused in other parts of the process, making the query more complex than needed.
+- **Effect:** There's unnecessary duplication in querying the same data multiple times across subqueries. It could result in decreased performance and may even lead to excess computation if the subqueries are not optimized or correctly indexed.
+- **Example:** Vendor sales and vendor purchase orders (PO) metrics are calculated in separate `WITH` clauses, leading to repeated calculations.
+- **Fix:** Synthesize the required metrics into fewer queries or reuse the results within the `WITH` clause itself. Avoid redundant calculations of **revenue** and **cost** unless truly necessary.
+
+7. **Handling Products Without Orders or Purchase Orders**
+- **Problem:** In your `calculateVendorMetrics` script, the initial insert for vendor sales doesn't fully address the products that might not have matching orders or purchase orders. If a vendor has products without any sales within the last 12 months, the results may not be fully accurate unless handled explicitly.
+- **Effect:** If no orders exist for a product associated with a particular vendor, that product will not contribute to the vendor's metrics, potentially omitting important data when calculating **total_orders** or **total_revenue**.
+- **Example:** The scripted statistics fill gaps, but products with no recent purchase or sales orders might not be counted accurately.
+- **Fix:** Include logic to handle scenarios where these products still need to be part of the vendor calculation. Use a `LEFT JOIN` wherever possible to account for cases without sales or purchase orders.
+
+8. **Redundant `ON DUPLICATE KEY UPDATE`**
+- **Problem:** Multiple queries in the `calculateVendorMetrics` script use `ON DUPLICATE KEY UPDATE` clauses to handle repeated metrics updates. This is useful for ensuring the most up-to-date calculations but can cause inconsistencies if multiple calculations happen for the same product or vendor simultaneously.
+- **Effect:** This approach can lead to an inaccurate update of brand-specific data when insertion and update overlap. Each time you add a new batch, an existing entry could be overwritten if not handled correctly.
+- **Example:** Vendor country, category, or sales-related metrics could unintentionally update during processing.
+- **Fix:** Match on current status more robustly in case of existing rows to avoid unnecessary updates. Ensure that the key used for `ON DUPLICATE KEY` aligns with any foreign key relationships that might indicate an already processed entry.
+
+9. **SQL Query Performance with Multiple Nested `WITH` Clauses**
+- **Problem:** Heavily nested queries (especially **WITH** clauses) may lead to slow performance depending on the size of the dataset.
+- **Effect:** Computational burden could be high when the database is large, e.g., querying **purchase orders**, **vendor sales**, and **product info** simultaneously. Even with proper indexes, the deployment might struggle in production environments.
+- **Example:** Multiple `WITH` clauses in the vendor and brand metrics calculation scripts might work fine in small datasets but degrade performance in production.
+- **Fix:** Combine some subqueries and reduce the number of computation layers needed to produce the final metrics. Test performance on a production-sized dataset to see how the nested queries behave.
+
+10. **Missing Updates for Reorder Metrics (Vendor/Brand)**
+- **Previously Identified Issue:** Inconsistent updates for **reorder_point** and **safety_stock** across earlier scripts.
+- **Current Impact on This Script:** The vendor and brand metrics do not have explicit updates for reorder point or safety stock, which are essential for inventory evaluation.
+- **Effect:** The correct thresholds and reorder logic for vendor product inventory aren't fully accounted for in these scripts.
+- **Fix:** Integrate relevant logic to update **reorder_point** or **safety_stock** within the vendor and brand metrics calculations. Ensure that it's consistently computed and stored.
+
+11. **Data Integrity and Consistency When Tracking Sales Growth or Performance**
+- **Problem:** Brand metrics include a sales growth clause where negative results can sometimes be skewed severely if period data varies considerably.
+- **Effect:** If period boundaries are incorrect or records are missing, this can create drastic growth rate calculations.
+- **Example:** If the "previous" period has no sales but "current" has a substantial increase, the growth rate will show as **100%**.
+- **Fix:** Implement checks that both periods contain valid data before computing growth, so that rates are not driven by outliers or missing records. Where a period has no data, report no growth (or another explicitly documented default) instead of a computed percentage.
+
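One way to express the period check from item 11 is sketched below. The function name and the choice to return `null` when there is no baseline are assumptions for illustration; the brand metrics SQL itself is not reproduced in this document.

```javascript
// Percentage growth between two periods.
// Returns null when the previous period has no revenue, because a
// percentage change from zero is undefined rather than "100%".
function growthRate(currentRevenue, previousRevenue) {
    const current = Number(currentRevenue) || 0;   // treat null/NaN as 0
    const previous = Number(previousRevenue) || 0;
    if (previous <= 0) {
        return null; // no valid baseline: let the caller decide how to display it
    }
    return ((current - previous) / previous) * 100;
}

// Hypothetical sample values.
const normalGrowth = growthRate(150, 100);  // both periods have data
const noBaseline = growthRate(500, 0);      // previous period is empty
const decline = growthRate(80, 100);        // revenue dropped
```

Returning `null` instead of a fixed percentage leaves it to the frontend to show "n/a" or a default, which keeps missing data distinguishable from real growth.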
+12. **Exclusion of Vendors With No Sales**
+
+	The vendor metrics query is driven by the `vendor_sales` CTE, which aggregates data only for vendors that have orders in the past 12 months.
+
+
+	- **Impact:** Vendors that have purchase activity (or simply exist in vendor_details) but no recent sales won't show up in vendor_metrics. This could cause the frontend to miss metrics for vendors that might still be important.
+	- **Fix:** Consider adding a UNION or changing the driving set so that all vendors (for example, from vendor_details) are included—even if they have zero sales.
+13. **Identical Formulas for On-Time Delivery and Order Fill Rates**
+
+	Both metrics are calculated as `(received_orders / total_orders) * 100`.
+
+
+	- **Impact:** If the business expects these to be distinct (for example, one might factor in on-time receipt versus mere receipt), then showing identical values on the frontend could be misleading.
+	- **Fix:** Verify and adjust the formulas if on-time delivery and order fill rates should be computed differently.
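If the two rates in item 13 are meant to differ, one possible pair of definitions is sketched here. The purchase-order shape (`ordered`, `received`, `expectedDate`, `receivedDate`) and both formulas are assumptions for illustration, not the schema or business rules of this project.

```javascript
// Hypothetical purchase-order shape:
//   { ordered, received, expectedDate, receivedDate }
// Dates are ISO 'YYYY-MM-DD' strings, which compare correctly as text.
function vendorDeliveryRates(purchaseOrders) {
    const total = purchaseOrders.length;
    if (total === 0) {
        return { onTimeDeliveryRate: 0, orderFillRate: 0 };
    }
    let onTime = 0;
    let orderedUnits = 0;
    let receivedUnits = 0;
    for (const po of purchaseOrders) {
        // On-time: the order arrived on or before the expected date.
        if (po.receivedDate && po.receivedDate <= po.expectedDate) {
            onTime += 1;
        }
        orderedUnits += po.ordered;
        // Fill: units received, capped at what was ordered.
        receivedUnits += Math.min(po.received, po.ordered);
    }
    return {
        onTimeDeliveryRate: (onTime / total) * 100,
        orderFillRate: orderedUnits > 0 ? (receivedUnits / orderedUnits) * 100 : 0
    };
}

// Hypothetical sample: one on-time full delivery, one late partial delivery.
const sampleRates = vendorDeliveryRates([
    { ordered: 10, received: 10, expectedDate: '2025-01-10', receivedDate: '2025-01-09' },
    { ordered: 10, received: 5, expectedDate: '2025-01-10', receivedDate: '2025-01-15' }
]);
```

With this sample the two rates diverge (50% on time, 75% filled), which is the distinction a single `(received_orders / total_orders) * 100` formula cannot express.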
+14. **Handling Nulls and Defaults in Aggregations**
+
+	The query uses COALESCE in most places, but be sure that every aggregated value (like average lead time) correctly defaults when no data is present.
+
+
+	- **Impact:** Incorrect defaults might cause odd or missing numbers on the production interface.
+	- **Fix:** Double-check that all numeric aggregates reliably default to 0 where needed.
+
+15. **Inconsistent Stock Filtering Conditions**
+
+	In the main brand metrics query the CTE filters products with the condition
+
+	`p.stock_quantity <= 5000 AND p.stock_quantity >= 0`
+
+	whereas in the brand time-based metrics query the condition is only `p.stock_quantity <= 5000`.
+
+
+	- **Impact:** This discrepancy may lead to inconsistent numbers (for example, if any products have negative stock, which might be due to data issues) between overall brand metrics and time-based metrics on the frontend.
+	- **Fix:** Standardize the filtering criteria so that both queries treat out-of-range stock values in the same way.
+16. **Growth Rate Calculation Periods**
+
+	The growth rate is computed by comparing revenue from the last 3 months ("current") against a period from 15–12 months ago ("previous").
+
+
+	- **Impact:** This narrow window may not reflect typical year-over-year performance and could lead to volatile or unexpected growth percentages on the frontend.
+	- **Fix:** Revisit the business logic for growth—if a longer or different comparison period is preferred, adjust the date intervals accordingly.
+17. **Potential NULLs in Aggregated Time-Based Metrics**
+
+	In the brand time-based metrics query, aggregate expressions such as `SUM(o.quantity * o.price)` aren't wrapped with COALESCE.
+
+
+	- **Impact:** If there are no orders for a given brand/month, these sums might return NULL rather than 0, which could propagate into the frontend display.
+	- **Fix:** Wrap such aggregates in COALESCE (e.g. `COALESCE(SUM(o.quantity * o.price), 0)`) to ensure a default numeric value.
+
+18. **Grouping by Category Status in Base Metrics Insert**
+	- **Problem:** The INSERT for base category metrics groups by both `c.cat_id` and `c.status` even though the table's primary key is just `category_id`.
+	- **Effect:** If a category's status changes over time, the grouping may produce unexpected updates (or even multiple groups before the duplicate key update kicks in), possibly causing the wrong status or aggregated figures to be stored.
+	- **Example:** A category that toggles between "active" and "inactive" might have its metrics calculated differently on different runs.
+	- **Fix:** Ensure that the grouping keys match the primary key (or that the status update logic is exactly as intended) so that a single row per category is maintained.
+19. **Potential Null Handling in Margin Calculations**
+	- **Problem:** In the query for category time metrics, the calculation of average margin uses expressions such as `SUM(o.quantity * (o.price - GREATEST(p.cost_price, 0)))` without using `COALESCE` on `p.cost_price`.
+	- **Effect:** If any product's `cost_price` is `NULL`, then `GREATEST(p.cost_price, 0)` returns `NULL`, and so does the whole `o.quantity * (...)` term for that row. Because `SUM` skips `NULL` inputs, those rows silently drop out of the total, and the sum itself becomes `NULL` only when every contributing row is `NULL`. Either way, the margin figures on the frontend can be missing or misleading.
+	- **Example:** Sales of a product with a missing cost price would contribute nothing to the margin total; if it is the only product sold in the period, the margin evaluates to `NULL` even though sales exist.
+	- **Fix:** Replace `GREATEST(p.cost_price, 0)` with `GREATEST(COALESCE(p.cost_price, 0), 0)` so that missing values are treated as 0 while negative costs are still clamped. (Plain `COALESCE(p.cost_price, 0)` handles `NULL` but drops the clamp on negative values.)
+20. **Data Coverage in Growth Rate Calculation**
+	- **Problem:** The growth rate update depends on multiple CTEs (current period, previous period, and trend analysis) that require a minimum amount of data (for instance, `HAVING COUNT(*) >= 6` in the trend_stats CTE).
+	- **Effect:** Categories with insufficient historical data will fall into the "ELSE" branch (or may even be skipped if no revenue is present), which might result in a growth rate of 0.0 or an unexpected value.
+	- **Example:** A newly created category that has only two months of data won't have trend analysis, so its growth rate will be calculated solely by the simple difference, which might not reflect true performance.
+	- **Fix:** Confirm that this fallback behavior is acceptable for production; if not, adjust the logic so that every category receives a consistent growth rate even with sparse data.
+21. **Omission of Forecasts for Zero-Sales Categories**
+	- **Observation:** The category-sales metrics query uses a `HAVING AVG(cs.daily_quantity) > 0` clause.
+	- **Effect:** Categories without any average daily sales will not receive a forecast record in `category_sales_metrics`. If the frontend expects a row (even with zeros) for every category, this will lead to missing data.
+	- **Fix:** Verify that it's acceptable for categories with no sales to have no forecast entry. If not, adjust the query so that a default forecast (with zeros) is inserted.
+
+22. **Randomness in Category-Level Forecast Revenue Calculation**
+	- **Problem:** In the category-level forecasts query, the forecast revenue is multiplied by a factor of `(0.95 + (RAND() * 0.1))`.
+	- **Effect:** This introduces randomness into the forecast figures so that repeated runs could yield slightly different values. If deterministic forecasts are expected on the production frontend, this could lead to inconsistent displays.
+	- **Example:** The same category might show a forecast up to 5% higher on one run and up to 5% lower on another, since the random multiplier ranges from 0.95 to just under 1.05.
+	- **Fix:** Confirm that this randomness is intentional for your forecasting model; if forecasts are meant to be reproducible, remove or replace the `RAND()` factor with a fixed multiplier.
+23. **Multi-Statement Cleanup of Temporary Tables**
+	- **Problem:** The cleanup query drops multiple temporary tables in one call (separated by semicolons).
+	- **Effect:** If your Node.js MySQL driver isn't configured to allow multi-statement execution, this query may fail, leaving temporary tables behind. Leftover temporary tables might eventually cause conflicts or resource issues.
+	- **Example:** Running the cleanup query could produce an error like "multi-statement queries not enabled," preventing proper cleanup.
+	- **Fix:** Either configure your database connection to allow multi-statements or issue separate queries for each temporary table drop to ensure that the cleanup runs successfully.
+24. **Handling Products with No Sales Data**
+	- **Problem:** In the product-level forecast calculation, the CTE `daily_stats` includes a `HAVING AVG(ds.daily_quantity) > 0` clause.
+	- **Effect:** Products that have no sales (or a zero average daily quantity) will be excluded from the forecasts. This means the frontend won't show forecasts for non-selling products, which might be acceptable but could also be a completeness issue.
+	- **Example:** A product that has never sold will not appear in the `sales_forecasts` table.
+	- **Fix:** Confirm that it is intended for forecasts to be generated only for products with some sales activity. If forecasts are required for all products, adjust the query to insert default forecast records for products with zero sales.
+25. **Complexity of the Forecast Formula Involving the Seasonality Factor**
+- **Issue:**
+
+	The sales forecast calculations incorporate an adjustment factor using `COALESCE(sf.seasonality_factor, 0)` to modify forecast units and revenue. This means that if the seasonality data is missing (or not populated), the factor defaults to 0.
+
+
+- **Potential Problem:**
+
+	A default value of 0 will drastically alter the forecast calculations—often leading to a forecast of 0 or an overly dampened forecast—when in reality the intended behavior might be to use a neutral multiplier (typically 1.0). This could result in forecasts that are not reflective of the actual seasonal impact, thereby skewing the figures that reach the frontend.
+
+
+- **Fix:**
+
+	Review your data source for seasonality (the `sales_seasonality` table) and ensure it's consistently populated. Alternatively, if missing seasonality data is possible, consider using a more neutral default (such as 1.0) in your COALESCE. This change would prevent the forecast formulas from over-simplifying (or even nullifying) the forecast output due to missing seasonality factors.
+
+
+26. **Group By with Seasonality Factor Variability**
+	- **Observation:** In the forecast insertion query, the GROUP BY clause includes `sf.seasonality_factor` along with other fields.
+	- **Effect:** If the seasonality factor differs (or is `NULL` versus a value) for different forecast dates, this might result in multiple rows for the same product and forecast date. However, the `ON DUPLICATE KEY UPDATE` clause will merge them—but only if the primary key (pid, forecast_date) is truly unique.
+	- **Fix:** Verify that the grouping produces exactly one row per product per forecast date. If there's potential for multiple rows due to seasonality variability, consider applying a COALESCE or an aggregation on the seasonality factor so that it does not affect grouping.
+
+27. **Memory Management for Temporary Tables** [RESOLVED - calculate-metrics.js]
+- **Problem:** In metrics calculations, temporary tables aren't always properly cleaned up if the process fails between creation and the DROP statement.
+- **Effect:** If a process fails after creating temporary tables but before dropping them, these tables remain in memory until the connection is closed. In a production environment with multiple calculation runs, this could lead to memory leaks or table name conflicts.
+- **Example:** The `temp_revenue_ranks` table creation in ABC classification could remain if the process fails before reaching the DROP statement.
+- **Fix:** Implement proper cleanup in a finally block or use transaction management that ensures temporary tables are always cleaned up, even in failure scenarios.
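The finally-based cleanup suggested in item 27 can be sketched as follows, with one `DROP` per statement so it does not depend on multi-statement support. The helper names and the stub connection are hypothetical; only `query` and `release` are assumed on the connection object, matching how the script uses it.

```javascript
// Subset of the temporary table names used by the calculation scripts.
const tempTables = ['temp_revenue_ranks', 'temp_sales_metrics'];

// Drop each table with its own statement. Failures are collected rather
// than thrown so that one failed DROP does not stop the remaining ones.
async function dropTemporaryTables(connection, tables) {
    const failures = [];
    for (const table of tables) {
        try {
            await connection.query(`DROP TEMPORARY TABLE IF EXISTS ${table}`);
        } catch (error) {
            failures.push({ table, error });
        }
    }
    return failures;
}

// Run `work`, then always clean up and release the connection,
// even when `work` throws. The original error still propagates.
async function withTempTableCleanup(connection, tables, work) {
    try {
        return await work(connection);
    } finally {
        try {
            await dropTemporaryTables(connection, tables);
        } finally {
            connection.release();
        }
    }
}

// Hypothetical stub that records statements instead of talking to MySQL.
function createStubConnection() {
    const log = [];
    return {
        log,
        released: false,
        async query(sql) { log.push(sql); },
        release() { this.released = true; }
    };
}
```

Nesting the release inside its own `finally` is a design choice: it keeps a cleanup failure from preventing the connection from returning to the pool.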
--- a/inventory-server/scripts/calculate-metrics.js
+++ b/inventory-server/scripts/calculate-metrics.js
@@ -44,6 +44,34 @@ global.clearProgress = progress.clearProgress;
 global.getProgress = progress.getProgress;
 global.logError = progress.logError;

+// List of temporary tables used in the calculation process
+const TEMP_TABLES = [
+    'temp_revenue_ranks',
+    'temp_sales_metrics',
+    'temp_purchase_metrics',
+    'temp_product_metrics',
+    'temp_vendor_metrics',
+    'temp_category_metrics',
+    'temp_brand_metrics',
+    'temp_forecast_dates',
+    'temp_daily_sales',
+    'temp_product_stats',
+    'temp_category_sales',
+    'temp_category_stats'
+];
+
+// Add cleanup function for temporary tables
+async function cleanupTemporaryTables(connection) {
+    try {
+        for (const table of TEMP_TABLES) {
+            await connection.query(`DROP TEMPORARY TABLE IF EXISTS ${table}`);
+        }
+    } catch (error) {
+        logError(error, 'Error cleaning up temporary tables');
+        throw error; // Re-throw to be handled by the caller
+    }
+}
+
 const { getConnection, closePool } = require('./metrics/utils/db');
 const calculateProductMetrics = require('./metrics/product-metrics');
 const calculateTimeAggregates = require('./metrics/time-aggregates');
@@ -650,19 +678,26 @@ async function calculateMetrics() {
            throw error;
        } finally {
            if (connection) {
+                // Ensure temporary tables are cleaned up
+                await cleanupTemporaryTables(connection);
                connection.release();
            }
-        }
-    } finally {
            // Close the connection pool when we're done
            await closePool();
        }
+    } catch (error) {
+        success = false;
+        logError(error, 'Error in metrics calculation');
+        throw error;
+    }
 }

-// Export both functions and progress checker
-module.exports = calculateMetrics;
-module.exports.cancelCalculation = cancelCalculation;
-module.exports.getProgress = global.getProgress;
+// Export as a module with all necessary functions
+module.exports = {
+    calculateMetrics,
+    cancelCalculation,
+    getProgress: global.getProgress
+};

 // Run directly if called from command line
 if (require.main === module) {