Files
inventory/docs/split-up-pos.md

112 lines
8.7 KiB
Markdown

Okay, I understand completely now. The core issue is that the previous approaches tried too hard to reconcile every receipt back to a specific PO line within the `purchase_orders` table structure, which doesn't reflect the reality where receipts can be independent events. Your downstream scripts, especially `daily_snapshots` and `product_metrics`, rely on having a complete picture of *all* receivings.
Let's pivot to a model that respects both distinct data streams: **Orders (Intent)** and **Receivings (Actuals)**.
**Proposed Solution: Separate `purchase_orders` and `receivings` Tables**
This is the cleanest way to model the reality you've described.
1. **`purchase_orders` Table:**
* **Purpose:** Tracks the status and details of purchase *orders* placed. Represents the *intent* to receive goods.
* **Key Columns:** `po_id`, `pid`, `ordered` (quantity ordered), `po_cost_price`, `date` (order/created date), `expected_date`, `status` (PO lifecycle: 'ordered', 'canceled', 'done'), `vendor`, `notes`, etc.
* **Crucially:** This table *does not* need a `received` column or a `receiving_history` column derived from complex allocations. It focuses solely on the PO itself.
2. **`receivings` Table (New or Refined):**
* **Purpose:** Tracks every single line item received, regardless of whether it was linked to a PO during the receiving process. Represents the *actual* goods that arrived.
* **Key Columns:**
* `receiving_id` (Identifier for the overall receiving document/batch)
* `pid` (Product ID received)
* `received_qty` (Quantity received for this specific line)
* `cost_each` (Actual cost paid for this item on this receiving)
* `received_date` (Actual date the item was received)
* `received_by` (Employee ID/Name)
* `source_po_id` (The `po_id` entered on the receiving screen, *nullable*. Stores the original link attempt, even if it was wrong or missing)
* `source_receiving_status` (The status from the source `receivings` table: 'partial_received', 'full_received', 'paid', 'canceled')
**How the Import Script Changes:**
1. **Fetch POs:** Fetch data from `po` and `po_products`.
2. **Populate `purchase_orders`:**
* Insert/Update rows into `purchase_orders` based directly on the fetched PO data.
* Set `po_id`, `pid`, `ordered`, `po_cost_price`, `date` (`COALESCE(date_ordered, date_created)`), `expected_date`.
* Set `status` by mapping the source `po.status` code directly ('ordered', 'canceled', 'done', etc.).
* **No complex allocation needed here.**
3. **Fetch Receivings:** Fetch data from `receivings` and `receivings_products`.
4. **Populate `receivings`:**
* For *every* line item fetched from `receivings_products`:
* Perform necessary data validation (dates, numbers).
* Insert a new row into `receivings` with all the relevant details (`receiving_id`, `pid`, `received_qty`, `cost_each`, `received_date`, `received_by`, `source_po_id`, `source_receiving_status`).
* Use `ON CONFLICT (receiving_id, pid)` (or similar unique key based on your source data) `DO UPDATE SET ...` for incremental updates if necessary, or simply delete/re-insert based on `receiving_id` for simplicity if performance allows.
**Impact on Downstream Scripts (and how to adapt):**
* **Initial Query (Active POs):**
* `SELECT ... FROM purchase_orders po WHERE po.status NOT IN ('canceled', 'done', 'paid_equivalent_status?') AND po.date >= ...`
* `active_pos`: `COUNT(DISTINCT po.po_id)` based on the filtered POs.
* `overdue_pos`: Add `AND po.expected_date < CURRENT_DATE`.
* `total_units`: `SUM(po.ordered)`. Represents total units *ordered* on active POs.
* `total_cost`: `SUM(po.ordered * po.po_cost_price)`. Cost of units *ordered*.
* `total_retail`: `SUM(po.ordered * pm.current_price)`. Retail value of units *ordered*.
* **Result:** This query now cleanly reports on the status of *orders* placed, which seems closer to its original intent. The filter `po.receiving_status NOT IN ('partial_received', 'full_received', 'paid')` is replaced by `po.status NOT IN ('canceled', 'done', 'paid_equivalent?')`. The 90% received check is removed as `received` is not reliably tracked *on the PO* anymore.
* **`daily_product_snapshots`:**
* **`SalesData` CTE:** No change needed.
* **`ReceivingData` CTE:** **Must be changed.** Query the **`receivings`** table instead of `purchase_orders`.
```sql
ReceivingData AS (
SELECT
rl.pid,
COUNT(DISTINCT rl.receiving_id) as receiving_doc_count,
SUM(rl.received_qty) AS units_received,
SUM(rl.received_qty * rl.cost_each) AS cost_received
FROM public.receivings rl
WHERE rl.received_date::date = _date
-- Optional: Filter out canceled receivings if needed
-- AND rl.source_receiving_status <> 'canceled'
GROUP BY rl.pid
),
```
* **Result:** This now accurately reflects *all* units received on a given day from the definitive source.
* **`update_product_metrics`:**
* **`CurrentInfo` CTE:** No change needed (pulls from `products`).
* **`OnOrderInfo` CTE:** Needs re-evaluation. How do you want to define "On Order"?
* **Option A (Strict PO View):** `SUM(po.ordered)` from `purchase_orders po WHERE po.status NOT IN ('canceled', 'done', 'paid_equivalent?')`. This is quantity on *open orders*, ignoring fulfillment state. Simple, but might overestimate if items arrived unlinked.
* **Option B (Approximate Fulfillment):** `SUM(po.ordered)` from open POs MINUS `SUM(rl.received_qty)` from `receivings rl` where `rl.source_po_id = po.po_id` (summing only directly linked receivings). Better, but still misses fulfillment via unlinked receivings.
* **Option C (Heuristic):** `SUM(po.ordered)` from open POs MINUS `SUM(rl.received_qty)` from `receivings rl` where `rl.pid = po.pid` and `rl.received_date >= po.date`. This *tries* to account for unlinked receivings but is imprecise.
* **Recommendation:** Start with **Option A** for simplicity, clearly labeling it "Quantity on Open POs". You might need a separate process or metric for a more nuanced view of expected vs. actual pipeline.
```sql
-- Example for Option A
OnOrderInfo AS (
SELECT
pid,
SUM(ordered) AS on_order_qty, -- Total qty on open POs
SUM(ordered * po_cost_price) AS on_order_cost -- Cost of qty on open POs
FROM public.purchase_orders
WHERE status NOT IN ('canceled', 'done', 'paid_equivalent?') -- Define your open statuses
GROUP BY pid
),
```
* **`HistoricalDates` CTE:**
* `date_first_sold`, `max_order_date`: No change (queries `orders`).
* `date_first_received_calc`, `date_last_received_calc`: **Must be changed.** Query `MIN(rl.received_date)` and `MAX(rl.received_date)` from the **`receivings`** table grouped by `pid`.
* **`SnapshotAggregates` CTE:**
* `received_qty_30d`, `received_cost_30d`: These are calculated from `daily_product_snapshots`, which are now correctly sourced from `receivings`, so this part is fine.
* **Forecasting Calculations:** Will use the chosen definition of `on_order_qty`. Be aware of the implications of Option A (potentially inflated if unlinked receivings fulfill orders).
* **Result:** Metrics are calculated based on distinct order data and complete receiving data. The definition of "on order" needs careful consideration.
**Summary of this Approach:**
* **Pros:**
* Accurately models distinct order and receiving events.
* Provides a definitive source (`receivings`) for all received inventory.
* Simplifies the `purchase_orders` table and its import logic.
* Avoids complex/potentially inaccurate allocation logic for unlinked receivings within the main tables.
* Avoids synthetic records.
* Fixes downstream reporting (`daily_snapshots` receiving data).
* **Cons:**
* Requires creating/managing the `receivings` table.
* Requires modifying downstream queries (`ReceivingData`, `OnOrderInfo`, `HistoricalDates`).
* Calculating a precise "net quantity still expected to arrive" (true on-order minus all relevant fulfillment) becomes more complex and may require specific business rules or heuristics outside the basic table structure if Option A for `OnOrderInfo` isn't sufficient.
This two-table approach (`purchase_orders` + `receivings`) seems the most robust and accurate way to handle your requirement for complete receiving records independent of potentially flawed PO linking. It directly addresses the shortcomings of the previous attempts.