inventory/docs/split-up-pos.md

Okay, I understand completely now. The core issue is that the previous approaches tried too hard to reconcile every receipt back to a specific PO line within the `purchase_orders` table structure, which doesn't reflect the reality where receipts can be independent events. Your downstream scripts, especially `daily_snapshots` and `product_metrics`, rely on having a complete picture of *all* receivings.

Let's pivot to a model that respects both distinct data streams: **Orders (Intent)** and **Receivings (Actuals)**.

**Proposed Solution: Separate `purchase_orders` and `receivings` Tables**

This is the cleanest way to model the reality you've described.

1.  **`purchase_orders` Table:**
    *   **Purpose:** Tracks the status and details of purchase *orders* placed. Represents the *intent* to receive goods.
    *   **Key Columns:** `po_id`, `pid`, `ordered` (quantity ordered), `po_cost_price`, `date` (order/created date), `expected_date`, `status` (PO lifecycle: 'ordered', 'canceled', 'done'), `vendor`, `notes`, etc.
    *   **Crucially:** This table *does not* need a `received` column or a `receiving_history` column derived from complex allocations. It focuses solely on the PO itself.

2.  **`receivings` Table (New or Refined):**
    *   **Purpose:** Tracks every single line item received, regardless of whether it was linked to a PO during the receiving process. Represents the *actual* goods that arrived.
    *   **Key Columns:**
        *   `receiving_id` (Identifier for the overall receiving document/batch)
        *   `pid` (Product ID received)
        *   `received_qty` (Quantity received for this specific line)
        *   `cost_each` (Actual cost paid for this item on this receiving)
        *   `received_date` (Actual date the item was received)
        *   `received_by` (Employee ID/Name)
        *   `source_po_id` (The `po_id` entered on the receiving screen, *nullable*. Stores the original link attempt, even if it was wrong or missing)
        *   `source_receiving_status` (The status from the source `receivings` table: 'partial_received', 'full_received', 'paid', 'canceled')
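
As a rough starting point, the `receivings` table described above could be declared along these lines. This is only a sketch: the column types, the `(receiving_id, pid)` primary key, and the index names are assumptions, not taken from your existing schema, so adjust them to match the source data.

```sql
-- Sketch only: types, key choice, and index names are assumptions.
CREATE TABLE IF NOT EXISTS public.receivings (
    receiving_id            text        NOT NULL,  -- receiving document/batch
    pid                     bigint      NOT NULL,  -- product received
    received_qty            integer     NOT NULL,
    cost_each               numeric(14, 4),
    received_date           timestamptz NOT NULL,
    received_by             text,
    -- Nullable and deliberately NOT a foreign key: the PO link entered at
    -- receiving time may be wrong or missing, and the row must still load.
    source_po_id            text,
    source_receiving_status text,
    PRIMARY KEY (receiving_id, pid)
);

-- Supports the per-day and per-product lookups used downstream.
CREATE INDEX IF NOT EXISTS receivings_pid_date_idx
    ON public.receivings (pid, received_date);

-- Supports joining linked receivings back to a PO.
CREATE INDEX IF NOT EXISTS receivings_source_po_idx
    ON public.receivings (source_po_id);
```

Leaving `source_po_id` without a foreign key is the point of the design: a receiving row should never be rejected because its PO reference does not resolve.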

**How the Import Script Changes:**

1.  **Fetch POs:** Fetch data from `po` and `po_products`.
2.  **Populate `purchase_orders`:**
    *   Insert/Update rows into `purchase_orders` based directly on the fetched PO data.
    *   Set `po_id`, `pid`, `ordered`, `po_cost_price`, `date` (`COALESCE(date_ordered, date_created)`), `expected_date`.
    *   Set `status` by mapping the source `po.status` code directly ('ordered', 'canceled', 'done', etc.).
    *   **No complex allocation needed here.**
3.  **Fetch Receivings:** Fetch data from `receivings` and `receivings_products`.
4.  **Populate `receivings`:**
    *   For *every* line item fetched from `receivings_products`:
        *   Perform necessary data validation (dates, numbers).
        *   Insert a new row into `receivings` with all the relevant details (`receiving_id`, `pid`, `received_qty`, `cost_each`, `received_date`, `received_by`, `source_po_id`, `source_receiving_status`).
    *   For incremental updates, use `ON CONFLICT (receiving_id, pid) DO UPDATE SET ...`, or whichever unique key fits your source data. Note that `(receiving_id, pid)` only works as a key if a product appears at most once per receiving. If performance allows, deleting and re-inserting all rows for a `receiving_id` is simpler.
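
The upsert in step 4 might look roughly like the following, written as a parameterized statement. It assumes a unique constraint or primary key on `(receiving_id, pid)`, which is itself an assumption about your source data.

```sql
-- Sketch only: assumes a unique key on (receiving_id, pid).
INSERT INTO public.receivings (
    receiving_id, pid, received_qty, cost_each,
    received_date, received_by, source_po_id, source_receiving_status
)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
ON CONFLICT (receiving_id, pid) DO UPDATE SET
    received_qty            = EXCLUDED.received_qty,
    cost_each               = EXCLUDED.cost_each,
    received_date           = EXCLUDED.received_date,
    received_by             = EXCLUDED.received_by,
    source_po_id            = EXCLUDED.source_po_id,
    source_receiving_status = EXCLUDED.source_receiving_status;
```

If you batch many rows into one `INSERT`, PostgreSQL raises an error when two rows in the same statement hit the same conflict key, so duplicate `(receiving_id, pid)` pairs would need to be aggregated first or the key extended with a line identifier.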

**Impact on Downstream Scripts (and how to adapt):**

*   **Initial Query (Active POs):**
    *   `SELECT ... FROM purchase_orders po WHERE po.status NOT IN ('canceled', 'done', 'paid_equivalent?') AND po.date >= ...`
    *   `active_pos`: `COUNT(DISTINCT po.po_id)` based on the filtered POs.
    *   `overdue_pos`: Add `AND po.expected_date < CURRENT_DATE`.
    *   `total_units`: `SUM(po.ordered)`. Represents total units *ordered* on active POs.
    *   `total_cost`: `SUM(po.ordered * po.po_cost_price)`. Cost of units *ordered*.
    *   `total_retail`: `SUM(po.ordered * pm.current_price)`. Retail value of units *ordered*.
    *   **Result:** This query now cleanly reports on the status of *orders* placed, which seems closer to its original intent. The filter `po.receiving_status NOT IN ('partial_received', 'full_received', 'paid')` is replaced by `po.status NOT IN ('canceled', 'done', 'paid_equivalent?')`. The 90% received check is removed as `received` is not reliably tracked *on the PO* anymore.
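
    Put together, the bullets above could be sketched as a single query. The table name `product_metrics` behind the `pm` alias, one row per `pid` in that table, and the `$1` cutoff-date parameter are assumptions made for illustration; the status list also still needs your paid-equivalent status, if there is one.

    ```sql
    -- Sketch only: pm's table name and the $1 cutoff date are assumptions.
    SELECT
        COUNT(DISTINCT po.po_id) AS active_pos,
        COUNT(DISTINCT po.po_id)
            FILTER (WHERE po.expected_date < CURRENT_DATE) AS overdue_pos,
        COALESCE(SUM(po.ordered), 0)                    AS total_units,
        COALESCE(SUM(po.ordered * po.po_cost_price), 0) AS total_cost,
        COALESCE(SUM(po.ordered * pm.current_price), 0) AS total_retail
    FROM public.purchase_orders po
    LEFT JOIN public.product_metrics pm ON pm.pid = po.pid
    WHERE po.status NOT IN ('canceled', 'done')  -- add closed statuses as needed
      AND po.date >= $1;                         -- start of the reporting window
    ```

    Using `FILTER` keeps `overdue_pos` in the same pass as the other totals instead of running a second query with the extra `expected_date` condition.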

*   **`daily_product_snapshots`:**
    *   **`SalesData` CTE:** No change needed.
    *   **`ReceivingData` CTE:** **Must be changed.** Query the **`receivings`** table instead of `purchase_orders`.
        ```sql
        ReceivingData AS (
            SELECT
                rl.pid,
                COUNT(DISTINCT rl.receiving_id) as receiving_doc_count,
                SUM(rl.received_qty) AS units_received,
                SUM(rl.received_qty * rl.cost_each) AS cost_received
            FROM public.receivings rl
            WHERE rl.received_date::date = _date
              -- Optional: Filter out canceled receivings if needed
              -- AND rl.source_receiving_status <> 'canceled'
            GROUP BY rl.pid
        ),
        ```
    *   **Result:** This now accurately reflects *all* units received on a given day from the definitive source.

*   **`update_product_metrics`:**
    *   **`CurrentInfo` CTE:** No change needed (pulls from `products`).
    *   **`OnOrderInfo` CTE:** Needs re-evaluation. How do you want to define "On Order"?
        *   **Option A (Strict PO View):** `SUM(po.ordered)` from `purchase_orders po WHERE po.status NOT IN ('canceled', 'done', 'paid_equivalent?')`. This is quantity on *open orders*, ignoring fulfillment state. Simple, but it overstates what is still inbound whenever part of an open PO has already arrived, linked or unlinked.
        *   **Option B (Approximate Fulfillment):** `SUM(po.ordered)` from open POs MINUS `SUM(rl.received_qty)` from `receivings rl` where `rl.source_po_id = po.po_id` (summing only directly linked receivings). Better, but still misses fulfillment via unlinked receivings.
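        *   **Option C (Heuristic):** `SUM(po.ordered)` from open POs MINUS `SUM(rl.received_qty)` from `receivings rl` where `rl.pid = po.pid` and `rl.received_date >= po.date`. This *tries* to account for unlinked receivings but is imprecise: a single receipt is subtracted from every open PO for that product dated on or before it, so products with several open POs get undercounted.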
        *   **Recommendation:** Start with **Option A** for simplicity, clearly labeling it "Quantity on Open POs". You might need a separate process or metric for a more nuanced view of expected vs. actual pipeline.
        ```sql
        -- Example for Option A
        OnOrderInfo AS (
            SELECT
                pid,
                SUM(ordered) AS on_order_qty, -- Total qty on open POs
                SUM(ordered * po_cost_price) AS on_order_cost -- Cost of qty on open POs
            FROM public.purchase_orders
            WHERE status NOT IN ('canceled', 'done', 'paid_equivalent?') -- Placeholder: list your actual closed statuses
            GROUP BY pid
        ),
        ```
    *   **`HistoricalDates` CTE:**
        *   `date_first_sold`, `max_order_date`: No change (queries `orders`).
        *   `date_first_received_calc`, `date_last_received_calc`: **Must be changed.** Query `MIN(rl.received_date)` and `MAX(rl.received_date)` from the **`receivings`** table grouped by `pid`.
    *   **`SnapshotAggregates` CTE:**
        *   `received_qty_30d`, `received_cost_30d`: These are calculated from `daily_product_snapshots`, which are now correctly sourced from `receivings`, so this part is fine.
    *   **Forecasting Calculations:** Will use the chosen definition of `on_order_qty`. Be aware of the implications of Option A (potentially inflated if unlinked receivings fulfill orders).
    *   **Result:** Metrics are calculated based on distinct order data and complete receiving data. The definition of "on order" needs careful consideration.
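
For the receiving dates in `HistoricalDates`, the change amounts to something like the query below. It is shown as a standalone `SELECT` rather than wired into the existing CTE, and the filter on canceled receivings is an assumption about what you want to count.

```sql
-- Sketch only: first/last received date per product, from receivings.
SELECT
    rl.pid,
    MIN(rl.received_date)::date AS date_first_received_calc,
    MAX(rl.received_date)::date AS date_last_received_calc
FROM public.receivings rl
-- IS DISTINCT FROM keeps rows whose status is NULL, unlike <>.
WHERE rl.source_receiving_status IS DISTINCT FROM 'canceled'
GROUP BY rl.pid;
```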

**Summary of this Approach:**

*   **Pros:**
    *   Accurately models distinct order and receiving events.
    *   Provides a definitive source (`receivings`) for all received inventory.
    *   Simplifies the `purchase_orders` table and its import logic.
    *   Avoids complex/potentially inaccurate allocation logic for unlinked receivings within the main tables.
    *   Avoids synthetic records.
    *   Fixes downstream reporting (the `ReceivingData` CTE in `daily_product_snapshots`).
*   **Cons:**
    *   Requires creating/managing the `receivings` table.
    *   Requires modifying downstream queries (`ReceivingData`, `OnOrderInfo`, `HistoricalDates`).
    *   A precise "net quantity still expected to arrive" (open orders minus all relevant fulfillment) is harder to calculate. If Option A for `OnOrderInfo` isn't sufficient, it will need business rules or heuristics layered on top of the two tables.
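
If Option A does turn out to be too coarse, Option B is probably the next step to try. A minimal sketch follows, assuming `(po_id, pid)` is unique in `purchase_orders`, that `source_po_id` uses the same type and format as `po_id`, and that canceled receivings should not count as fulfillment.

```sql
-- Sketch only: Option B, open PO quantity minus directly linked receivings.
WITH linked_receipts AS (
    SELECT
        rl.source_po_id      AS po_id,
        rl.pid,
        SUM(rl.received_qty) AS received_qty
    FROM public.receivings rl
    WHERE rl.source_po_id IS NOT NULL
      AND rl.source_receiving_status IS DISTINCT FROM 'canceled'
    GROUP BY rl.source_po_id, rl.pid
)
SELECT
    po.pid,
    -- GREATEST clamps over-received lines at zero instead of going negative.
    SUM(GREATEST(po.ordered - COALESCE(lr.received_qty, 0), 0)) AS on_order_qty,
    SUM(GREATEST(po.ordered - COALESCE(lr.received_qty, 0), 0)
        * po.po_cost_price)                                     AS on_order_cost
FROM public.purchase_orders po
LEFT JOIN linked_receipts lr
       ON lr.po_id = po.po_id
      AND lr.pid   = po.pid
WHERE po.status NOT IN ('canceled', 'done')  -- add closed statuses as needed
GROUP BY po.pid;
```

Aggregating receivings per `(po_id, pid)` before the join avoids multiplying PO rows when one PO line is received in several batches. Unlinked receivings are still ignored, so this remains an upper bound on what is actually inbound.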

This two-table approach (`purchase_orders` + `receivings`) seems the most robust and accurate way to handle your requirement for complete receiving records independent of potentially flawed PO linking. It directly addresses the shortcomings of the previous attempts.