Fix purchase orders import
docs/import-from-prod-data-mapping.md
# MySQL to PostgreSQL Import Process Documentation

This document outlines the data import process from the production MySQL database to the local PostgreSQL database, focusing on column mappings, data transformations, and the overall import architecture.

## Table of Contents

1. [Overview](#overview)
2. [Import Architecture](#import-architecture)
3. [Column Mappings](#column-mappings)
   - [Categories](#categories)
   - [Products](#products)
   - [Product Categories (Relationship)](#product-categories-relationship)
   - [Orders](#orders)
   - [Purchase Orders](#purchase-orders)
   - [Metadata Tables](#metadata-tables)
4. [Special Calculations](#special-calculations)
5. [Implementation Notes](#implementation-notes)

## Overview

The import process extracts data from a MySQL 5.7 production database and imports it into a PostgreSQL database. It can operate in two modes:

- **Full Import**: Imports all data regardless of last sync time
- **Incremental Import**: Only imports data that has changed since the last import

The process handles four main data types:

- Categories (product categorization hierarchy)
- Products (inventory items)
- Orders (sales records)
- Purchase Orders (vendor orders)

## Import Architecture

The import process follows these steps:

1. **Establish Connection**: Creates an SSH tunnel to the production server and establishes the database connections
2. **Setup Import History**: Creates an import_history record for the current import operation
3. **Import Categories**: Processes product categories in hierarchical order
4. **Import Products**: Processes products with their attributes and category relationships
5. **Import Orders**: Processes customer orders with line items, taxes, and discounts
6. **Import Purchase Orders**: Processes vendor purchase orders with line items
7. **Record Results**: Updates the import history record with the results
8. **Close Connections**: Cleans up connections and resources

The import steps use temporary tables for intermediate processing and wrap their operations in transactions to ensure data consistency.

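The eight steps above can be sketched as a single driver function. This is a minimal sketch, not the script's actual code: every function on the `steps` object (`connect`, `startHistory`, `importCategories`, `finishHistory`, `close`, and so on) is a hypothetical placeholder, and each table import is assumed to return `{ added, updated }` counts that feed `records_added` and `records_updated`.

```javascript
// Hypothetical orchestration sketch: every function on `steps` is a placeholder.
async function runImport(steps, { incremental }) {
  const conn = await steps.connect(); // 1. SSH tunnel + database connections
  let historyId;
  const totals = { added: 0, updated: 0 };
  try {
    historyId = await steps.startHistory(conn, incremental); // 2. import_history row
    // 3-6. Run the table imports in the documented order, summing their counts.
    const tableSteps = [
      steps.importCategories,
      steps.importProducts,
      steps.importOrders,
      steps.importPurchaseOrders,
    ];
    for (const step of tableSteps) {
      const { added = 0, updated = 0 } = (await step(conn, incremental)) || {};
      totals.added += added;
      totals.updated += updated;
    }
    await steps.finishHistory(conn, historyId, 'completed', totals); // 7. record results
    return totals;
  } catch (err) {
    // Record the failure only if the history row was actually created.
    if (historyId !== undefined) {
      await steps.finishHistory(conn, historyId, 'failed', totals, err.message);
    }
    throw err;
  } finally {
    await steps.close(conn); // 8. always release connections
  }
}
```

Putting the cleanup in `finally` means connections are closed whether the run completes or fails.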
## Column Mappings

### Categories

| PostgreSQL Column | MySQL Source                     | Transformation                               |
|-------------------|----------------------------------|----------------------------------------------|
| cat_id            | product_categories.cat_id        | Direct mapping                               |
| name              | product_categories.name          | Direct mapping                               |
| type              | product_categories.type          | Direct mapping                               |
| parent_id         | product_categories.master_cat_id | NULL for top-level categories (types 10, 20) |
| description       | product_categories.combined_name | Direct mapping                               |
| status            | N/A                              | Hard-coded 'active'                          |
| created_at        | N/A                              | Current timestamp                            |
| updated_at        | N/A                              | Current timestamp                            |

**Notes:**

- Categories are processed in hierarchical order by type: [10, 20, 11, 21, 12, 13]
- Types 10/20 are top-level categories with no parent
- Types 11/21/12/13 are child categories that reference parent categories

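The ordering rule above, presumably there so that every parent row exists before its children reference it, can be sketched in a few lines. This is a minimal sketch: `orderCategories` is a hypothetical name, input rows use the MySQL column names from the table, and skipping rows whose type is not in the list is an assumption.

```javascript
// Category types in processing order: top-level types first, then children.
const TYPE_ORDER = [10, 20, 11, 21, 12, 13];
const TOP_LEVEL_TYPES = new Set([10, 20]);

// Hypothetical helper: maps MySQL product_categories rows to PostgreSQL rows,
// sorted so parents come before children.
function orderCategories(rows) {
  return rows
    .filter((row) => TYPE_ORDER.includes(row.type)) // assumption: other types skipped
    .sort((a, b) => TYPE_ORDER.indexOf(a.type) - TYPE_ORDER.indexOf(b.type))
    .map((row) => ({
      cat_id: row.cat_id,
      name: row.name,
      type: row.type,
      // Top-level categories (types 10, 20) have no parent.
      parent_id: TOP_LEVEL_TYPES.has(row.type) ? null : row.master_cat_id,
      description: row.combined_name,
      status: 'active',
    }));
}
```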
### Products

| PostgreSQL Column      | MySQL Source                          | Transformation                                                                           |
|------------------------|---------------------------------------|------------------------------------------------------------------------------------------|
| pid                    | products.pid                          | Direct mapping                                                                           |
| title                  | products.description                  | Direct mapping                                                                           |
| description            | products.notes                        | Direct mapping                                                                           |
| sku                    | products.itemnumber                   | Fallback to 'NO-SKU' if empty                                                            |
| stock_quantity         | shop_inventory.available_local        | Capped at 5000, minimum 0                                                                |
| preorder_count         | current_inventory.onpreorder          | Default 0                                                                                |
| notions_inv_count      | product_notions_b2b.inventory         | Default 0                                                                                |
| price                  | product_current_prices.price_each     | Default 0, filtered on active=1                                                          |
| regular_price          | products.sellingprice                 | Default 0                                                                                |
| cost_price             | product_inventory                     | Weighted average: SUM(costeach * count) / SUM(count) when count > 0, or latest costeach |
| vendor                 | suppliers.companyname                 | Via supplier_item_data.supplier_id                                                       |
| vendor_reference       | supplier_item_data                    | supplier_itemnumber or notions_itemnumber based on vendor                                |
| notions_reference      | supplier_item_data.notions_itemnumber | Direct mapping                                                                           |
| brand                  | product_categories.name               | Linked via products.company                                                              |
| line                   | product_categories.name               | Linked via products.line                                                                 |
| subline                | product_categories.name               | Linked via products.subline                                                              |
| artist                 | product_categories.name               | Linked via products.artist                                                               |
| categories             | product_category_index                | Comma-separated list of category IDs                                                     |
| created_at             | products.date_created                 | Validated date, NULL if invalid                                                          |
| first_received         | products.datein                       | Validated date, NULL if invalid                                                          |
| landing_cost_price     | NULL                                  | Not set                                                                                  |
| barcode                | products.upc                          | Direct mapping                                                                           |
| harmonized_tariff_code | products.harmonized_tariff_code       | Direct mapping                                                                           |
| updated_at             | products.stamp                        | Validated date, NULL if invalid                                                          |
| visible                | shop_inventory                        | Calculated from show + buyable > 0                                                       |
| managing_stock         | N/A                                   | Hard-coded true                                                                          |
| replenishable          | Multiple fields                       | Complex calculation based on reorder, dates, etc. (see notes)                            |
| permalink              | N/A                                   | Constructed URL with product ID                                                          |
| moq                    | supplier_item_data                    | notions_qty_per_unit or supplier_qty_per_unit, minimum 1                                 |
| uom                    | N/A                                   | Hard-coded 1                                                                             |
| rating                 | products.rating                       | Direct mapping                                                                           |
| reviews                | products.rating_votes                 | Direct mapping                                                                           |
| weight                 | products.weight                       | Direct mapping                                                                           |
| length                 | products.length                       | Direct mapping                                                                           |
| width                  | products.width                        | Direct mapping                                                                           |
| height                 | products.height                       | Direct mapping                                                                           |
| country_of_origin      | products.country_of_origin            | Direct mapping                                                                           |
| location               | products.location                     | Direct mapping                                                                           |
| total_sold             | order_items                           | SUM(qty_ordered) for all order_items where prod_pid = pid                                |
| baskets                | mybasket                              | COUNT of records where mb.item = pid and qty > 0                                         |
| notifies               | product_notify                        | COUNT of records where pn.pid = pid                                                      |
| date_last_sold         | product_last_sold.date_sold           | Validated date, NULL if invalid                                                          |
| image                  | N/A                                   | Constructed from pid and image URL pattern                                               |
| image_175              | N/A                                   | Constructed from pid and image URL pattern                                               |
| image_full             | N/A                                   | Constructed from pid and image URL pattern                                               |
| options                | NULL                                  | Not set                                                                                  |
| tags                   | NULL                                  | Not set                                                                                  |

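A few of the row-level rules in the table (the SKU fallback, the stock cap, the weighted-average cost, and the MOQ floor) can be sketched as small pure functions. This is a minimal sketch, not the script's code: the helper names are hypothetical, inventory lots are assumed to arrive sorted oldest to newest, treating a whitespace-only itemnumber as empty is an assumption, and returning 0 when there are no lots at all is also an assumption.

```javascript
// sku: fall back to 'NO-SKU' when itemnumber is empty.
function toSku(itemnumber) {
  return itemnumber && String(itemnumber).trim() !== '' ? itemnumber : 'NO-SKU';
}

// stock_quantity: clamp available_local to the range [0, 5000].
function toStockQuantity(availableLocal) {
  return Math.min(5000, Math.max(0, Number(availableLocal) || 0));
}

// cost_price: SUM(costeach * count) / SUM(count) over lots with count > 0,
// otherwise the latest costeach (lots assumed sorted oldest to newest).
function toCostPrice(lots) {
  const inStock = lots.filter((lot) => lot.count > 0);
  const totalCount = inStock.reduce((sum, lot) => sum + lot.count, 0);
  if (totalCount > 0) {
    const totalCost = inStock.reduce((sum, lot) => sum + lot.costeach * lot.count, 0);
    return totalCost / totalCount;
  }
  return lots.length > 0 ? lots[lots.length - 1].costeach : 0; // 0 is an assumption
}

// moq: the selected qty-per-unit value, never below 1.
function toMoq(qtyPerUnit) {
  return Math.max(1, Number(qtyPerUnit) || 0);
}
```

For example, lots of 10 units at 2.00 and 30 units at 4.00 give (20 + 120) / 40 = 3.50.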
**Notes:**

- Replenishable calculation (a MySQL expression):

  ```sql
  CASE
    WHEN p.reorder < 0 THEN 0
    WHEN (
      (COALESCE(pls.date_sold, '0000-00-00') = '0000-00-00' OR pls.date_sold <= DATE_SUB(CURRENT_DATE, INTERVAL 5 YEAR))
      AND (p.datein = '0000-00-00 00:00:00' OR p.datein <= DATE_SUB(CURRENT_TIMESTAMP, INTERVAL 5 YEAR))
      AND (p.date_refill = '0000-00-00 00:00:00' OR p.date_refill <= DATE_SUB(CURRENT_TIMESTAMP, INTERVAL 5 YEAR))
    ) THEN 0
    ELSE 1
  END
  ```

  In business terms, a product is considered NOT replenishable only if:
  - It was manually flagged as not replenishable (negative reorder value)
  - OR it shows no activity across ALL metrics (no sales AND no receipts AND no refills in the past 5 years)

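- Image URLs are constructed using this pattern (imageUrlBase and iid come from the surrounding import code; the wrapper function name is illustrative):

  ```javascript
  // Builds the three image URL variants stored for a product.
  function buildImageUrls(pid, iid, imageUrlBase) {
    // Zero-pad the product ID to 6 digits; its first 3 digits name the directory.
    const paddedPid = pid.toString().padStart(6, '0');
    const prefix = paddedPid.slice(0, 3);
    // The file name itself uses the unpadded pid.
    const basePath = `${imageUrlBase}${prefix}/${pid}`;

    return {
      image: `${basePath}-t-${iid}.jpg`,
      image_175: `${basePath}-175x175-${iid}.jpg`,
      image_full: `${basePath}-o-${iid}.jpg`
    };
  }
  ```
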
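The replenishable rule from the notes above can be restated in JavaScript, which is handy for unit-testing expectations against the SQL. This is a sketch under stated assumptions: `isReplenishable` is a hypothetical name, JavaScript `null` stands in for MySQL zero dates, and a single `now` is used for all three comparisons even though the SQL compares date_sold against CURRENT_DATE and the other two against CURRENT_TIMESTAMP.

```javascript
// Returns true when the product should be treated as replenishable.
// Dates are Date objects, or null where MySQL stores a zero date.
function isReplenishable({ reorder, dateSold, dateIn, dateRefill }, now = new Date()) {
  if (reorder < 0) return false; // manually flagged as not replenishable

  // Cutoff: the same instant five years earlier (computed in UTC).
  const cutoff = new Date(now);
  cutoff.setUTCFullYear(cutoff.getUTCFullYear() - 5);

  // A metric is "inactive" if it never happened or is on/before the cutoff.
  const inactive = (d) => d === null || d <= cutoff;

  // Not replenishable only when sales, receipts, AND refills are all inactive.
  return !(inactive(dateSold) && inactive(dateIn) && inactive(dateRefill));
}
```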
### Product Categories (Relationship)

| PostgreSQL Column | MySQL Source                  | Transformation                             |
|-------------------|-------------------------------|--------------------------------------------|
| pid               | products.pid                  | Direct mapping                             |
| cat_id            | product_category_index.cat_id | Direct mapping, filtered by category types |

**Notes:**

- Only categories of types 10, 20, 11, 21, 12, 13 are imported
- Categories 16 and 17 are explicitly excluded

### Orders

| PostgreSQL Column | MySQL Source                                  | Transformation                                                 |
|-------------------|-----------------------------------------------|----------------------------------------------------------------|
| order_number      | order_items.order_id                          | Direct mapping                                                 |
| pid               | order_items.prod_pid                          | Direct mapping                                                 |
| sku               | order_items.prod_itemnumber                   | Fallback to 'NO-SKU' if empty                                  |
| date              | _order.date_placed_onlydate                   | Via join to _order table                                       |
| price             | order_items.prod_price                        | Direct mapping                                                 |
| quantity          | order_items.qty_ordered                       | Direct mapping                                                 |
| discount          | Multiple sources                              | Complex calculation (see notes)                                |
| tax               | order_tax_info_products.item_taxes_to_collect | Via latest order_tax_info record                               |
| tax_included      | N/A                                           | Hard-coded false                                               |
| shipping          | N/A                                           | Hard-coded 0                                                   |
| customer          | _order.order_cid                              | Direct mapping                                                 |
| customer_name     | users                                         | CONCAT(users.firstname, ' ', users.lastname)                   |
| status            | _order.order_status                           | Direct mapping                                                 |
| canceled          | _order.date_cancelled                         | Boolean: true if date_cancelled is not '0000-00-00 00:00:00'   |
| costeach          | order_costs                                   | From latest record or fallback to price * 0.5                  |

**Notes:**

- Only orders with order_status >= 15 and a valid date_placed are processed
- For incremental imports, only orders modified since the last sync are processed
- The discount combines three sources:
  1. Base discount: order_items.prod_price_reg - order_items.prod_price
  2. Promo discount: SUM of order_discount_items.amount
  3. Proportional order discount: the order-level summary discount allocated to each line in proportion to that line's share of the order subtotal (price * quantity / summary_subtotal)

  ```sql
  (oi.base_discount +
    COALESCE(ot.promo_discount, 0) +
    CASE
      WHEN om.summary_discount > 0 AND om.summary_subtotal > 0 THEN
        ROUND((om.summary_discount * (oi.price * oi.quantity)) / NULLIF(om.summary_subtotal, 0), 2)
      ELSE 0
    END)::DECIMAL(10,2)
  ```

- Taxes are taken from the latest tax record for an order
- Cost data is taken from the latest non-pending cost record

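The discount expression above can be mirrored in JavaScript to check expected values by hand. This is a minimal sketch: `lineDiscount` and `round2` are hypothetical names, `baseDiscount` is taken as already computed (the notes define it as prod_price_reg - prod_price but do not say whether it is multiplied by quantity), and `Math.round` only approximates PostgreSQL's `ROUND` and `DECIMAL(10,2)` cast for the positive amounts involved here.

```javascript
// Round to 2 decimals (approximates ROUND(x, 2) for positive values).
function round2(x) {
  return Math.round(x * 100) / 100;
}

// Per-line discount = base + promo + proportional share of the order discount.
function lineDiscount({ baseDiscount, promoDiscount, price, quantity, summaryDiscount, summarySubtotal }) {
  let proportional = 0;
  if (summaryDiscount > 0 && summarySubtotal > 0) {
    // This line's share of the order subtotal times the order-level discount.
    proportional = round2((summaryDiscount * (price * quantity)) / summarySubtotal);
  }
  // COALESCE(ot.promo_discount, 0) becomes `?? 0`.
  return round2(baseDiscount + (promoDiscount ?? 0) + proportional);
}
```

For example, a line worth 10 × 3 = 30 in an order with a 100 subtotal and a 5.00 order discount receives 5 × 30 / 100 = 1.50 of that discount.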
### Purchase Orders

| PostgreSQL Column | MySQL Source          | Transformation                        |
|-------------------|-----------------------|---------------------------------------|
| po_id             | po.po_id              | Default 0 if NULL                     |
| pid               | po_products.pid       | Direct mapping                        |
| sku               | products.itemnumber   | Fallback to 'NO-SKU' if empty         |
| name              | products.description  | Fallback to 'Unknown Product'         |
| cost_price        | po_products.cost_each | Direct mapping                        |
| po_cost_price     | po_products.cost_each | Duplicate of cost_price               |
| vendor            | suppliers.companyname | Fallback to 'Unknown Vendor' if empty |
| date              | po.date_ordered       | Fallback to po.date_created if NULL   |
| expected_date     | po.date_estin         | Direct mapping                        |
| status            | po.status             | Default 1 if NULL                     |
| notes             | po.short_note         | Fallback to po.notes if NULL          |
| ordered           | po_products.qty_each  | Direct mapping                        |
| received          | N/A                   | Hard-coded 0                          |
| receiving_status  | N/A                   | Hard-coded 1                          |

**Notes:**

- Only POs created within the last year (incremental imports) or the last 5 years (full imports) are processed
- For incremental imports, only POs modified since the last sync are processed

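The purchase-order mapping and its creation-date window can be sketched as plain functions. This is a minimal sketch: the flat `row` shape (one object carrying the joined po, po_products, products, and suppliers columns) and both function names are hypothetical. `??` is used where the table says "if NULL" and `||` where it says "if empty" or leaves the condition unstated.

```javascript
// Maps one joined MySQL row to the PostgreSQL purchase_orders columns.
function mapPurchaseOrderRow(row) {
  return {
    po_id: row.po_id ?? 0,
    pid: row.pid,
    sku: row.itemnumber || 'NO-SKU',
    name: row.description || 'Unknown Product',
    cost_price: row.cost_each,
    po_cost_price: row.cost_each, // duplicate of cost_price
    vendor: row.companyname || 'Unknown Vendor',
    date: row.date_ordered ?? row.date_created,
    expected_date: row.date_estin,
    status: row.status ?? 1,
    notes: row.short_note ?? row.notes,
    ordered: row.qty_each,
    received: 0,
    receiving_status: 1,
  };
}

// Earliest po.date_created included: 1 year back (incremental) or 5 (full).
function poCreatedCutoff(incremental, now = new Date()) {
  const cutoff = new Date(now);
  cutoff.setUTCFullYear(cutoff.getUTCFullYear() - (incremental ? 1 : 5));
  return cutoff;
}
```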
### Metadata Tables

#### import_history

| PostgreSQL Column | Source             | Notes                                            |
|-------------------|--------------------|--------------------------------------------------|
| id                | Auto-increment     | Primary key                                      |
| table_name        | Code               | 'all_tables' for overall import                  |
| start_time        | NOW()              | Import start time                                |
| end_time          | NOW()              | Import completion time                           |
| duration_seconds  | Calculation        | Elapsed seconds                                  |
| is_incremental    | INCREMENTAL_UPDATE | Flag from config                                 |
| records_added     | Calculation        | Sum from all imports                             |
| records_updated   | Calculation        | Sum from all imports                             |
| status            | Code               | 'running', 'completed', 'failed', or 'cancelled' |
| error_message     | Exception          | Error message if failed                          |
| additional_info   | JSON               | Configuration and results                        |

#### sync_status

| PostgreSQL Column   | Source | Notes                        |
|---------------------|--------|------------------------------|
| table_name          | Code   | Name of imported table       |
| last_sync_timestamp | NOW()  | Timestamp of successful sync |
| last_sync_id        | NULL   | Currently unused             |

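The document does not show how sync_status is read back, so the following is only a sketch of one way last_sync_timestamp could bound an incremental import. `getSyncCutoff` and `isChangedSince` are hypothetical names, `syncRows` stands for rows already fetched from sync_status, and falling back to a full import when a table has no recorded sync is an assumption.

```javascript
// Returns the "modified since" lower bound, or null for no bound.
function getSyncCutoff(syncRows, tableName, incremental) {
  if (!incremental) return null; // full import: no lower bound
  const row = syncRows.find((r) => r.table_name === tableName);
  // No recorded sync for this table: behave like a full import (assumption).
  return row && row.last_sync_timestamp ? new Date(row.last_sync_timestamp) : null;
}

// Inclusive comparison: re-importing a row stamped exactly at the cutoff is
// harmless when rows are written with an upsert, while skipping it would lose data.
function isChangedSince(modifiedAt, cutoff) {
  return cutoff === null || new Date(modifiedAt) >= cutoff;
}
```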
## Special Calculations

### Date Validation

MySQL dates are validated before insertion into PostgreSQL:

```javascript
function validateDate(mysqlDate) {
  if (!mysqlDate || mysqlDate === '0000-00-00' || mysqlDate === '0000-00-00 00:00:00') {
    return null;
  }
  // Check if the date is valid
  const date = new Date(mysqlDate);
  return isNaN(date.getTime()) ? null : mysqlDate;
}
```

### Retry Mechanism

Operations that might fail temporarily are retried with exponential backoff: the delay doubles after each failed attempt, and the last error is rethrown once all attempts are exhausted:

```javascript
// MAX_RETRIES (number of attempts) and RETRY_DELAY (base delay in ms) are
// constants defined elsewhere in the import script.
async function withRetry(operation, errorMessage) {
  let lastError;
  for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      console.error(`${errorMessage} (Attempt ${attempt}/${MAX_RETRIES}):`, error);
      if (attempt < MAX_RETRIES) {
        // Wait RETRY_DELAY, then 2x, 4x, ... before the next attempt.
        const backoffTime = RETRY_DELAY * Math.pow(2, attempt - 1);
        await new Promise(resolve => setTimeout(resolve, backoffTime));
      }
    }
  }
  throw lastError;
}
```

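As a worked example of the backoff formula, the wait before retry n (for n = 1 up to MAX_RETRIES - 1) is RETRY_DELAY × 2^(n - 1). The values 3 and 1000 below are illustrative only, since the document does not state the script's actual constants; `backoffSchedule` is a hypothetical helper.

```javascript
// Lists the waits withRetry would perform between attempts.
function backoffSchedule(maxRetries, retryDelay) {
  const delays = [];
  for (let attempt = 1; attempt < maxRetries; attempt++) {
    delays.push(retryDelay * Math.pow(2, attempt - 1));
  }
  return delays;
}

// Three attempts with a 1 s base: wait 1 s, then 2 s, then give up.
console.log(backoffSchedule(3, 1000)); // [ 1000, 2000 ]
```

With those values, a permanently failing operation takes about 3 seconds of waiting before the last error is rethrown.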
### Progress Tracking

Progress is tracked with an estimated time remaining, extrapolated from the average throughput since the step started:

```javascript
// startTime is a Date.now() millisecond timestamp taken when the step began;
// formatElapsedTime is a helper defined elsewhere in the import script.
function estimateRemaining(startTime, current, total) {
  if (current === 0) return "Calculating...";
  const elapsedSeconds = (Date.now() - startTime) / 1000;
  const itemsPerSecond = current / elapsedSeconds;
  const remainingItems = total - current;
  const remainingSeconds = remainingItems / itemsPerSecond;
  return formatElapsedTime(remainingSeconds);
}
```

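The formatElapsedTime helper is called above but not defined in this document. The version below is a hypothetical stand-in with an assumed "Xh Ym Zs" output format, followed by a worked estimate: 2,000 of 10,000 items done after 50 seconds is 40 items/s, so the remaining 8,000 items need 8,000 / 40 = 200 seconds.

```javascript
// Hypothetical formatter: seconds -> "Xh Ym Zs", omitting leading zero units.
function formatElapsedTime(totalSeconds) {
  const s = Math.max(0, Math.round(totalSeconds));
  const hours = Math.floor(s / 3600);
  const minutes = Math.floor((s % 3600) / 60);
  const seconds = s % 60;
  if (hours > 0) return `${hours}h ${minutes}m ${seconds}s`;
  if (minutes > 0) return `${minutes}m ${seconds}s`;
  return `${seconds}s`;
}

console.log(formatElapsedTime(200)); // 3m 20s
```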
## Implementation Notes

### Transaction Management

All imports use transactions to ensure data consistency:

- **Categories**: Uses savepoints for each category type
- **Products**: Uses a single transaction for the entire import
- **Orders**: Uses a single transaction with temporary tables
- **Purchase Orders**: Uses a single transaction with temporary tables

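The per-type savepoint pattern used for categories can be sketched against any client whose `query(sql)` returns a Promise, which node-postgres's `Client.query` does when called without a callback. This is a minimal sketch: `importCategoryTypes` and the `importType` callback are hypothetical, and rolling back only the failed type and continuing with the next one (rather than aborting the whole import) is an assumption about how the savepoints are used.

```javascript
// Runs each category type inside its own savepoint within one transaction.
// Returns the list of types that failed and were rolled back.
async function importCategoryTypes(client, types, importType) {
  const failed = [];
  await client.query('BEGIN');
  try {
    for (const type of types) {
      const savepoint = `category_type_${type}`;
      await client.query(`SAVEPOINT ${savepoint}`);
      try {
        await importType(client, type);
        await client.query(`RELEASE SAVEPOINT ${savepoint}`);
      } catch (err) {
        // Undo only this type's work; the outer transaction stays usable.
        await client.query(`ROLLBACK TO SAVEPOINT ${savepoint}`);
        failed.push({ type, message: err.message });
      }
    }
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  }
  return failed;
}
```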
### Memory Usage Optimization

To minimize memory usage when processing large datasets:

1. Data is processed in batches (100-5000 records per batch)
2. Temporary tables are used for intermediate data
3. Some queries use cursors to avoid loading all results at once

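Batching can be sketched with a generator that yields fixed-size slices, so each slice can be written and released before the next is built. Batch size is also bounded by PostgreSQL itself: a single statement accepts at most 65535 bind parameters, so a multi-row INSERT can carry at most floor(65535 / columns) rows. Both helper names are hypothetical.

```javascript
// Yields consecutive slices of at most `size` rows.
function* batches(rows, size) {
  if (size < 1) throw new RangeError('batch size must be >= 1');
  for (let i = 0; i < rows.length; i += size) {
    yield rows.slice(i, i + size);
  }
}

// Largest row count a single parameterized multi-row INSERT can hold.
function maxRowsPerInsert(columnCount) {
  return Math.floor(65535 / columnCount);
}

// 5 rows in batches of 2 -> slice sizes 2, 2, 1.
const sizes = [...batches([1, 2, 3, 4, 5], 2)].map((b) => b.length);
console.log(sizes); // [ 2, 2, 1 ]
```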
### MySQL vs PostgreSQL Compatibility

The scripts handle differences between MySQL and PostgreSQL:

1. MySQL-specific syntax like `USE INDEX` is removed for PostgreSQL
2. `GROUP_CONCAT` in MySQL becomes string operations in PostgreSQL
3. Transaction syntax differences are abstracted in the connection wrapper
4. PostgreSQL's `ON CONFLICT` replaces MySQL's `ON DUPLICATE KEY UPDATE`

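Where MySQL would write `INSERT ... ON DUPLICATE KEY UPDATE name = VALUES(name)`, PostgreSQL writes `INSERT ... ON CONFLICT (key) DO UPDATE SET name = EXCLUDED.name`. A minimal builder for the PostgreSQL form is sketched below; `buildUpsert` is a hypothetical helper that assumes a unique constraint or primary key on the conflict columns and trusted (non-user-supplied) identifiers.

```javascript
// Builds a multi-row upsert with $n placeholders for `rowCount` rows.
function buildUpsert(table, columns, conflictColumns, rowCount) {
  const tuples = [];
  for (let r = 0; r < rowCount; r++) {
    const placeholders = columns.map((_, c) => `$${r * columns.length + c + 1}`);
    tuples.push(`(${placeholders.join(', ')})`);
  }
  // Overwrite every non-key column with the value from the rejected insert row.
  const updates = columns
    .filter((col) => !conflictColumns.includes(col))
    .map((col) => `${col} = EXCLUDED.${col}`);
  const action = updates.length > 0 ? `DO UPDATE SET ${updates.join(', ')}` : 'DO NOTHING';
  return (
    `INSERT INTO ${table} (${columns.join(', ')}) VALUES ${tuples.join(', ')} ` +
    `ON CONFLICT (${conflictColumns.join(', ')}) ${action}`
  );
}
```

The values array passed alongside the SQL is the rows flattened in column order, matching the placeholder numbering.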
### SSH Tunnel

Database connections go through an SSH tunnel for security. The forwarding call runs inside a Promise executor, which is where `resolve` and `reject` come from:

```javascript
ssh.forwardOut(
  "127.0.0.1", // source address reported to the SSH server
  0,           // source port (0 = any)
  sshConfig.prodDbConfig.host,
  sshConfig.prodDbConfig.port,
  (err, stream) => {
    // Return after rejecting so execution stops on failure.
    if (err) return reject(err);
    resolve({ ssh, stream });
  }
);
```