inventory/docs/import-from-prod-data-mapping.md
2025-03-25 19:12:41 -04:00

MySQL to PostgreSQL Import Process Documentation

This document outlines the data import process from the production MySQL database to the local PostgreSQL database, focusing on column mappings, data transformations, and the overall import architecture.

Table of Contents

  1. Overview
  2. Import Architecture
  3. Column Mappings
  4. Special Calculations
  5. Implementation Notes

Overview

The import process extracts data from a MySQL 5.7 production database and imports it into a PostgreSQL database. It can operate in two modes:

  • Full Import: Imports all data regardless of last sync time
  • Incremental Import: Only imports data that has changed since the last import

The process handles four main data types:

  • Categories (product categorization hierarchy)
  • Products (inventory items)
  • Orders (sales records)
  • Purchase Orders (vendor orders)

Import Architecture

The import process follows these steps:

  1. Establish Connection: Creates an SSH tunnel to the production server and establishes database connections
  2. Setup Import History: Creates a record of the current import operation
  3. Import Categories: Processes product categories in hierarchical order
  4. Import Products: Processes products with their attributes and category relationships
  5. Import Orders: Processes customer orders with line items, taxes, and discounts
  6. Import Purchase Orders: Processes vendor purchase orders with line items
  7. Record Results: Updates the import history with results
  8. Close Connections: Cleans up connections and resources

Each import step uses temporary tables for processing and wraps operations in transactions to ensure data consistency.
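
The step sequence above can be sketched as a small orchestrator. This is an illustrative sketch only: the `runImport` name, the `[name, asyncFn]` step format, and the `{ added, updated }` return shape are assumptions, not the real script's API. The history fields mirror the import_history columns documented later.

```javascript
// Hypothetical sketch: `steps` is an ordered list of [name, asyncFn] pairs,
// and each asyncFn is assumed to resolve to { added, updated } counts.
async function runImport(steps, now = Date.now) {
  const startedAt = now();
  const history = {
    table_name: 'all_tables',
    status: 'running',
    records_added: 0,
    records_updated: 0,
    error_message: null,
  };
  let currentStep = null;
  try {
    for (const [name, step] of steps) {
      currentStep = name;
      const { added, updated } = await step();
      history.records_added += added;
      history.records_updated += updated;
    }
    history.status = 'completed';
  } catch (error) {
    // Record which step failed so the history row is useful for debugging.
    history.status = 'failed';
    history.error_message = `${currentStep}: ${error.message}`;
  } finally {
    history.duration_seconds = Math.round((now() - startedAt) / 1000);
  }
  return history;
}
```

Running the steps strictly in sequence keeps the dependency order intact: products reference categories, and orders and purchase orders reference products.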

Column Mappings

Categories

| PostgreSQL Column | MySQL Source | Transformation |
| --- | --- | --- |
| cat_id | product_categories.cat_id | Direct mapping |
| name | product_categories.name | Direct mapping |
| type | product_categories.type | Direct mapping |
| parent_id | product_categories.master_cat_id | NULL for top-level categories (types 10, 20) |
| description | product_categories.combined_name | Direct mapping |
| status | N/A | Hard-coded 'active' |
| created_at | N/A | Current timestamp |
| updated_at | N/A | Current timestamp |

Notes:

  • Categories are processed in hierarchical order by type: [10, 20, 11, 21, 12, 13]
  • Type 10/20 are top-level categories with no parent
  • Types 11/21/12/13 are child categories that reference parent categories
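
As a rough illustration of the ordering and parent rules above, the following sketch sorts raw category rows by type and applies the column mapping. The function name `prepareCategories` and the in-memory approach are assumptions; the real import may do this per type in SQL.

```javascript
// Processing order and top-level types, as listed in the notes above.
const TYPE_ORDER = [10, 20, 11, 21, 12, 13];
const TOP_LEVEL_TYPES = new Set([10, 20]);

// Hypothetical helper: takes rows shaped like product_categories records
// and returns them in import order with the documented column mapping.
function prepareCategories(rows) {
  return rows
    .filter((row) => TYPE_ORDER.includes(row.type)) // drop other types
    .sort((a, b) => TYPE_ORDER.indexOf(a.type) - TYPE_ORDER.indexOf(b.type))
    .map((row) => ({
      cat_id: row.cat_id,
      name: row.name,
      type: row.type,
      // Top-level categories have no parent.
      parent_id: TOP_LEVEL_TYPES.has(row.type) ? null : row.master_cat_id,
      description: row.combined_name,
      status: 'active',
    }));
}
```

Importing parents before children means a child's parent_id always points at a row that already exists.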

Products

| PostgreSQL Column | MySQL Source | Transformation |
| --- | --- | --- |
| pid | products.pid | Direct mapping |
| title | products.description | Direct mapping |
| description | products.notes | Direct mapping |
| sku | products.itemnumber | Fallback to 'NO-SKU' if empty |
| stock_quantity | shop_inventory.available_local | Capped at 5000, minimum 0 |
| preorder_count | current_inventory.onpreorder | Default 0 |
| notions_inv_count | product_notions_b2b.inventory | Default 0 |
| price | product_current_prices.price_each | Default 0, filtered on active=1 |
| regular_price | products.sellingprice | Default 0 |
| cost_price | product_inventory | Weighted average: SUM(costeach * count) / SUM(count) when count > 0, or latest costeach |
| vendor | suppliers.companyname | Via supplier_item_data.supplier_id |
| vendor_reference | supplier_item_data | supplier_itemnumber or notions_itemnumber based on vendor |
| notions_reference | supplier_item_data.notions_itemnumber | Direct mapping |
| brand | product_categories.name | Linked via products.company |
| line | product_categories.name | Linked via products.line |
| subline | product_categories.name | Linked via products.subline |
| artist | product_categories.name | Linked via products.artist |
| categories | product_category_index | Comma-separated list of category IDs |
| created_at | products.date_created | Validated date, NULL if invalid |
| first_received | products.datein | Validated date, NULL if invalid |
| landing_cost_price | NULL | Not set |
| barcode | products.upc | Direct mapping |
| harmonized_tariff_code | products.harmonized_tariff_code | Direct mapping |
| updated_at | products.stamp | Validated date, NULL if invalid |
| visible | shop_inventory | Calculated from show + buyable > 0 |
| managing_stock | N/A | Hard-coded true |
| replenishable | Multiple fields | Complex calculation based on reorder, dates, etc. |
| permalink | N/A | Constructed URL with product ID |
| moq | supplier_item_data | notions_qty_per_unit or supplier_qty_per_unit, minimum 1 |
| uom | N/A | Hard-coded 1 |
| rating | products.rating | Direct mapping |
| reviews | products.rating_votes | Direct mapping |
| weight | products.weight | Direct mapping |
| length | products.length | Direct mapping |
| width | products.width | Direct mapping |
| height | products.height | Direct mapping |
| country_of_origin | products.country_of_origin | Direct mapping |
| location | products.location | Direct mapping |
| total_sold | order_items | SUM(qty_ordered) for all order_items where prod_pid = pid |
| baskets | mybasket | COUNT of records where mb.item = pid and qty > 0 |
| notifies | product_notify | COUNT of records where pn.pid = pid |
| date_last_sold | product_last_sold.date_sold | Validated date, NULL if invalid |
| image | N/A | Constructed from pid and image URL pattern |
| image_175 | N/A | Constructed from pid and image URL pattern |
| image_full | N/A | Constructed from pid and image URL pattern |
| options | NULL | Not set |
| tags | NULL | Not set |
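
A few of the transformations in the table are easier to follow as code. The sketch below restates three of them (cost_price, stock_quantity, sku) in JavaScript. The helper names are hypothetical, the real import likely computes these in SQL, and two details are assumptions: inventory rows are taken to be sorted oldest to newest, and an empty inventory yields a cost of 0.

```javascript
// cost_price: SUM(costeach * count) / SUM(count) over rows with count > 0,
// otherwise the latest costeach.
function weightedAverageCost(rows) {
  const stocked = rows.filter((r) => r.count > 0);
  const totalCount = stocked.reduce((sum, r) => sum + r.count, 0);
  if (totalCount > 0) {
    const totalCost = stocked.reduce((sum, r) => sum + r.costeach * r.count, 0);
    return totalCost / totalCount;
  }
  // Assumption: rows are ordered oldest to newest; 0 when there are no rows.
  return rows.length > 0 ? rows[rows.length - 1].costeach : 0;
}

// stock_quantity: capped at 5000, minimum 0.
function clampStock(availableLocal) {
  return Math.min(5000, Math.max(0, availableLocal ?? 0));
}

// sku: fall back to 'NO-SKU' when the item number is empty or missing.
function skuOrFallback(itemnumber) {
  return itemnumber ? itemnumber : 'NO-SKU';
}
```

For example, 10 units at 2.00 and 30 units at 4.00 give (10 * 2 + 30 * 4) / 40 = 3.50.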

Notes:

  • Replenishable calculation:
    CASE 
      WHEN p.reorder < 0 THEN 0
      WHEN (
        (COALESCE(pls.date_sold, '0000-00-00') = '0000-00-00' OR pls.date_sold <= DATE_SUB(CURRENT_DATE, INTERVAL 5 YEAR))
        AND (p.datein = '0000-00-00 00:00:00' OR p.datein <= DATE_SUB(CURRENT_TIMESTAMP, INTERVAL 5 YEAR))
        AND (p.date_refill = '0000-00-00 00:00:00' OR p.date_refill <= DATE_SUB(CURRENT_TIMESTAMP, INTERVAL 5 YEAR))
      ) THEN 0
      ELSE 1
    END
    

In business terms, a product is considered NOT replenishable only if:

  • It was manually flagged as not replenishable (negative reorder value)
  • OR it shows no activity across ALL metrics (no sales AND no receipts AND no refills in the past 5 years)
  • Image URLs are constructed using this pattern:
    const paddedPid = pid.toString().padStart(6, '0');
    const prefix = paddedPid.slice(0, 3);
    const basePath = `${imageUrlBase}${prefix}/${pid}`;
    return {
      image: `${basePath}-t-${iid}.jpg`,
      image_175: `${basePath}-175x175-${iid}.jpg`,
      image_full: `${basePath}-o-${iid}.jpg`
    };
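
Returning to the replenishable rule in the notes above, the same decision can be sketched in JavaScript for readers who find the SQL hard to scan. The function names are hypothetical, and the sketch simplifies one detail: it treats any missing date as "no activity", whereas the SQL does so explicitly only for date_sold (via COALESCE).

```javascript
const ZERO_DATES = new Set(['0000-00-00', '0000-00-00 00:00:00']);

// True when a date is missing, a MySQL zero date, or at/before the cutoff.
function isStale(mysqlDate, cutoff) {
  if (!mysqlDate || ZERO_DATES.has(mysqlDate)) return true;
  return new Date(mysqlDate) <= cutoff;
}

// Hypothetical helper mirroring the CASE expression:
// not replenishable if reorder < 0, or if sales, receipts, and refills
// are ALL stale; replenishable otherwise.
function isReplenishable(product, now = new Date()) {
  if (product.reorder < 0) return false;
  const cutoff = new Date(now.getTime());
  cutoff.setFullYear(cutoff.getFullYear() - 5);
  const noRecentActivity =
    isStale(product.date_sold, cutoff) &&
    isStale(product.datein, cutoff) &&
    isStale(product.date_refill, cutoff);
  return !noRecentActivity;
}
```

A single recent signal is enough to keep a product replenishable: a product last received ten years ago but sold last year still returns true.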
    

Product Categories (Relationship)

| PostgreSQL Column | MySQL Source | Transformation |
| --- | --- | --- |
| pid | products.pid | Direct mapping |
| cat_id | product_category_index.cat_id | Direct mapping, filtered by category types |

Notes:

  • Only categories of types 10, 20, 11, 21, 12, and 13 are imported
  • Category types 16 and 17 are explicitly excluded

Orders

| PostgreSQL Column | MySQL Source | Transformation |
| --- | --- | --- |
| order_number | order_items.order_id | Direct mapping |
| pid | order_items.prod_pid | Direct mapping |
| sku | order_items.prod_itemnumber | Fallback to 'NO-SKU' if empty |
| date | _order.date_placed_onlydate | Via join to _order table |
| price | order_items.prod_price | Direct mapping |
| quantity | order_items.qty_ordered | Direct mapping |
| discount | Multiple sources | Complex calculation (see notes) |
| tax | order_tax_info_products.item_taxes_to_collect | Via latest order_tax_info record |
| tax_included | N/A | Hard-coded false |
| shipping | N/A | Hard-coded 0 |
| customer | _order.order_cid | Direct mapping |
| customer_name | users | CONCAT(users.firstname, ' ', users.lastname) |
| status | _order.order_status | Direct mapping |
| canceled | _order.date_cancelled | Boolean: true if date_cancelled is not '0000-00-00 00:00:00' |
| costeach | order_costs | From latest record or fallback to price * 0.5 |

Notes:

  • Only orders with order_status >= 15 and with a valid date_placed are processed
  • For incremental imports, only orders modified since last sync are processed
  • Discount calculation combines three sources:
    1. Base discount: order_items.prod_price_reg - order_items.prod_price
    2. Promo discount: SUM of order_discount_items.amount
    3. Proportional order discount: Calculation based on order subtotal proportion
    (oi.base_discount + 
     COALESCE(ot.promo_discount, 0) + 
     CASE 
      WHEN om.summary_discount > 0 AND om.summary_subtotal > 0 THEN 
        ROUND((om.summary_discount * (oi.price * oi.quantity)) / NULLIF(om.summary_subtotal, 0), 2)
      ELSE 0 
     END)::DECIMAL(10,2)
    
  • Taxes are taken from the latest tax record for an order
  • Cost data is taken from the latest non-pending cost record
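
To make the three-part discount in the notes above concrete, here is a worked sketch in JavaScript. The function and property names are hypothetical, and the real calculation runs in PostgreSQL with DECIMAL arithmetic; this version uses floating point with rounding to cents, so treat it as an illustration rather than a drop-in replacement.

```javascript
// Round to two decimal places, approximating ROUND(x, 2).
function roundCents(value) {
  return Math.round(value * 100) / 100;
}

// Hypothetical helper combining the three discount sources for one line item.
function lineDiscount(item, order) {
  // 1. Base discount: regular price minus the price actually charged.
  const baseDiscount = item.regularPrice - item.price;
  // 2. Promo discount: sum of order_discount_items.amount, 0 if none.
  const promoDiscount = item.promoDiscount ?? 0;
  // 3. Share of the order-level discount, proportional to the line subtotal.
  let orderShare = 0;
  if (order.summaryDiscount > 0 && order.summarySubtotal > 0) {
    orderShare = roundCents(
      (order.summaryDiscount * item.price * item.quantity) / order.summarySubtotal
    );
  }
  return roundCents(baseDiscount + promoDiscount + orderShare);
}
```

For a line with regular price 12.00, price 10.00, quantity 2, and a 1.50 promo, on an order with a 5.00 discount over a 50.00 subtotal: base is 2.00, promo is 1.50, and the order share is 5.00 * 20.00 / 50.00 = 2.00, for a total of 5.50.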

Purchase Orders

| PostgreSQL Column | MySQL Source | Transformation |
| --- | --- | --- |
| po_id | po.po_id | Default 0 if NULL |
| pid | po_products.pid | Direct mapping |
| sku | products.itemnumber | Fallback to 'NO-SKU' if empty |
| name | products.description | Fallback to 'Unknown Product' |
| cost_price | po_products.cost_each | Direct mapping |
| po_cost_price | po_products.cost_each | Duplicate of cost_price |
| vendor | suppliers.companyname | Fallback to 'Unknown Vendor' if empty |
| date | po.date_ordered | Fallback to po.date_created if NULL |
| expected_date | po.date_estin | Direct mapping |
| status | po.status | Default 1 if NULL |
| notes | po.short_note | Fallback to po.notes if NULL |
| ordered | po_products.qty_each | Direct mapping |
| received | N/A | Hard-coded 0 |
| receiving_status | N/A | Hard-coded 1 |

Notes:

  • Only POs created within the last year (incremental) or the last 5 years (full) are processed
  • For incremental imports, only POs modified since last sync are processed
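
The two filters above combine as follows. This is a sketch under assumptions: the function names are hypothetical, and `date_updated` stands in for whichever modification timestamp the real query compares against the last sync time.

```javascript
// Oldest creation date still eligible: 1 year back for incremental
// imports, 5 years back for full imports.
function purchaseOrderCutoff(isIncremental, now = new Date()) {
  const cutoff = new Date(now.getTime());
  cutoff.setFullYear(cutoff.getFullYear() - (isIncremental ? 1 : 5));
  return cutoff;
}

// Hypothetical predicate; `po.date_updated` is an assumed column name.
function shouldImportPurchaseOrder(po, isIncremental, lastSync, now = new Date()) {
  if (new Date(po.date_created) < purchaseOrderCutoff(isIncremental, now)) {
    return false; // too old for this import mode
  }
  if (!isIncremental) return true; // full import: the age window is the only filter
  return new Date(po.date_updated) > lastSync; // incremental: changed since last sync
}
```

The age window applies in both modes, so an incremental import skips a PO older than one year even if it was modified yesterday.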

Metadata Tables

import_history

| PostgreSQL Column | Source | Notes |
| --- | --- | --- |
| id | Auto-increment | Primary key |
| table_name | Code | 'all_tables' for overall import |
| start_time | NOW() | Import start time |
| end_time | NOW() | Import completion time |
| duration_seconds | Calculation | Elapsed seconds |
| is_incremental | INCREMENTAL_UPDATE | Flag from config |
| records_added | Calculation | Sum from all imports |
| records_updated | Calculation | Sum from all imports |
| status | Code | 'running', 'completed', 'failed', or 'cancelled' |
| error_message | Exception | Error message if failed |
| additional_info | JSON | Configuration and results |

sync_status

| PostgreSQL Column | Source | Notes |
| --- | --- | --- |
| table_name | Code | Name of imported table |
| last_sync_timestamp | NOW() | Timestamp of successful sync |
| last_sync_id | NULL | Not used currently |

Special Calculations

Date Validation

MySQL dates are validated before insertion into PostgreSQL:

function validateDate(mysqlDate) {
  if (!mysqlDate || mysqlDate === '0000-00-00' || mysqlDate === '0000-00-00 00:00:00') {
    return null;
  }
  // Check if the date is valid
  const date = new Date(mysqlDate);
  return isNaN(date.getTime()) ? null : mysqlDate;
}

Retry Mechanism

Operations that might fail temporarily are retried with exponential backoff:

async function withRetry(operation, errorMessage) {
  let lastError;
  for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      console.error(`${errorMessage} (Attempt ${attempt}/${MAX_RETRIES}):`, error);
      if (attempt < MAX_RETRIES) {
        const backoffTime = RETRY_DELAY * Math.pow(2, attempt - 1);
        await new Promise(resolve => setTimeout(resolve, backoffTime));
      }
    }
  }
  throw lastError;
}

Progress Tracking

Progress is tracked with estimated time remaining:

function estimateRemaining(startTime, current, total) {
  if (current === 0) return "Calculating...";
  const elapsedSeconds = (Date.now() - startTime) / 1000;
  const itemsPerSecond = current / elapsedSeconds;
  const remainingItems = total - current;
  const remainingSeconds = remainingItems / itemsPerSecond;
  return formatElapsedTime(remainingSeconds);
}
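
The estimate relies on a `formatElapsedTime` helper that is not shown in this document. A minimal sketch of what such a helper might look like follows; the output format is an assumption, not the real implementation.

```javascript
// Hypothetical formatter: turns a number of seconds into a short label
// such as "45s", "2m 40s", or "1h 2m".
function formatElapsedTime(totalSeconds) {
  const seconds = Math.max(0, Math.round(totalSeconds));
  const hours = Math.floor(seconds / 3600);
  const minutes = Math.floor((seconds % 3600) / 60);
  const secs = seconds % 60;
  if (hours > 0) return `${hours}h ${minutes}m`;
  if (minutes > 0) return `${minutes}m ${secs}s`;
  return `${secs}s`;
}
```

As a worked example of the estimate itself: after 40 seconds with 2,000 of 10,000 records done, the rate is 50 records per second, so the remaining 8,000 records take about 160 seconds, which this helper would render as "2m 40s".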

Implementation Notes

Transaction Management

All imports use transactions to ensure data consistency:

  • Categories: Uses savepoints for each category type
  • Products: Uses a single transaction for the entire import
  • Orders: Uses a single transaction with temporary tables
  • Purchase Orders: Uses a single transaction with temporary tables
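
The savepoint-per-type approach used for categories might look roughly like the sketch below. It assumes a client object with an async `query(sql)` method (as in node-postgres) and a hypothetical `importType` callback. Whether the real script skips a failed type or aborts the whole import is not stated here, so the skip-and-continue behavior is an assumption.

```javascript
// Hypothetical sketch: one transaction, one savepoint per category type.
async function importByType(client, types, importType) {
  await client.query('BEGIN');
  try {
    for (const type of types) {
      const savepoint = `category_type_${type}`;
      await client.query(`SAVEPOINT ${savepoint}`);
      try {
        await importType(client, type);
        await client.query(`RELEASE SAVEPOINT ${savepoint}`);
      } catch (error) {
        // Undo only this type's work; earlier types stay in the transaction.
        await client.query(`ROLLBACK TO SAVEPOINT ${savepoint}`);
        console.error(`Skipping category type ${type}:`, error.message);
      }
    }
    await client.query('COMMIT');
  } catch (error) {
    // Anything outside a savepoint block aborts the whole transaction.
    await client.query('ROLLBACK');
    throw error;
  }
}
```

Savepoints let a failure in one category type be rolled back without discarding the types that were already imported in the same transaction.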

Memory Usage Optimization

To minimize memory usage when processing large datasets:

  1. Data is processed in batches (100-5000 records per batch)
  2. Temporary tables are used for intermediate data
  3. Some queries use cursors to avoid loading all results at once
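
Batch processing as described in the first point can be sketched with a generator that yields fixed-size slices. The `batches` name is hypothetical, and the real scripts may batch at the query level (for example with LIMIT/OFFSET or cursors) rather than over an in-memory array.

```javascript
// Yield consecutive slices of at most `batchSize` items.
function* batches(items, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  for (let i = 0; i < items.length; i += batchSize) {
    yield items.slice(i, i + batchSize);
  }
}
```

A caller would loop with `for (const batch of batches(rows, 1000)) { ... }` and insert one batch at a time, so only a single batch of prepared records is held while writing.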

MySQL vs PostgreSQL Compatibility

The scripts handle differences between MySQL and PostgreSQL:

  1. MySQL-specific syntax like USE INDEX is removed for PostgreSQL
  2. GROUP_CONCAT is MySQL-only, so the PostgreSQL side uses string operations instead (STRING_AGG is the usual PostgreSQL equivalent)
  3. Transaction syntax differences are abstracted in the connection wrapper
  4. PostgreSQL's ON CONFLICT replaces MySQL's ON DUPLICATE KEY UPDATE
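
To illustrate the last point, the sketch below builds a parameterized PostgreSQL upsert statement. The `buildUpsert` helper is hypothetical and does not quote or validate identifiers, so it is only suitable for table and column names that come from trusted code, never from user input.

```javascript
// Hypothetical helper: builds INSERT ... ON CONFLICT ... DO UPDATE,
// the PostgreSQL counterpart of MySQL's ON DUPLICATE KEY UPDATE.
function buildUpsert(table, columns, conflictColumns) {
  const placeholders = columns.map((_, i) => `$${i + 1}`).join(', ');
  // Update every non-key column from the row that failed to insert.
  const updates = columns
    .filter((column) => !conflictColumns.includes(column))
    .map((column) => `${column} = EXCLUDED.${column}`)
    .join(', ');
  const action = updates ? `DO UPDATE SET ${updates}` : 'DO NOTHING';
  return (
    `INSERT INTO ${table} (${columns.join(', ')}) VALUES (${placeholders}) ` +
    `ON CONFLICT (${conflictColumns.join(', ')}) ${action}`
  );
}
```

Unlike ON DUPLICATE KEY UPDATE, ON CONFLICT needs an explicit conflict target, so the unique key columns must be named in the statement.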

SSH Tunnel

Database connections go through an SSH tunnel for security:

ssh.forwardOut(
  "127.0.0.1",
  0,
  sshConfig.prodDbConfig.host,
  sshConfig.prodDbConfig.port,
  (err, stream) => {
    // Settle the enclosing promise exactly once: stop after rejecting.
    if (err) return reject(err);
    resolve({ ssh, stream });
  }
);