# AI Validation System Redesign

> **Status:** Design Document (not yet implemented)
> **Date:** January 2025
> **Goal:** Replace expensive batch AI validation with a multi-tier, real-time validation system

## Table of Contents

1. [Executive Summary](#executive-summary)
2. [Current State Analysis](#current-state-analysis)
3. [Proposed Architecture](#proposed-architecture)
4. [Backend Implementation](#backend-implementation)
   - [File Structure](#file-structure)
   - [Provider System](#provider-system)
   - [Task Registry](#task-registry)
   - [Embedding System](#embedding-system)
   - [Micro-Prompt Tasks](#micro-prompt-tasks)
5. [Frontend Implementation](#frontend-implementation)
   - [Suggestion Hooks](#suggestion-hooks)
   - [UI Components](#ui-components)
   - [Integration with ValidationStep](#integration-with-validationstep)
6. [Database Schema](#database-schema)
7. [Migration Strategy](#migration-strategy)
8. [Cost Analysis](#cost-analysis)

---

## Executive Summary

### Problem

The current AI validation system:

- Costs **$0.30-0.50 per run** (GPT-5.2 with reasoning)
- Takes **3-4 minutes** to process
- Sends a **giant prompt** (18KB+ instructions + full taxonomy + all products)
- Processes everything in **one batch at the end** of the workflow

### Solution

A multi-tier validation system that:

- Performs **deterministic validation in code** (no AI needed for formatting)
- Uses **Groq (Llama 3.3)** for real-time field suggestions (~200ms, ~$0.001/product)
- Pre-computes **category/theme embeddings** for fast similarity search
- Reserves **expensive models** only for complex description generation
- Validates **incrementally** as the user works, not all at once

### Expected Outcomes

| Metric | Current | Proposed |
|--------|---------|----------|
| Cost per 50 products | $0.30-0.50 | $0.02-0.05 |
| Processing time | 3-4 minutes | Real-time + 10-30s batch |
| User experience | Wait at end | Inline suggestions |

---

## Current State Analysis

### What the Current System Does

Based on the general prompt in the `ai_prompts` table, validation performs these tasks:

| Task | AI Required? | Proposed Approach |
|------|--------------|-------------------|
| Price formatting (`$5.00` → `5.00`) | ❌ No | JavaScript normalizer |
| UPC/SKU trimming | ❌ No | JavaScript normalizer |
| Country code conversion (`USA` → `US`) | ❌ No | Lookup table |
| Date formatting (ETA field) | ❌ No | Date parser |
| Category assignment | ⚠️ Partial | Embeddings + small model |
| Theme detection | ⚠️ Partial | Embeddings + small model |
| Color extraction | ⚠️ Partial | Small model |
| Name standardization | ✅ Yes | Medium model |
| Description enhancement | ✅ Yes | Medium model (streaming) |
| Weight/dimension consistency | ✅ Yes | Batch comparison |
| Tax code assignment | ⚠️ Partial | Rules + small model |

### Current Prompt Structure

```
[System Instructions] ~200 chars
[General Prompt]      ~18,000 chars (field guidelines, naming rules, etc.)
[Company-Specific]    ~variable
[Taxonomy Data]       ~50,000-100,000 chars (categories, themes, colors, etc.)
[Product Data]        ~variable (all products as JSON)
```

**Total prompt size:** Often 100,000+ characters (≈25,000+ input tokens)

### Current Model Usage

- **Model:** GPT-5.2 (reasoning model)
- **Reasoning effort:** Medium
- **Max output tokens:** 50,000
- **Response format:** Strict JSON schema

---

## Proposed Architecture

### System Overview

```
|
||
┌─────────────────────────────────────────────────────────────────────────────┐
|
||
│ VALIDATION STEP UI │
|
||
├─────────────────────────────────────────────────────────────────────────────┤
|
||
│ │
|
||
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
|
||
│ │ Field Edit │ │ Row Navigation │ │ Batch Action │ │
|
||
│ │ (on blur) │ │ (on row change)│ │ (user clicks) │ │
|
||
│ └────────┬────────┘ └────────┬────────┘ └────────┬────────┘ │
|
||
│ │ │ │ │
|
||
└───────────┼─────────────────────┼─────────────────────┼─────────────────────┘
|
||
│ │ │
|
||
▼ ▼ ▼
|
||
┌───────────────────────────────────────────────────────────────────────────┐
|
||
│ TIER 1: CLIENT-SIDE │
|
||
│ (Deterministic - No AI) │
|
||
├───────────────────────────────────────────────────────────────────────────┤
|
||
│ • Price formatting • UPC trimming • Country codes │
|
||
│ • Numeric validation • Required fields • Date parsing │
|
||
└───────────────────────────────────────────────────────────────────────────┘
|
||
│ │ │
|
||
▼ ▼ ▼
|
||
┌───────────────────────────────────────────────────────────────────────────┐
|
||
│ TIER 2: REAL-TIME AI SUGGESTIONS │
|
||
│ (Groq - Llama 3.3 70B) │
|
||
├───────────────────────────────────────────────────────────────────────────┤
|
||
│ │
|
||
│ POST /api/ai/suggest/:field │
|
||
│ │
|
||
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
|
||
│ │ Name │ │ Category │ │ Theme │ │ Color │ │ Tax Code │ │
|
||
│ │ Task │ │ Task │ │ Task │ │ Task │ │ Task │ │
|
||
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
|
||
│ │ │ │ │ │ │
|
||
│ │ ▼ │ │ │ │
|
||
│ │ ┌────────────────┐ │ │ │ │
|
||
│ │ │ Embedding │ │ │ │ │
|
||
│ │ │ Pre-Filter │ │ │ │ │
|
||
│ │ └────────────────┘ │ │ │ │
|
||
│ │ │ │ │ │ │
|
||
└───────┼──────────────┼─────────────┼─────────────┼─────────────┼──────────┘
|
||
│ │ │ │ │
|
||
▼ ▼ ▼ ▼ ▼
|
||
┌───────────────────────────────────────────────────────────────────────────┐
|
||
│ TIER 3: BATCH AI ENHANCEMENT │
|
||
│ (Claude Haiku / GPT-4o-mini) │
|
||
├───────────────────────────────────────────────────────────────────────────┤
|
||
│ │
|
||
│ POST /api/ai/enhance/descriptions │
|
||
│ POST /api/ai/validate/consistency │
|
||
│ │
|
||
│ ┌────────────────────┐ ┌────────────────────┐ │
|
||
│ │ Description │ │ Cross-Product │ │
|
||
│ │ Generation │ │ Consistency │ │
|
||
│ │ (10-20 at once) │ │ (weights/dims) │ │
|
||
│ └────────────────────┘ └────────────────────┘ │
|
||
│ │
|
||
└───────────────────────────────────────────────────────────────────────────┘
|
||
```

### Validation Flow

```
User enters/maps data
          │
          ▼
┌───────────────────┐
│ Tier 1: Code      │ ◄── Instant (0ms)
│ - Format prices   │
│ - Validate UPCs   │
│ - Convert country │
└─────────┬─────────┘
          │
          ▼
┌───────────────────┐
│ Tier 2: Groq      │ ◄── Fast (200-500ms per field)
│ - Suggest category│     Triggered on focus/blur
│ - Detect themes   │
│ - Format name     │
└─────────┬─────────┘
          │
          ▼
┌───────────────────┐
│ Tier 3: Batch     │ ◄── On-demand (5-30s)
│ - Descriptions    │     User clicks "Enhance"
│ - Consistency     │
└───────────────────┘
```
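
To make the hand-off between tiers concrete, here is a minimal sketch (client-side JavaScript, error handling omitted) of one product flowing through the endpoints proposed later in this document. The `validateProduct` helper is illustrative, not part of the design.

```javascript
// Illustrative only: walks one product through Tier 1 → Tier 2 → Tier 3.
async function validateProduct(product) {
  // Tier 1: deterministic normalization (no AI, effectively instant)
  const tier1 = await fetch('/api/ai/normalize', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ product })
  }).then(r => r.json()); // { normalized, changes }

  // Tier 2: real-time suggestion for a single field (Groq, ~200-500ms)
  const categories = await fetch('/api/ai/suggest/categories', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ product: tier1.normalized })
  }).then(r => r.json()); // { suggestions, allMatches, usage, latencyMs }

  // Tier 3: batch enhancement, only when the user explicitly asks for it
  const enhanced = await fetch('/api/ai/enhance/descriptions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ products: [tier1.normalized], mode: 'enhance' })
  }).then(r => r.json()); // { results }

  return { tier1, categories, enhanced };
}
```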
---

## Backend Implementation

### File Structure

```
inventory-server/src/
├── services/
│   └── ai/
│       ├── index.js                  # Main entry, initialization, exports
│       ├── config.js                 # AI configuration management
│       ├── taskRegistry.js           # Task registration system
│       ├── workQueue.js              # Concurrency control
│       │
│       ├── providers/
│       │   ├── index.js              # Provider factory
│       │   ├── groqProvider.js       # Groq API client (chat)
│       │   ├── openaiProvider.js     # OpenAI API client (embeddings + chat)
│       │   └── anthropicProvider.js  # Claude API client (batch tasks)
│       │
│       ├── embeddings/
│       │   ├── index.js              # Embedding service entry
│       │   ├── categoryEmbeddings.js # Category embedding management
│       │   ├── themeEmbeddings.js    # Theme embedding management
│       │   ├── vectorStore.js        # In-memory vector storage
│       │   └── similarity.js         # Cosine similarity utilities
│       │
│       ├── tasks/
│       │   ├── index.js              # Task exports
│       │   ├── nameSuggestionTask.js
│       │   ├── categorySuggestionTask.js
│       │   ├── themeSuggestionTask.js
│       │   ├── colorSuggestionTask.js
│       │   ├── taxCodeSuggestionTask.js
│       │   ├── descriptionEnhanceTask.js
│       │   ├── consistencyCheckTask.js
│       │   └── utils/
│       │       ├── productUtils.js   # Product data helpers
│       │       └── responseParser.js # AI response parsing
│       │
│       ├── prompts/
│       │   ├── index.js              # Prompt exports
│       │   ├── namePrompts.js
│       │   ├── categoryPrompts.js
│       │   ├── themePrompts.js
│       │   ├── colorPrompts.js
│       │   ├── descriptionPrompts.js
│       │   └── consistencyPrompts.js
│       │
│       └── normalizers/
│           ├── index.js              # Normalizer exports
│           ├── priceNormalizer.js
│           ├── upcNormalizer.js
│           ├── countryCodeNormalizer.js
│           ├── dateNormalizer.js
│           └── numericNormalizer.js
│
└── routes/
    └── ai.js                         # New AI routes (replaces ai-validation.js)
```

### Provider System

#### Provider Interface

```javascript
// services/ai/providers/index.js

// Concrete providers (required here so the factory below can instantiate them)
const { GroqProvider } = require('./groqProvider');
const { OpenAIProvider } = require('./openaiProvider');
const { AnthropicProvider } = require('./anthropicProvider');

/**
 * All providers must implement this interface
 */
class AIProvider {
  /**
   * Chat completion
   * @param {Object} params
   * @param {Array<{role: string, content: string}>} params.messages
   * @param {string} params.model
   * @param {number} [params.temperature=0.3]
   * @param {number} [params.maxTokens=500]
   * @param {Object} [params.responseFormat] - JSON schema for structured output
   * @param {number} [params.timeoutMs=30000]
   * @returns {Promise<{content: string, parsed: Object|null, usage: Object, latencyMs: number, model: string}>}
   */
  async chatCompletion(params) {
    throw new Error('Not implemented');
  }

  /**
   * Generate embeddings
   * @param {string|string[]} input - Text or array of texts
   * @param {Object} [options]
   * @param {string} [options.model]
   * @param {number} [options.dimensions]
   * @returns {Promise<{embeddings: number[][], usage: Object, model: string, latencyMs: number}>}
   */
  async embed(input, options) {
    throw new Error('Not implemented');
  }
}

/**
 * Provider factory
 */
function createProvider(providerName, config) {
  switch (providerName.toLowerCase()) {
    case 'groq':
      return new GroqProvider(config.providers.groq);
    case 'openai':
      return new OpenAIProvider(config.providers.openai);
    case 'anthropic':
      return new AnthropicProvider(config.providers.anthropic);
    default:
      throw new Error(`Unknown provider: ${providerName}`);
  }
}

module.exports = { AIProvider, createProvider };
```
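
For reference, a minimal sketch of how the factory would be consumed; the `demo` function and environment-variable name are illustrative, and the config shape mirrors the `config.ai` object used in `services/ai/index.js` below.

```javascript
// Illustrative usage of the provider factory.
const { createProvider } = require('./services/ai/providers');

const config = {
  providers: {
    groq: { apiKey: process.env.GROQ_API_KEY }
  }
};

async function demo() {
  const groq = createProvider('groq', config);
  const result = await groq.chatCompletion({
    messages: [{ role: 'user', content: 'Return {"ok": true} as JSON' }],
    responseFormat: { type: 'json_object' },
    maxTokens: 50
  });
  // `parsed` is non-null when the response was valid JSON
  console.log(result.parsed, result.usage, `${result.latencyMs}ms`);
}

demo().catch(console.error);
```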
|
||
|
||
#### Groq Provider (Primary for Real-Time)
|
||
|
||
```javascript
|
||
// services/ai/providers/groqProvider.js
|
||
|
||
const Groq = require('groq-sdk');
|
||
|
||
class GroqProvider {
|
||
constructor({ apiKey, baseUrl, timeoutMs = 30000 }) {
|
||
if (!apiKey) {
|
||
throw new Error('Groq API key is required');
|
||
}
|
||
this.client = new Groq({ apiKey, baseURL: baseUrl });
|
||
this.timeoutMs = timeoutMs;
|
||
}
|
||
|
||
async chatCompletion({
|
||
messages,
|
||
model = 'llama-3.3-70b-versatile',
|
||
temperature = 0.3,
|
||
maxTokens = 500,
|
||
responseFormat = null,
|
||
timeoutMs = this.timeoutMs
|
||
}) {
|
||
const started = Date.now();
|
||
|
||
const params = {
|
||
messages,
|
||
model,
|
||
temperature,
|
||
max_tokens: maxTokens
|
||
};
|
||
|
||
// Add JSON mode if requested
|
||
if (responseFormat) {
|
||
params.response_format = { type: 'json_object' };
|
||
}
|
||
|
||
const response = await this.client.chat.completions.create(params, {
|
||
timeout: timeoutMs
|
||
});
|
||
|
||
const content = response.choices[0]?.message?.content || '';
|
||
const usage = response.usage || {};
|
||
|
||
// Try to parse JSON if response format was requested
|
||
let parsed = null;
|
||
if (responseFormat && content) {
|
||
try {
|
||
parsed = JSON.parse(content);
|
||
} catch {
|
||
// Will return raw content
|
||
}
|
||
}
|
||
|
||
return {
|
||
content,
|
||
parsed,
|
||
usage: {
|
||
promptTokens: usage.prompt_tokens || 0,
|
||
completionTokens: usage.completion_tokens || 0,
|
||
totalTokens: usage.total_tokens || 0
|
||
},
|
||
latencyMs: Date.now() - started,
|
||
model: response.model || model
|
||
};
|
||
}
|
||
|
||
// Groq doesn't support embeddings, so this throws
|
||
async embed() {
|
||
throw new Error('Groq does not support embeddings. Use OpenAI provider.');
|
||
}
|
||
}
|
||
|
||
module.exports = { GroqProvider };
|
||
```
|
||
|
||
#### OpenAI Provider (Embeddings + Fallback Chat)
|
||
|
||
```javascript
|
||
// services/ai/providers/openaiProvider.js
|
||
|
||
const MAX_EMBEDDING_BATCH_SIZE = 2048;
|
||
|
||
class OpenAIProvider {
|
||
constructor({
|
||
apiKey,
|
||
baseUrl = 'https://api.openai.com/v1',
|
||
embeddingModel = 'text-embedding-3-small',
|
||
embeddingDimensions = 1536,
|
||
chatModel = 'gpt-4o-mini',
|
||
timeoutMs = 60000
|
||
}) {
|
||
if (!apiKey) {
|
||
throw new Error('OpenAI API key is required');
|
||
}
|
||
this.apiKey = apiKey;
|
||
this.baseUrl = baseUrl;
|
||
this.embeddingModel = embeddingModel;
|
||
this.embeddingDimensions = embeddingDimensions;
|
||
this.chatModel = chatModel;
|
||
this.timeoutMs = timeoutMs;
|
||
}
|
||
|
||
async chatCompletion({
|
||
messages,
|
||
model = this.chatModel,
|
||
temperature = 0.3,
|
||
maxTokens = 500,
|
||
responseFormat = null,
|
||
timeoutMs = this.timeoutMs
|
||
}) {
|
||
const started = Date.now();
|
||
|
||
const body = {
|
||
model,
|
||
messages,
|
||
temperature,
|
||
max_tokens: maxTokens
|
||
};
|
||
|
||
if (responseFormat) {
|
||
body.response_format = { type: 'json_object' };
|
||
}
|
||
|
||
const response = await this._makeRequest('chat/completions', body, timeoutMs);
|
||
const content = response.choices[0]?.message?.content || '';
|
||
const usage = response.usage || {};
|
||
|
||
let parsed = null;
|
||
if (responseFormat && content) {
|
||
try {
|
||
parsed = JSON.parse(content);
|
||
} catch {
|
||
// Will return raw content
|
||
}
|
||
}
|
||
|
||
return {
|
||
content,
|
||
parsed,
|
||
usage: {
|
||
promptTokens: usage.prompt_tokens || 0,
|
||
completionTokens: usage.completion_tokens || 0,
|
||
totalTokens: usage.total_tokens || 0
|
||
},
|
||
latencyMs: Date.now() - started,
|
||
model: response.model || model
|
||
};
|
||
}
|
||
|
||
/**
|
||
* Generate embeddings for a single text or batch
|
||
*/
|
||
async embed(input, options = {}) {
|
||
const texts = Array.isArray(input) ? input : [input];
|
||
const model = options.model || this.embeddingModel;
|
||
const dimensions = options.dimensions || this.embeddingDimensions;
|
||
const timeoutMs = options.timeoutMs || this.timeoutMs;
|
||
|
||
if (texts.length > MAX_EMBEDDING_BATCH_SIZE) {
|
||
throw new Error(`Batch size ${texts.length} exceeds max of ${MAX_EMBEDDING_BATCH_SIZE}`);
|
||
}
|
||
|
||
const started = Date.now();
|
||
|
||
// Clean input texts
|
||
const cleanedTexts = texts.map(t =>
|
||
(t || '').replace(/\n+/g, ' ').trim().substring(0, 8000)
|
||
);
|
||
|
||
const body = {
|
||
input: cleanedTexts,
|
||
model,
|
||
encoding_format: 'float'
|
||
};
|
||
|
||
// Only embedding-3 models support dimensions parameter
|
||
if (model.includes('embedding-3')) {
|
||
body.dimensions = dimensions;
|
||
}
|
||
|
||
const response = await this._makeRequest('embeddings', body, timeoutMs);
|
||
|
||
// Sort by index to ensure order matches input
|
||
const sortedData = response.data.sort((a, b) => a.index - b.index);
|
||
const embeddings = sortedData.map(item => item.embedding);
|
||
|
||
return {
|
||
embeddings,
|
||
usage: {
|
||
promptTokens: response.usage?.prompt_tokens || 0,
|
||
totalTokens: response.usage?.total_tokens || 0
|
||
},
|
||
model: response.model || model,
|
||
latencyMs: Date.now() - started
|
||
};
|
||
}
|
||
|
||
/**
|
||
* Generator for processing large batches in chunks
|
||
*/
|
||
async *embedBatchChunked(texts, options = {}) {
|
||
const batchSize = Math.min(options.batchSize || 100, MAX_EMBEDDING_BATCH_SIZE);
|
||
|
||
for (let i = 0; i < texts.length; i += batchSize) {
|
||
const chunk = texts.slice(i, i + batchSize);
|
||
const result = await this.embed(chunk, options);
|
||
|
||
yield {
|
||
embeddings: result.embeddings,
|
||
startIndex: i,
|
||
endIndex: i + chunk.length,
|
||
usage: result.usage,
|
||
model: result.model,
|
||
latencyMs: result.latencyMs
|
||
};
|
||
}
|
||
}
|
||
|
||
async _makeRequest(endpoint, body, timeoutMs) {
|
||
const controller = new AbortController();
|
||
const timeout = setTimeout(() => controller.abort(), timeoutMs);
|
||
|
||
try {
|
||
const response = await fetch(`${this.baseUrl}/${endpoint}`, {
|
||
method: 'POST',
|
||
headers: {
|
||
'Content-Type': 'application/json',
|
||
'Authorization': `Bearer ${this.apiKey}`
|
||
},
|
||
body: JSON.stringify(body),
|
||
signal: controller.signal
|
||
});
|
||
|
||
if (!response.ok) {
|
||
const error = await response.json().catch(() => ({}));
|
||
throw new Error(error.error?.message || `OpenAI API error: ${response.status}`);
|
||
}
|
||
|
||
return response.json();
|
||
} finally {
|
||
clearTimeout(timeout);
|
||
}
|
||
}
|
||
}
|
||
|
||
module.exports = { OpenAIProvider, MAX_EMBEDDING_BATCH_SIZE };
|
||
```
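
A short sketch of the two embedding entry points; the sample texts and batch size are illustrative.

```javascript
// Illustrative usage of the OpenAI provider's embedding methods.
const { OpenAIProvider } = require('./services/ai/providers/openaiProvider');

async function demo() {
  const openai = new OpenAIProvider({ apiKey: process.env.OPENAI_API_KEY });

  // Single call (up to MAX_EMBEDDING_BATCH_SIZE texts at once)
  const { embeddings, usage } = await openai.embed([
    '12x12 Patterned Paper',
    'Clear Stamp Set'
  ]);
  console.log(embeddings.length, embeddings[0].length, usage.totalTokens);

  // Chunked generator for large inputs (e.g. the full category taxonomy)
  const texts = ['Paper', 'Dies', 'Ink' /* ...hundreds more... */];
  for await (const chunk of openai.embedBatchChunked(texts, { batchSize: 100 })) {
    console.log(`embedded ${chunk.endIndex}/${texts.length} in ${chunk.latencyMs}ms`);
  }
}

demo().catch(console.error);
```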
|
||
|
||
#### Anthropic Provider (Batch Enhancement)
|
||
|
||
```javascript
|
||
// services/ai/providers/anthropicProvider.js
|
||
|
||
const Anthropic = require('@anthropic-ai/sdk');
|
||
|
||
class AnthropicProvider {
|
||
constructor({
|
||
apiKey,
|
||
defaultModel = 'claude-3-5-haiku-20241022',
|
||
timeoutMs = 120000
|
||
}) {
|
||
if (!apiKey) {
|
||
throw new Error('Anthropic API key is required');
|
||
}
|
||
this.client = new Anthropic({ apiKey });
|
||
this.defaultModel = defaultModel;
|
||
this.timeoutMs = timeoutMs;
|
||
}
|
||
|
||
async chatCompletion({
|
||
messages,
|
||
model = this.defaultModel,
|
||
temperature = 0.3,
|
||
maxTokens = 1000,
|
||
system = null,
|
||
timeoutMs = this.timeoutMs
|
||
}) {
|
||
const started = Date.now();
|
||
|
||
// Anthropic uses separate system parameter
|
||
const params = {
|
||
model,
|
||
max_tokens: maxTokens,
|
||
temperature,
|
||
messages: messages.filter(m => m.role !== 'system')
|
||
};
|
||
|
||
// Extract system message if present
|
||
const systemMessage = system || messages.find(m => m.role === 'system')?.content;
|
||
if (systemMessage) {
|
||
params.system = systemMessage;
|
||
}
|
||
|
||
const response = await this.client.messages.create(params);
|
||
|
||
const content = response.content
|
||
.filter(block => block.type === 'text')
|
||
.map(block => block.text)
|
||
.join('');
|
||
|
||
let parsed = null;
|
||
try {
|
||
parsed = JSON.parse(content);
|
||
} catch {
|
||
// Not JSON
|
||
}
|
||
|
||
return {
|
||
content,
|
||
parsed,
|
||
usage: {
|
||
promptTokens: response.usage?.input_tokens || 0,
|
||
completionTokens: response.usage?.output_tokens || 0,
|
||
totalTokens: (response.usage?.input_tokens || 0) + (response.usage?.output_tokens || 0)
|
||
},
|
||
latencyMs: Date.now() - started,
|
||
model: response.model || model
|
||
};
|
||
}
|
||
|
||
async embed() {
|
||
throw new Error('Anthropic does not support embeddings. Use OpenAI provider.');
|
||
}
|
||
}
|
||
|
||
module.exports = { AnthropicProvider };
|
||
```
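
A minimal usage sketch; note that a system prompt can be passed either via the `system` argument or as a `role: 'system'` message, and both end up in `params.system`. The demo wrapper is illustrative.

```javascript
// Illustrative usage of the Anthropic provider for a Tier 3 task.
const { AnthropicProvider } = require('./services/ai/providers/anthropicProvider');

async function demo() {
  const claude = new AnthropicProvider({ apiKey: process.env.ANTHROPIC_API_KEY });

  const result = await claude.chatCompletion({
    system: 'You are a product copywriter for a craft supplies store.',
    messages: [{ role: 'user', content: 'Describe a 12x12 patterned paper pack.' }],
    maxTokens: 300
  });

  console.log(result.content, result.usage);
}

demo().catch(console.error);
```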
|
||
|
||
### Task Registry
|
||
|
||
```javascript
|
||
// services/ai/taskRegistry.js
|
||
|
||
/**
|
||
* Registry for AI tasks
|
||
* Manages task registration, lookup, and execution
|
||
*/
|
||
class AiTaskRegistry {
|
||
constructor() {
|
||
this.tasks = new Map();
|
||
}
|
||
|
||
/**
|
||
* Register a task
|
||
* @param {Object} taskDefinition
|
||
* @param {string} taskDefinition.id - Unique task identifier
|
||
* @param {string} taskDefinition.description - Human-readable description
|
||
* @param {Function} taskDefinition.run - Async function to execute
|
||
* @param {Object} [taskDefinition.config] - Task-specific configuration
|
||
*/
|
||
register(taskDefinition) {
|
||
if (!taskDefinition?.id) {
|
||
throw new Error('Task must have an id');
|
||
}
|
||
if (typeof taskDefinition.run !== 'function') {
|
||
throw new Error(`Task ${taskDefinition.id} must have a run function`);
|
||
}
|
||
if (this.tasks.has(taskDefinition.id)) {
|
||
throw new Error(`Task ${taskDefinition.id} is already registered`);
|
||
}
|
||
this.tasks.set(taskDefinition.id, taskDefinition);
|
||
return this;
|
||
}
|
||
|
||
/**
|
||
* Get a task by ID
|
||
*/
|
||
get(taskId) {
|
||
return this.tasks.get(taskId) || null;
|
||
}
|
||
|
||
/**
|
||
* Check if a task exists
|
||
*/
|
||
has(taskId) {
|
||
return this.tasks.has(taskId);
|
||
}
|
||
|
||
/**
|
||
* List all registered task IDs
|
||
*/
|
||
list() {
|
||
return Array.from(this.tasks.keys());
|
||
}
|
||
}
|
||
|
||
/**
|
||
* Task IDs as frozen constants
|
||
*/
|
||
const TASK_IDS = Object.freeze({
|
||
// Tier 2: Real-time suggestions
|
||
SUGGEST_NAME: 'suggest.name',
|
||
SUGGEST_CATEGORIES: 'suggest.categories',
|
||
SUGGEST_THEMES: 'suggest.themes',
|
||
SUGGEST_COLORS: 'suggest.colors',
|
||
SUGGEST_TAX_CODE: 'suggest.taxCode',
|
||
SUGGEST_SIZE_CATEGORY: 'suggest.sizeCategory',
|
||
|
||
// Tier 3: Batch enhancement
|
||
ENHANCE_DESCRIPTIONS: 'enhance.descriptions',
|
||
CHECK_CONSISTENCY: 'check.consistency',
|
||
|
||
// Utility tasks
|
||
COMPUTE_EMBEDDINGS: 'util.computeEmbeddings'
|
||
});
|
||
|
||
module.exports = { AiTaskRegistry, TASK_IDS };
|
||
```
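
A minimal sketch of registry usage; the stub task body is illustrative and stands in for the real task factories defined later in this document.

```javascript
// Illustrative: registering, listing, and running a task.
const { AiTaskRegistry, TASK_IDS } = require('./services/ai/taskRegistry');

const registry = new AiTaskRegistry();

registry.register({
  id: TASK_IDS.SUGGEST_COLORS,
  description: 'Stub color suggester used for illustration',
  run: async ({ product }) => ({ suggestions: [], product })
});

console.log(registry.list());                       // ['suggest.colors']
console.log(registry.has(TASK_IDS.SUGGEST_COLORS)); // true

registry.get(TASK_IDS.SUGGEST_COLORS)
  .run({ product: { name: 'Washi Tape' } })
  .then(result => console.log(result));
```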
|
||
|
||
### Work Queue (Concurrency Control)
|
||
|
||
```javascript
|
||
// services/ai/workQueue.js
|
||
|
||
/**
|
||
* Simple concurrent work queue
|
||
* Prevents overwhelming AI providers with too many parallel requests
|
||
*/
|
||
class AiWorkQueue {
|
||
constructor(concurrency = 3) {
|
||
this.concurrency = Math.max(1, concurrency);
|
||
this.active = 0;
|
||
this.queue = [];
|
||
}
|
||
|
||
/**
|
||
* Enqueue a task for execution
|
||
* @param {Function} taskFactory - Async function to execute
|
||
* @returns {Promise} - Resolves with task result
|
||
*/
|
||
enqueue(taskFactory) {
|
||
return new Promise((resolve, reject) => {
|
||
const execute = async () => {
|
||
this.active += 1;
|
||
try {
|
||
const result = await taskFactory();
|
||
resolve(result);
|
||
} catch (error) {
|
||
reject(error);
|
||
} finally {
|
||
this.active -= 1;
|
||
this._processNext();
|
||
}
|
||
};
|
||
|
||
if (this.active < this.concurrency) {
|
||
execute();
|
||
} else {
|
||
this.queue.push(execute);
|
||
}
|
||
});
|
||
}
|
||
|
||
_processNext() {
|
||
if (this.queue.length === 0 || this.active >= this.concurrency) {
|
||
return;
|
||
}
|
||
const next = this.queue.shift();
|
||
if (next) {
|
||
next();
|
||
}
|
||
}
|
||
|
||
/**
|
||
* Get current queue statistics
|
||
*/
|
||
getStats() {
|
||
return {
|
||
active: this.active,
|
||
queued: this.queue.length,
|
||
concurrency: this.concurrency
|
||
};
|
||
}
|
||
}
|
||
|
||
module.exports = { AiWorkQueue };
|
||
```
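
A minimal sketch of the queue in use; `callProvider` is a stand-in for a real provider call.

```javascript
// Illustrative: funnel several AI calls through the queue so that at most
// `concurrency` requests hit the provider at once.
const { AiWorkQueue } = require('./services/ai/workQueue');

const queue = new AiWorkQueue(3);

async function callProvider(i) {
  // pretend this is provider.chatCompletion(...)
  await new Promise(resolve => setTimeout(resolve, 100));
  return `result ${i}`;
}

async function demo() {
  const jobs = Array.from({ length: 10 }, (_, i) => queue.enqueue(() => callProvider(i)));
  console.log(queue.getStats()); // → { active: 3, queued: 7, concurrency: 3 }
  const results = await Promise.all(jobs);
  console.log(results.length);   // 10
}

demo();
```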
|
||
|
||
### Embedding System
|
||
|
||
#### Embedding Service Entry
|
||
|
||
```javascript
|
||
// services/ai/embeddings/index.js
|
||
|
||
const { CategoryEmbeddings } = require('./categoryEmbeddings');
|
||
const { ThemeEmbeddings } = require('./themeEmbeddings');
|
||
const { VectorStore } = require('./vectorStore');
|
||
|
||
let categoryEmbeddings = null;
|
||
let themeEmbeddings = null;
|
||
let initialized = false;
|
||
|
||
/**
|
||
* Initialize the embedding system
|
||
* Should be called once at server startup
|
||
*/
|
||
async function initializeEmbeddings({ openaiProvider, mysqlConnection, logger }) {
|
||
if (initialized) {
|
||
return { categoryEmbeddings, themeEmbeddings };
|
||
}
|
||
|
||
logger?.info('[Embeddings] Initializing embedding system...');
|
||
|
||
// Initialize category embeddings
|
||
categoryEmbeddings = new CategoryEmbeddings({
|
||
provider: openaiProvider,
|
||
connection: mysqlConnection,
|
||
logger
|
||
});
|
||
|
||
// Initialize theme embeddings
|
||
themeEmbeddings = new ThemeEmbeddings({
|
||
provider: openaiProvider,
|
||
connection: mysqlConnection,
|
||
logger
|
||
});
|
||
|
||
// Load or compute embeddings
|
||
await Promise.all([
|
||
categoryEmbeddings.initialize(),
|
||
themeEmbeddings.initialize()
|
||
]);
|
||
|
||
initialized = true;
|
||
logger?.info('[Embeddings] Embedding system initialized');
|
||
|
||
return { categoryEmbeddings, themeEmbeddings };
|
||
}
|
||
|
||
/**
|
||
* Get category suggestions for a product
|
||
*/
|
||
async function suggestCategories(productText, topK = 10) {
|
||
if (!categoryEmbeddings) {
|
||
throw new Error('Embeddings not initialized');
|
||
}
|
||
return categoryEmbeddings.findSimilar(productText, topK);
|
||
}
|
||
|
||
/**
|
||
* Get theme suggestions for a product
|
||
*/
|
||
async function suggestThemes(productText, topK = 5) {
|
||
if (!themeEmbeddings) {
|
||
throw new Error('Embeddings not initialized');
|
||
}
|
||
return themeEmbeddings.findSimilar(productText, topK);
|
||
}
|
||
|
||
module.exports = {
|
||
initializeEmbeddings,
|
||
suggestCategories,
|
||
suggestThemes
|
||
};
|
||
```
|
||
|
||
#### Category Embeddings
|
||
|
||
```javascript
|
||
// services/ai/embeddings/categoryEmbeddings.js
|
||
|
||
const { VectorStore } = require('./vectorStore');
|
||
const { cosineSimilarity } = require('./similarity');
|
||
|
||
class CategoryEmbeddings {
|
||
constructor({ provider, connection, logger }) {
|
||
this.provider = provider;
|
||
this.connection = connection;
|
||
this.logger = logger;
|
||
this.vectorStore = new VectorStore();
|
||
this.categories = []; // Raw category data
|
||
}
|
||
|
||
/**
|
||
* Initialize embeddings - load from cache or compute
|
||
*/
|
||
async initialize() {
|
||
this.logger?.info('[CategoryEmbeddings] Loading categories from database...');
|
||
|
||
// Fetch hierarchical categories
|
||
const [rows] = await this.connection.query(`
|
||
SELECT
|
||
cat_id,
|
||
name,
|
||
master_cat_id,
|
||
type
|
||
FROM product_categories
|
||
WHERE type IN (10, 11, 12, 13)
|
||
ORDER BY type, name
|
||
`);
|
||
|
||
// Build category paths (e.g., "Paper > Patterned Paper > 12x12 Single Sheets")
|
||
this.categories = this._buildCategoryPaths(rows);
|
||
this.logger?.info(`[CategoryEmbeddings] Built ${this.categories.length} category paths`);
|
||
|
||
// Check if we have cached embeddings
|
||
const cached = await this._loadCachedEmbeddings();
|
||
if (cached && cached.length === this.categories.length) {
|
||
this.logger?.info('[CategoryEmbeddings] Using cached embeddings');
|
||
this.vectorStore.load(cached);
|
||
return;
|
||
}
|
||
|
||
// Compute new embeddings
|
||
await this._computeAndCacheEmbeddings();
|
||
}
|
||
|
||
/**
|
||
* Find similar categories for a product
|
||
*/
|
||
async findSimilar(productText, topK = 10) {
|
||
// Get embedding for product
|
||
const { embeddings } = await this.provider.embed(productText);
|
||
const productEmbedding = embeddings[0];
|
||
|
||
// Find most similar categories
|
||
const results = this.vectorStore.search(productEmbedding, topK);
|
||
|
||
// Enrich with category data
|
||
return results.map(result => {
|
||
const category = this.categories.find(c => c.id === result.id);
|
||
return {
|
||
id: category.id,
|
||
name: category.name,
|
||
fullPath: category.fullPath,
|
||
parentId: category.parentId,
|
||
similarity: result.similarity
|
||
};
|
||
});
|
||
}
|
||
|
||
/**
|
||
* Build full paths for categories
|
||
*/
|
||
_buildCategoryPaths(rows) {
|
||
const byId = new Map(rows.map(r => [r.cat_id, r]));
|
||
const categories = [];
|
||
|
||
for (const row of rows) {
|
||
const path = [];
|
||
let current = row;
|
||
|
||
// Walk up the tree to build full path
|
||
while (current) {
|
||
path.unshift(current.name);
|
||
current = current.master_cat_id ? byId.get(current.master_cat_id) : null;
|
||
}
|
||
|
||
categories.push({
|
||
id: row.cat_id,
|
||
name: row.name,
|
||
parentId: row.master_cat_id,
|
||
type: row.type,
|
||
fullPath: path.join(' > '),
|
||
// Text for embedding includes path for context
|
||
embeddingText: path.join(' ')
|
||
});
|
||
}
|
||
|
||
return categories;
|
||
}
|
||
|
||
/**
|
||
* Compute embeddings for all categories
|
||
*/
|
||
async _computeAndCacheEmbeddings() {
|
||
this.logger?.info('[CategoryEmbeddings] Computing embeddings...');
|
||
|
||
const texts = this.categories.map(c => c.embeddingText);
|
||
const allEmbeddings = [];
|
||
|
||
// Process in chunks (OpenAI has batch limits)
|
||
for await (const chunk of this.provider.embedBatchChunked(texts, { batchSize: 100 })) {
|
||
for (let i = 0; i < chunk.embeddings.length; i++) {
|
||
const globalIndex = chunk.startIndex + i;
|
||
allEmbeddings.push({
|
||
id: this.categories[globalIndex].id,
|
||
embedding: chunk.embeddings[i]
|
||
});
|
||
}
|
||
this.logger?.info(`[CategoryEmbeddings] Processed ${chunk.endIndex}/${texts.length}`);
|
||
}
|
||
|
||
// Store in vector store
|
||
this.vectorStore.load(allEmbeddings);
|
||
|
||
// Cache to database for faster startup next time
|
||
await this._cacheEmbeddings(allEmbeddings);
|
||
|
||
this.logger?.info('[CategoryEmbeddings] Embeddings computed and cached');
|
||
}
|
||
|
||
async _loadCachedEmbeddings() {
|
||
// TODO: Load from ai_embeddings_cache table
|
||
return null;
|
||
}
|
||
|
||
async _cacheEmbeddings(embeddings) {
|
||
// TODO: Save to ai_embeddings_cache table
|
||
}
|
||
}
|
||
|
||
module.exports = { CategoryEmbeddings };
|
||
```
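
For reference, a sketch of this class used in isolation; it assumes an already-constructed OpenAI provider and MySQL connection (see `services/ai/index.js`), and the values in the result comment are placeholders that only show the shape.

```javascript
// Illustrative: stand-alone category lookup via embeddings.
const { CategoryEmbeddings } = require('./services/ai/embeddings/categoryEmbeddings');

async function demo({ openaiProvider, mysqlConnection, logger }) {
  const categories = new CategoryEmbeddings({
    provider: openaiProvider,
    connection: mysqlConnection,
    logger
  });
  await categories.initialize();

  const matches = await categories.findSimilar('Tim Holtz Distress Oxide Ink Pad', 5);
  // Each match looks like:
  // { id: 123, name: '...', fullPath: 'Ink > ...', parentId: 45, similarity: 0.83 }
  console.log(matches);
}

module.exports = { demo };
```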
|
||
|
||
#### Vector Store
|
||
|
||
```javascript
|
||
// services/ai/embeddings/vectorStore.js
|
||
|
||
const { cosineSimilarity } = require('./similarity');
|
||
|
||
/**
|
||
* In-memory vector store for fast similarity search
|
||
*/
|
||
class VectorStore {
|
||
constructor() {
|
||
this.vectors = []; // Array of { id, embedding }
|
||
}
|
||
|
||
/**
|
||
* Load vectors into the store
|
||
*/
|
||
load(vectors) {
|
||
this.vectors = vectors;
|
||
}
|
||
|
||
/**
|
||
* Add a single vector
|
||
*/
|
||
add(id, embedding) {
|
||
this.vectors.push({ id, embedding });
|
||
}
|
||
|
||
/**
|
||
* Search for most similar vectors
|
||
*/
|
||
search(queryEmbedding, topK = 10) {
|
||
const scored = this.vectors.map(item => ({
|
||
id: item.id,
|
||
similarity: cosineSimilarity(queryEmbedding, item.embedding)
|
||
}));
|
||
|
||
// Sort by similarity descending
|
||
scored.sort((a, b) => b.similarity - a.similarity);
|
||
|
||
return scored.slice(0, topK);
|
||
}
|
||
|
||
/**
|
||
* Get store size
|
||
*/
|
||
size() {
|
||
return this.vectors.length;
|
||
}
|
||
|
||
/**
|
||
* Clear the store
|
||
*/
|
||
clear() {
|
||
this.vectors = [];
|
||
}
|
||
}
|
||
|
||
module.exports = { VectorStore };
|
||
```
|
||
|
||
#### Similarity Utilities
|
||
|
||
```javascript
|
||
// services/ai/embeddings/similarity.js
|
||
|
||
/**
|
||
* Compute cosine similarity between two vectors
|
||
*/
|
||
function cosineSimilarity(a, b) {
|
||
if (a.length !== b.length) {
|
||
throw new Error('Vectors must have same length');
|
||
}
|
||
|
||
let dotProduct = 0;
|
||
let normA = 0;
|
||
let normB = 0;
|
||
|
||
for (let i = 0; i < a.length; i++) {
|
||
dotProduct += a[i] * b[i];
|
||
normA += a[i] * a[i];
|
||
normB += b[i] * b[i];
|
||
}
|
||
|
||
const denominator = Math.sqrt(normA) * Math.sqrt(normB);
|
||
if (denominator === 0) return 0;
|
||
|
||
return dotProduct / denominator;
|
||
}
|
||
|
||
/**
|
||
* Compute Euclidean distance between two vectors
|
||
*/
|
||
function euclideanDistance(a, b) {
|
||
if (a.length !== b.length) {
|
||
throw new Error('Vectors must have same length');
|
||
}
|
||
|
||
let sum = 0;
|
||
for (let i = 0; i < a.length; i++) {
|
||
const diff = a[i] - b[i];
|
||
sum += diff * diff;
|
||
}
|
||
|
||
return Math.sqrt(sum);
|
||
}
|
||
|
||
module.exports = { cosineSimilarity, euclideanDistance };
|
||
```
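
A quick, hand-checkable sanity test of the helpers:

```javascript
// Hand-computable vectors make the expected outputs easy to verify.
const { cosineSimilarity, euclideanDistance } = require('./services/ai/embeddings/similarity');

// cos(θ) between [1, 0] and [1, 1] is 1 / √2 ≈ 0.7071
console.log(cosineSimilarity([1, 0], [1, 1]).toFixed(4)); // "0.7071"

// Identical vectors → similarity 1, distance 0
console.log(cosineSimilarity([3, 4], [3, 4]));  // 1
console.log(euclideanDistance([3, 4], [3, 4])); // 0

// Orthogonal vectors → similarity 0
console.log(cosineSimilarity([1, 0], [0, 1]));  // 0
```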
|
||
|
||
### Micro-Prompt Tasks
|
||
|
||
#### Name Suggestion Task
|
||
|
||
```javascript
|
||
// services/ai/tasks/nameSuggestionTask.js
|
||
|
||
const { buildNamePrompt } = require('../prompts/namePrompts');
|
||
|
||
function createNameSuggestionTask({ provider, logger, config }) {
|
||
const taskConfig = config.tasks?.nameSuggestion || {};
|
||
|
||
async function run({ product }) {
|
||
if (!product?.name && !product?.description) {
|
||
return { suggestion: null, reason: 'No name or description provided' };
|
||
}
|
||
|
||
const prompt = buildNamePrompt(product);
|
||
|
||
const response = await provider.chatCompletion({
|
||
messages: [{ role: 'user', content: prompt }],
|
||
model: taskConfig.model || 'llama-3.3-70b-versatile',
|
||
temperature: taskConfig.temperature || 0.2,
|
||
maxTokens: taskConfig.maxTokens || 150
|
||
});
|
||
|
||
const suggestion = response.content.trim();
|
||
|
||
// Only return if different from original
|
||
if (suggestion === product.name) {
|
||
return { suggestion: null, unchanged: true };
|
||
}
|
||
|
||
return {
|
||
suggestion,
|
||
original: product.name,
|
||
usage: response.usage,
|
||
latencyMs: response.latencyMs
|
||
};
|
||
}
|
||
|
||
return {
|
||
id: 'suggest.name',
|
||
description: 'Suggest formatted product name',
|
||
run
|
||
};
|
||
}
|
||
|
||
module.exports = { createNameSuggestionTask };
|
||
```
|
||
|
||
#### Category Suggestion Task
|
||
|
||
```javascript
|
||
// services/ai/tasks/categorySuggestionTask.js
|
||
|
||
const { suggestCategories } = require('../embeddings');
|
||
const { buildCategoryPrompt } = require('../prompts/categoryPrompts');
|
||
|
||
function createCategorySuggestionTask({ provider, logger, config }) {
|
||
const taskConfig = config.tasks?.categorySuggestion || {};
|
||
|
||
async function run({ product }) {
|
||
const productText = `${product.name || ''} ${product.description || ''}`.trim();
|
||
|
||
if (!productText) {
|
||
return { suggestions: [], reason: 'No product text provided' };
|
||
}
|
||
|
||
// Step 1: Get top candidates via embedding similarity
|
||
const embeddingMatches = await suggestCategories(productText, 10);
|
||
|
||
// Step 2: Use AI to pick best matches from candidates
|
||
const prompt = buildCategoryPrompt(product, embeddingMatches);
|
||
|
||
const response = await provider.chatCompletion({
|
||
messages: [{ role: 'user', content: prompt }],
|
||
model: taskConfig.model || 'llama-3.1-8b-instant',
|
||
temperature: taskConfig.temperature || 0.1,
|
||
maxTokens: taskConfig.maxTokens || 100,
|
||
responseFormat: { type: 'json_object' }
|
||
});
|
||
|
||
let selectedIds = [];
|
||
try {
|
||
const parsed = JSON.parse(response.content);
|
||
selectedIds = Array.isArray(parsed.categories) ? parsed.categories : parsed;
|
||
} catch {
|
||
logger?.warn('[CategorySuggestion] Failed to parse response', { content: response.content });
|
||
}
|
||
|
||
// Filter matches to only selected IDs
|
||
const suggestions = embeddingMatches
|
||
.filter(m => selectedIds.includes(m.id))
|
||
.map(m => ({
|
||
id: m.id,
|
||
name: m.name,
|
||
fullPath: m.fullPath,
|
||
similarity: m.similarity
|
||
}));
|
||
|
||
return {
|
||
suggestions,
|
||
allMatches: embeddingMatches, // Include all for fallback
|
||
usage: response.usage,
|
||
latencyMs: response.latencyMs
|
||
};
|
||
}
|
||
|
||
return {
|
||
id: 'suggest.categories',
|
||
description: 'Suggest product categories',
|
||
run
|
||
};
|
||
}
|
||
|
||
module.exports = { createCategorySuggestionTask };
|
||
```
|
||
|
||
#### Description Enhancement Task
|
||
|
||
```javascript
|
||
// services/ai/tasks/descriptionEnhanceTask.js
|
||
|
||
const { buildDescriptionPrompt } = require('../prompts/descriptionPrompts');
|
||
|
||
function createDescriptionEnhanceTask({ provider, logger, config }) {
|
||
const taskConfig = config.tasks?.descriptionEnhance || {};
|
||
|
||
/**
|
||
* Enhance descriptions for a batch of products
|
||
*/
|
||
async function run({ products, mode = 'enhance' }) {
|
||
if (!Array.isArray(products) || products.length === 0) {
|
||
return { results: [], reason: 'No products provided' };
|
||
}
|
||
|
||
// Process in smaller batches for reliability
|
||
const batchSize = taskConfig.batchSize || 10;
|
||
const results = [];
|
||
|
||
for (let i = 0; i < products.length; i += batchSize) {
|
||
const batch = products.slice(i, i + batchSize);
|
||
const batchResults = await processBatch(batch, mode);
|
||
results.push(...batchResults);
|
||
}
|
||
|
||
return { results };
|
||
}
|
||
|
||
async function processBatch(products, mode) {
|
||
const prompt = buildDescriptionPrompt(products, mode);
|
||
|
||
const response = await provider.chatCompletion({
|
||
messages: [
|
||
{ role: 'system', content: getSystemPrompt() },
|
||
{ role: 'user', content: prompt }
|
||
],
|
||
model: taskConfig.model || 'claude-3-5-haiku-20241022',
|
||
temperature: taskConfig.temperature || 0.7,
|
||
maxTokens: taskConfig.maxTokens || 2000,
|
||
responseFormat: { type: 'json_object' }
|
||
});
|
||
|
||
let parsed = [];
|
||
try {
|
||
const result = JSON.parse(response.content);
|
||
parsed = result.descriptions || result;
|
||
} catch (error) {
|
||
logger?.warn('[DescriptionEnhance] Failed to parse response', { error: error.message });
|
||
}
|
||
|
||
// Match results back to products
|
||
return products.map((product, index) => {
|
||
const enhanced = parsed[index] || {};
|
||
return {
|
||
productId: product._index || product.upc || index,
|
||
original: product.description,
|
||
enhanced: enhanced.description || null,
|
||
changed: enhanced.description && enhanced.description !== product.description
|
||
};
|
||
});
|
||
}
|
||
|
||
function getSystemPrompt() {
|
||
return `You are a product copywriter for a craft supplies ecommerce store.
|
||
Write SEO-friendly, accurate descriptions that help customers understand what they're buying.
|
||
Always state what's included. Never use "our" - use "this" or the company name.
|
||
Keep descriptions to 2-4 sentences unless the product is complex.`;
|
||
}
|
||
|
||
return {
|
||
id: 'enhance.descriptions',
|
||
description: 'Enhance product descriptions in batch',
|
||
run
|
||
};
|
||
}
|
||
|
||
module.exports = { createDescriptionEnhanceTask };
|
||
```
|
||
|
||
### Prompts
|
||
|
||
#### Name Prompts
|
||
|
||
```javascript
|
||
// services/ai/prompts/namePrompts.js
|
||
|
||
function buildNamePrompt(product) {
|
||
return `Format this product name for a craft supplies store.
|
||
|
||
CURRENT NAME: "${product.name || ''}"
|
||
COMPANY: ${product.company_name || product.company || 'Unknown'}
|
||
LINE: ${product.line_name || product.line || 'None'}
|
||
PRODUCT TYPE: ${inferProductType(product)}
|
||
|
||
NAMING RULES:
|
||
- Single product in line: [Line Name] [Product Name] - [Company]
|
||
- Multiple similar products: [Differentiator] [Product Type] - [Line Name] - [Company]
|
||
- Standalone products: [Product Name] - [Company]
|
||
- Always capitalize every word (including "The", "And", etc.)
|
||
- Paper sizes: Use "12x12", "6x6" (no spaces or units)
|
||
- All stamps → "Stamp Set" (not "Clear Stamps")
|
||
- All dies → "Dies" (not "Die Set")
|
||
|
||
SPECIAL RULES:
|
||
- Tim Holtz from Ranger: "[Color] [Product] - Tim Holtz Distress - Ranger"
|
||
- Tim Holtz from Sizzix: "[Product] by Tim Holtz - Sizzix"
|
||
- Dylusions from Ranger: "[Product] - Dylusions - Ranger"
|
||
|
||
Return ONLY the corrected name, nothing else.`;
|
||
}
|
||
|
||
function inferProductType(product) {
|
||
const name = (product.name || '').toLowerCase();
|
||
const desc = (product.description || '').toLowerCase();
|
||
const text = `${name} ${desc}`;
|
||
|
||
if (text.includes('stamp')) return 'Stamps';
|
||
if (text.includes('die') || text.includes('thinlit')) return 'Dies';
|
||
if (text.includes('paper') || text.includes('cardstock')) return 'Paper';
|
||
if (text.includes('ink')) return 'Ink';
|
||
if (text.includes('sticker')) return 'Stickers';
|
||
if (text.includes('washi')) return 'Washi Tape';
|
||
return 'Unknown';
|
||
}
|
||
|
||
module.exports = { buildNamePrompt };
|
||
```
|
||
|
||
#### Category Prompts
|
||
|
||
```javascript
|
||
// services/ai/prompts/categoryPrompts.js
|
||
|
||
function buildCategoryPrompt(product, categoryMatches) {
|
||
const matchList = categoryMatches
|
||
.map(c => `${c.id}: ${c.fullPath}`)
|
||
.join('\n');
|
||
|
||
return `Select the best categories for this craft product.
|
||
|
||
PRODUCT:
|
||
Name: "${product.name || ''}"
|
||
Description: "${(product.description || '').substring(0, 200)}"
|
||
Company: ${product.company_name || 'Unknown'}
|
||
|
||
CATEGORY OPTIONS:
|
||
${matchList}
|
||
|
||
RULES:
|
||
- Select 1-3 most specific categories
|
||
- Prefer deeper subcategories over parents
|
||
- If selecting a subcategory, don't also select its parent
|
||
- Never select "Deals" or "Black Friday" categories
|
||
|
||
Return JSON: {"categories": [id1, id2]}`;
|
||
}
|
||
|
||
module.exports = { buildCategoryPrompt };
|
||
```
|
||
|
||
#### Description Prompts
|
||
|
||
```javascript
|
||
// services/ai/prompts/descriptionPrompts.js
|
||
|
||
function buildDescriptionPrompt(products, mode = 'enhance') {
|
||
const productList = products.map((p, i) => {
|
||
return `[${i}]
|
||
Name: ${p.name || 'Unknown'}
|
||
Company: ${p.company_name || 'Unknown'}
|
||
Current Description: ${p.description || '(none)'}
|
||
Categories: ${p.category_names?.join(', ') || 'Unknown'}
|
||
Dimensions: ${p.length || '?'}x${p.width || '?'} inches
|
||
Weight: ${p.weight || '?'} oz`;
|
||
}).join('\n\n');
|
||
|
||
const instruction = mode === 'generate'
|
||
? 'Write new descriptions for these products.'
|
||
: 'Improve these product descriptions. Fix grammar, add missing details, make SEO-friendly.';
|
||
|
||
return `${instruction}
|
||
|
||
PRODUCTS:
|
||
${productList}
|
||
|
||
RULES:
|
||
- 2-4 sentences each, professional but friendly
|
||
- Always state what's included (quantity, size)
|
||
- Don't use "our" - use "this" or company name
|
||
- Don't add generic filler ("perfect for all your crafts")
|
||
- State facts: dimensions, compatibility, materials
|
||
- Don't make up information you're not sure about
|
||
|
||
Return JSON: {"descriptions": [{"description": "..."}, ...]}`;
|
||
}
|
||
|
||
module.exports = { buildDescriptionPrompt };
|
||
```
|
||
|
||
### Normalizers (Tier 1 - No AI)
|
||
|
||
```javascript
|
||
// services/ai/normalizers/index.js
|
||
|
||
const priceNormalizer = require('./priceNormalizer');
|
||
const upcNormalizer = require('./upcNormalizer');
|
||
const countryCodeNormalizer = require('./countryCodeNormalizer');
|
||
const dateNormalizer = require('./dateNormalizer');
|
||
const numericNormalizer = require('./numericNormalizer');
|
||
|
||
/**
|
||
* Apply all relevant normalizers to a product
|
||
*/
|
||
function normalizeProduct(product, fieldMappings) {
|
||
const normalized = { ...product };
|
||
const changes = [];
|
||
|
||
// Price fields
|
||
for (const field of ['msrp', 'cost_each', 'price']) {
|
||
if (normalized[field] !== undefined) {
|
||
const result = priceNormalizer.normalize(normalized[field]);
|
||
if (result.value !== normalized[field]) {
|
||
changes.push({ field, from: normalized[field], to: result.value });
|
||
normalized[field] = result.value;
|
||
}
|
||
}
|
||
}
|
||
|
||
// UPC/SKU fields
|
||
for (const field of ['upc', 'supplier_no', 'notions_no', 'item_number']) {
|
||
if (normalized[field] !== undefined) {
|
||
const result = upcNormalizer.normalize(normalized[field]);
|
||
if (result.value !== normalized[field]) {
|
||
changes.push({ field, from: normalized[field], to: result.value });
|
||
normalized[field] = result.value;
|
||
}
|
||
}
|
||
}
|
||
|
||
// Country of origin
|
||
if (normalized.coo !== undefined) {
|
||
const result = countryCodeNormalizer.normalize(normalized.coo);
|
||
if (result.value !== normalized.coo) {
|
||
changes.push({ field: 'coo', from: normalized.coo, to: result.value });
|
||
normalized.coo = result.value;
|
||
}
|
||
}
|
||
|
||
// ETA date
|
||
if (normalized.eta !== undefined) {
|
||
const result = dateNormalizer.normalizeEta(normalized.eta);
|
||
if (result.value !== normalized.eta) {
|
||
changes.push({ field: 'eta', from: normalized.eta, to: result.value });
|
||
normalized.eta = result.value;
|
||
}
|
||
}
|
||
|
||
// Numeric fields
|
||
for (const field of ['qty_per_unit', 'case_qty']) {
|
||
if (normalized[field] !== undefined) {
|
||
const result = numericNormalizer.normalize(normalized[field]);
|
||
if (result.value !== normalized[field]) {
|
||
changes.push({ field, from: normalized[field], to: result.value });
|
||
normalized[field] = result.value;
|
||
}
|
||
}
|
||
}
|
||
|
||
return { normalized, changes };
|
||
}
|
||
|
||
module.exports = {
|
||
normalizeProduct,
|
||
priceNormalizer,
|
||
upcNormalizer,
|
||
countryCodeNormalizer,
|
||
dateNormalizer,
|
||
numericNormalizer
|
||
};
|
||
```
|
||
|
||
```javascript
|
||
// services/ai/normalizers/priceNormalizer.js
|
||
|
||
/**
|
||
* Normalize price values
|
||
* Input: "$5.00", "5", "5.5", "$1,234.56"
|
||
* Output: "5.00", "5.00", "5.50", "1234.56"
|
||
*/
|
||
function normalize(value) {
|
||
if (value === null || value === undefined || value === '') {
|
||
return { value, changed: false };
|
||
}
|
||
|
||
const original = String(value);
|
||
|
||
// Remove currency symbols and commas
|
||
let cleaned = original.replace(/[$,]/g, '').trim();
|
||
|
||
// Try to parse as number
|
||
const num = parseFloat(cleaned);
|
||
if (isNaN(num)) {
|
||
return { value: original, changed: false, error: 'Not a valid number' };
|
||
}
|
||
|
||
// Format to 2 decimal places
|
||
const formatted = num.toFixed(2);
|
||
|
||
return {
|
||
value: formatted,
|
||
changed: formatted !== original
|
||
};
|
||
}
|
||
|
||
module.exports = { normalize };
|
||
```
|
||
|
||
```javascript
|
||
// services/ai/normalizers/countryCodeNormalizer.js
|
||
|
||
const COUNTRY_MAP = {
|
||
// Full names
|
||
'united states': 'US',
|
||
'united states of america': 'US',
|
||
'china': 'CN',
|
||
'peoples republic of china': 'CN',
|
||
"people's republic of china": 'CN',
|
||
'taiwan': 'TW',
|
||
'japan': 'JP',
|
||
'south korea': 'KR',
|
||
'korea': 'KR',
|
||
'india': 'IN',
|
||
'germany': 'DE',
|
||
'united kingdom': 'GB',
|
||
'great britain': 'GB',
|
||
'france': 'FR',
|
||
'italy': 'IT',
|
||
'spain': 'ES',
|
||
'canada': 'CA',
|
||
'mexico': 'MX',
|
||
'brazil': 'BR',
|
||
'australia': 'AU',
|
||
'vietnam': 'VN',
|
||
'thailand': 'TH',
|
||
'indonesia': 'ID',
|
||
'philippines': 'PH',
|
||
'malaysia': 'MY',
|
||
|
||
// Common abbreviations
|
||
'usa': 'US',
|
||
'u.s.a.': 'US',
|
||
'u.s.': 'US',
|
||
'prc': 'CN',
|
||
'uk': 'GB',
|
||
'u.k.': 'GB',
|
||
'rok': 'KR'
|
||
};
|
||
|
||
function normalize(value) {
|
||
if (value === null || value === undefined || value === '') {
|
||
return { value, changed: false };
|
||
}
|
||
|
||
const original = String(value).trim();
|
||
const lookup = original.toLowerCase();
|
||
|
||
// Check if it's already a valid 2-letter code
|
||
if (/^[A-Z]{2}$/.test(original)) {
|
||
return { value: original, changed: false };
|
||
}
|
||
|
||
// Look up in map
|
||
if (COUNTRY_MAP[lookup]) {
|
||
return {
|
||
value: COUNTRY_MAP[lookup],
|
||
changed: true
|
||
};
|
||
}
|
||
|
||
// If 2 chars, uppercase and return
|
||
if (original.length === 2) {
|
||
const upper = original.toUpperCase();
|
||
return {
|
||
value: upper,
|
||
changed: upper !== original
|
||
};
|
||
}
|
||
|
||
// Take first 2 chars as fallback
|
||
const fallback = original.substring(0, 2).toUpperCase();
|
||
return {
|
||
value: fallback,
|
||
changed: true,
|
||
warning: `Unknown country "${original}", using "${fallback}"`
|
||
};
|
||
}
|
||
|
||
module.exports = { normalize };
|
||
```
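
Expected behaviour of the normalizers above, shown as input/output pairs:

```javascript
// Illustrative inputs and outputs for the Tier 1 normalizers.
const priceNormalizer = require('./services/ai/normalizers/priceNormalizer');
const countryCodeNormalizer = require('./services/ai/normalizers/countryCodeNormalizer');

console.log(priceNormalizer.normalize('$5'));        // { value: '5.00', changed: true }
console.log(priceNormalizer.normalize('$1,234.56')); // { value: '1234.56', changed: true }
console.log(priceNormalizer.normalize('free'));      // { value: 'free', changed: false, error: 'Not a valid number' }

console.log(countryCodeNormalizer.normalize('USA'));   // { value: 'US', changed: true }
console.log(countryCodeNormalizer.normalize('China')); // { value: 'CN', changed: true }
console.log(countryCodeNormalizer.normalize('US'));    // { value: 'US', changed: false }
```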
|
||
|
||
### Main AI Service Entry
|
||
|
||
```javascript
|
||
// services/ai/index.js
|
||
|
||
const { AiTaskRegistry, TASK_IDS } = require('./taskRegistry');
|
||
const { AiWorkQueue } = require('./workQueue');
|
||
const { createProvider } = require('./providers');
|
||
const { initializeEmbeddings } = require('./embeddings');
|
||
const { normalizeProduct } = require('./normalizers');
|
||
|
||
// Task factories
|
||
const { createNameSuggestionTask } = require('./tasks/nameSuggestionTask');
|
||
const { createCategorySuggestionTask } = require('./tasks/categorySuggestionTask');
|
||
const { createThemeSuggestionTask } = require('./tasks/themeSuggestionTask');
|
||
const { createColorSuggestionTask } = require('./tasks/colorSuggestionTask');
|
||
const { createDescriptionEnhanceTask } = require('./tasks/descriptionEnhanceTask');
|
||
const { createConsistencyCheckTask } = require('./tasks/consistencyCheckTask');
|
||
|
||
let initialized = false;
|
||
let aiEnabled = false;
|
||
let registry = null;
|
||
let workQueue = null;
|
||
let providers = {};
|
||
|
||
/**
|
||
* Initialize the AI system
|
||
*/
|
||
async function initialize({ config, mysqlConnection, logger }) {
|
||
if (initialized) {
|
||
return { enabled: aiEnabled };
|
||
}
|
||
|
||
if (!config?.ai?.enabled) {
|
||
logger?.info('[AI] AI features disabled by configuration');
|
||
initialized = true;
|
||
aiEnabled = false;
|
||
return { enabled: false };
|
||
}
|
||
|
||
try {
|
||
// Initialize providers
|
||
providers.groq = createProvider('groq', config.ai);
|
||
providers.openai = createProvider('openai', config.ai);
|
||
|
||
if (config.ai.providers?.anthropic?.apiKey) {
|
||
providers.anthropic = createProvider('anthropic', config.ai);
|
||
}
|
||
|
||
// Initialize embeddings (requires OpenAI for embedding generation)
|
||
await initializeEmbeddings({
|
||
openaiProvider: providers.openai,
|
||
mysqlConnection,
|
||
logger
|
||
});
|
||
|
||
// Initialize work queue
|
||
workQueue = new AiWorkQueue(config.ai.maxConcurrentTasks || 5);
|
||
|
||
// Initialize task registry
|
||
registry = new AiTaskRegistry();
|
||
|
||
// Register Tier 2 tasks (real-time, Groq)
|
||
registry.register(createNameSuggestionTask({
|
||
provider: providers.groq,
|
||
logger,
|
||
config: config.ai
|
||
}));
|
||
|
||
registry.register(createCategorySuggestionTask({
|
||
provider: providers.groq,
|
||
logger,
|
||
config: config.ai
|
||
}));
|
||
|
||
registry.register(createThemeSuggestionTask({
|
||
provider: providers.groq,
|
||
logger,
|
||
config: config.ai
|
||
}));
|
||
|
||
registry.register(createColorSuggestionTask({
|
||
provider: providers.groq,
|
||
logger,
|
||
config: config.ai
|
||
}));
|
||
|
||
// Register Tier 3 tasks (batch, Anthropic/OpenAI)
|
||
const batchProvider = providers.anthropic || providers.openai;
|
||
|
||
registry.register(createDescriptionEnhanceTask({
|
||
provider: batchProvider,
|
||
logger,
|
||
config: config.ai
|
||
}));
|
||
|
||
registry.register(createConsistencyCheckTask({
|
||
provider: batchProvider,
|
||
logger,
|
||
config: config.ai
|
||
}));
|
||
|
||
initialized = true;
|
||
aiEnabled = true;
|
||
logger?.info('[AI] AI system initialized successfully');
|
||
|
||
return { enabled: true };
|
||
} catch (error) {
|
||
logger?.error('[AI] Failed to initialize AI system', { error: error.message });
|
||
initialized = true;
|
||
aiEnabled = false;
|
||
return { enabled: false, error: error.message };
|
||
}
|
||
}
|
||
|
||
/**
|
||
* Run a task by ID
|
||
*/
|
||
async function runTask(taskId, payload = {}) {
|
||
if (!aiEnabled) {
|
||
throw new Error('AI features are disabled');
|
||
}
|
||
|
||
const task = registry?.get(taskId);
|
||
if (!task) {
|
||
throw new Error(`Unknown task: ${taskId}`);
|
||
}
|
||
|
||
// Execute through work queue for concurrency control
|
||
return workQueue.enqueue(() => task.run(payload));
|
||
}
|
||
|
||
/**
|
||
* Normalize a product using Tier 1 (code-based) rules
|
||
*/
|
||
function normalize(product, fieldMappings) {
|
||
return normalizeProduct(product, fieldMappings);
|
||
}
|
||
|
||
/**
|
||
* Get system status
|
||
*/
|
||
function getStatus() {
|
||
return {
|
||
enabled: aiEnabled,
|
||
initialized,
|
||
tasks: registry?.list() || [],
|
||
queue: workQueue?.getStats() || null
|
||
};
|
||
}
|
||
|
||
module.exports = {
|
||
initialize,
|
||
runTask,
|
||
normalize,
|
||
getStatus,
|
||
TASK_IDS
|
||
};
|
||
```
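
How this module gets wired into the server is outside the scope of this document; the following Express bootstrap is one possible (assumed) shape, showing where `initialize` and the routes would be hooked in.

```javascript
// Sketch only: the config/logger/connection objects are assumptions about the host app.
const express = require('express');
const ai = require('./services/ai');

async function startServer({ config, mysqlConnection, logger }) {
  const app = express();
  app.use(express.json());

  // Initialize providers, embeddings, and the task registry once at startup.
  const { enabled } = await ai.initialize({ config, mysqlConnection, logger });
  logger.info(`[AI] features ${enabled ? 'enabled' : 'disabled'}`);

  // Mount the new routes regardless; /status reports whether AI is available.
  app.use('/api/ai', require('./routes/ai'));

  app.listen(config.port || 3000);
}

module.exports = { startServer };
```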
|
||
|
||
### API Routes
|
||
|
||
```javascript
|
||
// routes/ai.js
|
||
|
||
const express = require('express');
|
||
const router = express.Router();
|
||
const ai = require('../services/ai');
|
||
const { TASK_IDS } = require('../services/ai/taskRegistry');
|
||
|
||
/**
|
||
* Get AI system status
|
||
*/
|
||
router.get('/status', (req, res) => {
|
||
res.json(ai.getStatus());
|
||
});
|
||
|
||
/**
|
||
* Normalize a product (Tier 1 - no AI)
|
||
*/
|
||
router.post('/normalize', (req, res) => {
|
||
try {
|
||
const { product, fieldMappings } = req.body;
|
||
const result = ai.normalize(product, fieldMappings);
|
||
res.json(result);
|
||
} catch (error) {
|
||
res.status(500).json({ error: error.message });
|
||
}
|
||
});
|
||
|
||
/**
|
||
* Normalize multiple products (Tier 1 - no AI)
|
||
*/
|
||
router.post('/normalize/batch', (req, res) => {
|
||
try {
|
||
const { products, fieldMappings } = req.body;
|
||
const results = products.map(product => ai.normalize(product, fieldMappings));
|
||
res.json({ results });
|
||
} catch (error) {
|
||
res.status(500).json({ error: error.message });
|
||
}
|
||
});
|
||
|
||
/**
|
||
* Get name suggestion (Tier 2 - Groq)
|
||
*/
|
||
router.post('/suggest/name', async (req, res) => {
|
||
try {
|
||
const { product } = req.body;
|
||
const result = await ai.runTask(TASK_IDS.SUGGEST_NAME, { product });
|
||
res.json(result);
|
||
} catch (error) {
|
||
console.error('[AI] Name suggestion error:', error);
|
||
res.status(500).json({ error: error.message });
|
||
}
|
||
});
|
||
|
||
/**
|
||
* Get category suggestions (Tier 2 - Embeddings + Groq)
|
||
*/
|
||
router.post('/suggest/categories', async (req, res) => {
|
||
try {
|
||
const { product } = req.body;
|
||
const result = await ai.runTask(TASK_IDS.SUGGEST_CATEGORIES, { product });
|
||
res.json(result);
|
||
} catch (error) {
|
||
console.error('[AI] Category suggestion error:', error);
|
||
res.status(500).json({ error: error.message });
|
||
}
|
||
});
|
||
|
||
/**
|
||
* Get theme suggestions (Tier 2 - Embeddings + Groq)
|
||
*/
|
||
router.post('/suggest/themes', async (req, res) => {
|
||
try {
|
||
const { product } = req.body;
|
||
const result = await ai.runTask(TASK_IDS.SUGGEST_THEMES, { product });
|
||
res.json(result);
|
||
} catch (error) {
|
||
console.error('[AI] Theme suggestion error:', error);
|
||
res.status(500).json({ error: error.message });
|
||
}
|
||
});
|
||
|
||
/**
|
||
* Get color suggestions (Tier 2 - Groq)
|
||
*/
|
||
router.post('/suggest/colors', async (req, res) => {
|
||
try {
|
||
const { product } = req.body;
|
||
const result = await ai.runTask(TASK_IDS.SUGGEST_COLORS, { product });
|
||
res.json(result);
|
||
} catch (error) {
|
||
console.error('[AI] Color suggestion error:', error);
|
||
res.status(500).json({ error: error.message });
|
||
}
|
||
});
|
||
|
||
/**
|
||
* Enhance descriptions (Tier 3 - Batch)
|
||
*/
|
||
router.post('/enhance/descriptions', async (req, res) => {
|
||
try {
|
||
const { products, mode = 'enhance' } = req.body;
|
||
const result = await ai.runTask(TASK_IDS.ENHANCE_DESCRIPTIONS, { products, mode });
|
||
res.json(result);
|
||
} catch (error) {
|
||
console.error('[AI] Description enhancement error:', error);
|
||
res.status(500).json({ error: error.message });
|
||
}
|
||
});
|
||
|
||
/**
|
||
* Check consistency across products (Tier 3 - Batch)
|
||
*/
|
||
router.post('/check/consistency', async (req, res) => {
|
||
try {
|
||
const { products } = req.body;
|
||
const result = await ai.runTask(TASK_IDS.CHECK_CONSISTENCY, { products });
|
||
res.json(result);
|
||
} catch (error) {
|
||
console.error('[AI] Consistency check error:', error);
|
||
res.status(500).json({ error: error.message });
|
||
}
|
||
});
|
||
|
||
/**
|
||
* Legacy endpoint - full validation (redirects to new system)
|
||
* Kept for backwards compatibility during migration
|
||
*/
|
||
router.post('/validate', async (req, res) => {
|
||
// TODO: Implement migration path
|
||
// For now, combine Tier 1 normalization with Tier 3 batch processing
|
||
res.status(501).json({
|
||
error: 'Legacy validation endpoint deprecated. Use /normalize, /suggest/*, and /enhance/* endpoints.'
|
||
});
|
||
});
|
||
|
||
module.exports = router;
|
||
```
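
On the client, a simple guard against the AI system being disabled might look like this (illustrative only):

```javascript
// Only enable inline suggestion UI when the backend reports AI is available.
async function isAiAvailable() {
  try {
    const res = await fetch('/api/ai/status');
    if (!res.ok) return false;
    const status = await res.json(); // { enabled, initialized, tasks, queue }
    return Boolean(status.enabled);
  } catch {
    return false;
  }
}
```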

---

## Frontend Implementation

### Suggestion Hooks

#### Base Hook

```typescript
// hooks/ai/useAiSuggestion.ts

import { useState, useCallback } from 'react';
import { useDebouncedCallback } from 'use-debounce';

interface SuggestionResult<T> {
  suggestion: T | null;
  isLoading: boolean;
  error: string | null;
  latencyMs: number | null;
}

interface UseAiSuggestionOptions {
  debounceMs?: number;
  enabled?: boolean;
}

export function useAiSuggestion<T>(
  endpoint: string,
  options: UseAiSuggestionOptions = {}
) {
  const { debounceMs = 300, enabled = true } = options;

  const [result, setResult] = useState<SuggestionResult<T>>({
    suggestion: null,
    isLoading: false,
    error: null,
    latencyMs: null
  });

  const fetchSuggestion = useCallback(async (payload: unknown) => {
    if (!enabled) return;

    setResult(prev => ({ ...prev, isLoading: true, error: null }));

    try {
      const response = await fetch(`/api/ai${endpoint}`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(payload)
      });

      if (!response.ok) {
        throw new Error(`API error: ${response.status}`);
      }

      const data = await response.json();

      setResult({
        suggestion: data.suggestion ?? data.suggestions ?? data,
        isLoading: false,
        error: null,
        latencyMs: data.latencyMs ?? null
      });
    } catch (error) {
      setResult(prev => ({
        ...prev,
        isLoading: false,
        error: error instanceof Error ? error.message : 'Unknown error'
      }));
    }
  }, [endpoint, enabled]);

  const debouncedFetch = useDebouncedCallback(fetchSuggestion, debounceMs);

  const clear = useCallback(() => {
    setResult({
      suggestion: null,
      isLoading: false,
      error: null,
      latencyMs: null
    });
  }, []);

  return {
    ...result,
    fetch: fetchSuggestion,
    fetchDebounced: debouncedFetch,
    clear
  };
}
```

#### Name Suggestion Hook

```typescript
// hooks/ai/useNameSuggestion.ts

import { useAiSuggestion } from './useAiSuggestion';
import { ProductRow } from '@/types/import';

interface NameSuggestionResult {
  suggestion: string | null;
  original: string;
  unchanged?: boolean;
}

export function useNameSuggestion() {
  const {
    suggestion,
    isLoading,
    error,
    fetch,
    fetchDebounced,
    clear
  } = useAiSuggestion<NameSuggestionResult>('/suggest/name');

  const suggest = (product: Partial<ProductRow>) => {
    if (!product.name && !product.description) return;
    fetchDebounced({ product });
  };

  const suggestImmediate = (product: Partial<ProductRow>) => {
    if (!product.name && !product.description) return;
    fetch({ product });
  };

  return {
    suggestion: suggestion?.suggestion ?? null,
    original: suggestion?.original ?? null,
    unchanged: suggestion?.unchanged ?? false,
    isLoading,
    error,
    suggest,
    suggestImmediate,
    clear
  };
}
```

#### Category Suggestion Hook

```typescript
// hooks/ai/useCategorySuggestion.ts

import { useCallback } from 'react';
import { useAiSuggestion } from './useAiSuggestion';
import { ProductRow } from '@/types/import';

interface CategoryMatch {
  id: number;
  name: string;
  fullPath: string;
  similarity: number;
}

interface CategorySuggestionResult {
  suggestions: CategoryMatch[];
  allMatches: CategoryMatch[];
}

export function useCategorySuggestion() {
  const {
    suggestion,
    isLoading,
    error,
    fetchDebounced,
    clear
  } = useAiSuggestion<CategorySuggestionResult>('/suggest/categories', {
    debounceMs: 500 // Longer debounce for the embedding lookup
  });

  // Memoized so it can safely appear in consumers' dependency arrays
  const suggest = useCallback((product: Partial<ProductRow>) => {
    const hasText = product.name || product.description;
    if (!hasText) return;
    fetchDebounced({ product });
  }, [fetchDebounced]);

  return {
    suggestions: suggestion?.suggestions ?? [],
    allMatches: suggestion?.allMatches ?? [],
    isLoading,
    error,
    suggest,
    clear
  };
}
```

#### Batch Enhancement Hook

```typescript
// hooks/ai/useDescriptionEnhancement.ts

import { useState, useCallback } from 'react';
import { ProductRow } from '@/types/import';

interface EnhancementResult {
  productId: string | number;
  original: string | null;
  enhanced: string | null;
  changed: boolean;
}

interface UseDescriptionEnhancementResult {
  results: EnhancementResult[];
  isProcessing: boolean;
  progress: { current: number; total: number };
  error: string | null;
  enhance: (products: ProductRow[], mode?: 'enhance' | 'generate') => Promise<void>;
  cancel: () => void;
}

export function useDescriptionEnhancement(): UseDescriptionEnhancementResult {
  const [results, setResults] = useState<EnhancementResult[]>([]);
  const [isProcessing, setIsProcessing] = useState(false);
  const [progress, setProgress] = useState({ current: 0, total: 0 });
  const [error, setError] = useState<string | null>(null);
  const [abortController, setAbortController] = useState<AbortController | null>(null);

  const enhance = useCallback(async (
    products: ProductRow[],
    mode: 'enhance' | 'generate' = 'enhance'
  ) => {
    const controller = new AbortController();
    setAbortController(controller);
    setIsProcessing(true);
    setProgress({ current: 0, total: products.length });
    setResults([]);
    setError(null);

    try {
      const response = await fetch('/api/ai/enhance/descriptions', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ products, mode }),
        signal: controller.signal
      });

      if (!response.ok) {
        throw new Error(`API error: ${response.status}`);
      }

      const data = await response.json();
      setResults(data.results || []);
      // Single non-streaming request, so progress jumps to complete when the response arrives
      setProgress({ current: products.length, total: products.length });
    } catch (err) {
      if (err instanceof Error && err.name === 'AbortError') {
        // Cancelled by user
        return;
      }
      setError(err instanceof Error ? err.message : 'Unknown error');
    } finally {
      setIsProcessing(false);
      setAbortController(null);
    }
  }, []);

  const cancel = useCallback(() => {
    abortController?.abort();
  }, [abortController]);

  return {
    results,
    isProcessing,
    progress,
    error,
    enhance,
    cancel
  };
}
```

### UI Components

#### Suggestion Badge

```tsx
// components/ai/SuggestionBadge.tsx

import { Check, X, Sparkles } from 'lucide-react';
import { Button } from '@/components/ui/button';
import { cn } from '@/lib/utils';

interface SuggestionBadgeProps {
  suggestion: string;
  onAccept: () => void;
  onDismiss: () => void;
  className?: string;
}

export function SuggestionBadge({
  suggestion,
  onAccept,
  onDismiss,
  className
}: SuggestionBadgeProps) {
  return (
    <div className={cn(
      'flex items-center gap-2 p-2 mt-1 rounded-md',
      'bg-purple-50 border border-purple-200',
      'dark:bg-purple-950/30 dark:border-purple-800',
      className
    )}>
      <Sparkles className="h-3.5 w-3.5 text-purple-500 flex-shrink-0" />
      <span className="text-sm text-purple-700 dark:text-purple-300 flex-1 truncate">
        {suggestion}
      </span>
      <div className="flex items-center gap-1 flex-shrink-0">
        <Button
          size="sm"
          variant="ghost"
          className="h-6 w-6 p-0 text-green-600 hover:text-green-700 hover:bg-green-100"
          onClick={onAccept}
        >
          <Check className="h-3.5 w-3.5" />
        </Button>
        <Button
          size="sm"
          variant="ghost"
          className="h-6 w-6 p-0 text-gray-400 hover:text-gray-600 hover:bg-gray-100"
          onClick={onDismiss}
        >
          <X className="h-3.5 w-3.5" />
        </Button>
      </div>
    </div>
  );
}
```

#### Category Suggestion Dropdown

```tsx
// components/ai/CategorySuggestionDropdown.tsx

import { useState, useEffect } from 'react';
import { Sparkles, ChevronDown } from 'lucide-react';
import {
  DropdownMenu,
  DropdownMenuContent,
  DropdownMenuItem,
  DropdownMenuSeparator,
  DropdownMenuTrigger
} from '@/components/ui/dropdown-menu';
import { Button } from '@/components/ui/button';
import { Badge } from '@/components/ui/badge';
import { useCategorySuggestion } from '@/hooks/ai/useCategorySuggestion';
import { ProductRow } from '@/types/import';

interface CategorySuggestionDropdownProps {
  product: ProductRow;
  currentCategories: number[];
  onSelect: (categoryId: number) => void;
  allCategories: Array<{ id: number; name: string; fullPath: string }>;
}

export function CategorySuggestionDropdown({
  product,
  currentCategories,
  onSelect,
  allCategories
}: CategorySuggestionDropdownProps) {
  const { suggestions, allMatches, isLoading, suggest, clear } = useCategorySuggestion();
  const [isOpen, setIsOpen] = useState(false);

  // Fetch suggestions when dropdown opens
  useEffect(() => {
    if (isOpen && suggestions.length === 0) {
      suggest(product);
    }
  }, [isOpen, product, suggest, suggestions.length]);

  // Clear when dropdown closes
  useEffect(() => {
    if (!isOpen) {
      clear();
    }
  }, [isOpen, clear]);

  return (
    <DropdownMenu open={isOpen} onOpenChange={setIsOpen}>
      <DropdownMenuTrigger asChild>
        <Button variant="outline" size="sm" className="gap-2">
          <span>Categories</span>
          {currentCategories.length > 0 && (
            <Badge variant="secondary" className="ml-1">
              {currentCategories.length}
            </Badge>
          )}
          <ChevronDown className="h-4 w-4" />
        </Button>
      </DropdownMenuTrigger>

      <DropdownMenuContent className="w-80">
        {/* AI Suggestions Section */}
        {(isLoading || suggestions.length > 0) && (
          <>
            <div className="flex items-center gap-2 px-2 py-1.5 text-xs font-medium text-purple-600">
              <Sparkles className="h-3 w-3" />
              AI Suggested
              {isLoading && <span className="text-gray-400">(loading...)</span>}
            </div>

            {suggestions.map(cat => (
              <DropdownMenuItem
                key={cat.id}
                onClick={() => onSelect(cat.id)}
                className="flex items-center justify-between"
              >
                <span className="truncate">{cat.fullPath}</span>
                <Badge variant="outline" className="ml-2 text-xs">
                  {Math.round(cat.similarity * 100)}%
                </Badge>
              </DropdownMenuItem>
            ))}

            <DropdownMenuSeparator />
          </>
        )}

        {/* All Categories Section */}
        <div className="px-2 py-1.5 text-xs font-medium text-gray-500">
          All Categories
        </div>

        <div className="max-h-60 overflow-y-auto">
          {allCategories.slice(0, 50).map(cat => (
            <DropdownMenuItem
              key={cat.id}
              onClick={() => onSelect(cat.id)}
              disabled={currentCategories.includes(cat.id)}
            >
              <span className="truncate">{cat.fullPath}</span>
            </DropdownMenuItem>
          ))}
        </div>
      </DropdownMenuContent>
    </DropdownMenu>
  );
}
```

#### AI Validation Cell

```tsx
// components/ai/AiValidationCell.tsx

import { useState, useEffect } from 'react';
import { Loader2 } from 'lucide-react';
import { Input } from '@/components/ui/input';
import { SuggestionBadge } from './SuggestionBadge';
import { useNameSuggestion } from '@/hooks/ai/useNameSuggestion';
import { ProductRow } from '@/types/import';

interface AiValidationCellProps {
  field: 'name' | 'description';
  value: string;
  product: ProductRow;
  onChange: (value: string) => void;
  onBlur?: () => void;
}

export function AiValidationCell({
  field,
  value,
  product,
  onChange,
  onBlur
}: AiValidationCellProps) {
  const {
    suggestion,
    isLoading,
    suggest,
    clear
  } = useNameSuggestion();

  const [localValue, setLocalValue] = useState(value);
  const [showSuggestion, setShowSuggestion] = useState(false);

  // Sync external value changes
  useEffect(() => {
    setLocalValue(value);
  }, [value]);

  // Show suggestion when it arrives and differs from current value
  useEffect(() => {
    if (suggestion && suggestion !== localValue) {
      setShowSuggestion(true);
    }
  }, [suggestion, localValue]);

  const handleBlur = () => {
    // Trigger AI suggestion on blur; descriptions are handled by the batch enhancer instead
    if (field === 'name') {
      suggest({ ...product, name: localValue });
    }
    onBlur?.();
  };

  const handleAccept = () => {
    if (suggestion) {
      setLocalValue(suggestion);
      onChange(suggestion);
      setShowSuggestion(false);
      clear();
    }
  };

  const handleDismiss = () => {
    setShowSuggestion(false);
    clear();
  };

  return (
    <div className="relative">
      <div className="relative">
        <Input
          value={localValue}
          onChange={(e) => {
            setLocalValue(e.target.value);
            onChange(e.target.value);
          }}
          onBlur={handleBlur}
          className="pr-8"
        />
        {isLoading && (
          <div className="absolute right-2 top-1/2 -translate-y-1/2">
            <Loader2 className="h-4 w-4 animate-spin text-purple-500" />
          </div>
        )}
      </div>

      {showSuggestion && suggestion && (
        <SuggestionBadge
          suggestion={suggestion}
          onAccept={handleAccept}
          onDismiss={handleDismiss}
        />
      )}
    </div>
  );
}
```

#### Batch Enhancement Button

```tsx
// components/ai/EnhanceDescriptionsButton.tsx

import { useState } from 'react';
import { Sparkles, Loader2, CheckCircle } from 'lucide-react';
import { Button } from '@/components/ui/button';
import {
  Dialog,
  DialogContent,
  DialogDescription,
  DialogFooter,
  DialogHeader,
  DialogTitle
} from '@/components/ui/dialog';
import { Progress } from '@/components/ui/progress';
import { useDescriptionEnhancement } from '@/hooks/ai/useDescriptionEnhancement';
import { useValidationStore } from '../store/validationStore';

export function EnhanceDescriptionsButton() {
  const [showDialog, setShowDialog] = useState(false);
  const rows = useValidationStore(state => state.rows);
  const updateRow = useValidationStore(state => state.updateRow);

  const {
    results,
    isProcessing,
    progress,
    error,
    enhance,
    cancel
  } = useDescriptionEnhancement();

  const handleEnhance = async () => {
    setShowDialog(true);
    await enhance(rows.map(r => r.data));
  };

  const handleApply = () => {
    // Apply enhanced descriptions to store
    for (const result of results) {
      if (result.changed && result.enhanced) {
        const rowIndex = rows.findIndex(
          r => r.data._index === result.productId || r.data.upc === result.productId
        );
        if (rowIndex !== -1) {
          updateRow(rowIndex, { description: result.enhanced });
        }
      }
    }
    setShowDialog(false);
  };

  const changedCount = results.filter(r => r.changed).length;
  const progressPercent = progress.total > 0
    ? Math.round((progress.current / progress.total) * 100)
    : 0;

  return (
    <>
      <Button
        variant="outline"
        onClick={handleEnhance}
        disabled={isProcessing || rows.length === 0}
      >
        <Sparkles className="h-4 w-4 mr-2" />
        Enhance Descriptions
      </Button>

      <Dialog open={showDialog} onOpenChange={setShowDialog}>
        <DialogContent>
          <DialogHeader>
            <DialogTitle>Enhance Descriptions</DialogTitle>
            <DialogDescription>
              AI will improve product descriptions for SEO and clarity.
            </DialogDescription>
          </DialogHeader>

          <div className="py-4">
            {isProcessing ? (
              <div className="space-y-4">
                <div className="flex items-center gap-3">
                  <Loader2 className="h-5 w-5 animate-spin text-purple-500" />
                  <span>Processing {progress.current} of {progress.total} products...</span>
                </div>
                <Progress value={progressPercent} className="h-2" />
              </div>
            ) : error ? (
              <div className="text-red-500">{error}</div>
            ) : results.length > 0 ? (
              <div className="space-y-2">
                <div className="flex items-center gap-2 text-green-600">
                  <CheckCircle className="h-5 w-5" />
                  <span>Enhanced {changedCount} descriptions</span>
                </div>
                <p className="text-sm text-gray-500">
                  {results.length - changedCount} descriptions were already good or unchanged.
                </p>
              </div>
            ) : null}
          </div>

          <DialogFooter>
            {isProcessing ? (
              <Button variant="outline" onClick={cancel}>
                Cancel
              </Button>
            ) : results.length > 0 ? (
              <>
                <Button variant="outline" onClick={() => setShowDialog(false)}>
                  Discard
                </Button>
                <Button onClick={handleApply} disabled={changedCount === 0}>
                  Apply {changedCount} Changes
                </Button>
              </>
            ) : null}
          </DialogFooter>
        </DialogContent>
      </Dialog>
    </>
  );
}
```

### Integration with ValidationStep

```tsx
// Example integration in ValidationContainer.tsx

import { AiValidationCell } from '@/components/ai/AiValidationCell';
import { CategorySuggestionDropdown } from '@/components/ai/CategorySuggestionDropdown';
import { EnhanceDescriptionsButton } from '@/components/ai/EnhanceDescriptionsButton';

// In the toolbar
<div className="flex items-center gap-2">
  <EnhanceDescriptionsButton />
  {/* Other toolbar items */}
</div>

// In the data grid column definitions
const columns = [
  {
    key: 'name',
    name: 'Name',
    renderCell: ({ row, onRowChange }) => (
      <AiValidationCell
        field="name"
        value={row.name}
        product={row}
        onChange={(value) => onRowChange({ ...row, name: value })}
      />
    )
  },
  {
    key: 'categories',
    name: 'Categories',
    renderCell: ({ row, onRowChange }) => (
      <CategorySuggestionDropdown
        product={row}
        currentCategories={parseCategories(row.categories)}
        onSelect={(catId) => {
          const current = parseCategories(row.categories);
          onRowChange({
            ...row,
            categories: [...current, catId].join(',')
          });
        }}
        allCategories={allCategories}
      />
    )
  }
];
```
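
The column definitions above assume a `parseCategories` helper that is not defined anywhere in this document. A minimal sketch, assuming categories are stored as the comma-separated ID string that the `onSelect` handler writes back:

```typescript
// Hypothetical helper assumed by the integration example above.
// Assumes `categories` is a comma-separated string of numeric IDs, e.g. "12,45,102".
export function parseCategories(categories: string | null | undefined): number[] {
  if (!categories) return [];
  return categories
    .split(',')
    .map(part => Number.parseInt(part.trim(), 10))
    .filter(id => Number.isFinite(id));
}
```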

---

## Database Schema

### New Tables

```sql
-- Embedding cache for faster startup
-- Requires the pgvector extension: CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS ai_embedding_cache (
  id SERIAL PRIMARY KEY,
  entity_type VARCHAR(50) NOT NULL,        -- 'category', 'theme', 'color'
  entity_id INTEGER NOT NULL,
  embedding_model VARCHAR(100) NOT NULL,
  embedding VECTOR(1536),                  -- pgvector column; 1536 dims matches text-embedding-3-small
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  UNIQUE(entity_type, entity_id, embedding_model)
);

-- Index for fast lookups
CREATE INDEX idx_embedding_cache_lookup
  ON ai_embedding_cache(entity_type, embedding_model);

-- AI suggestion history (for analytics and improvement)
CREATE TABLE IF NOT EXISTS ai_suggestion_log (
  id SERIAL PRIMARY KEY,
  task_id VARCHAR(100) NOT NULL,
  product_identifier VARCHAR(255),
  suggestion JSONB,
  accepted BOOLEAN DEFAULT NULL,
  latency_ms INTEGER,
  token_usage JSONB,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Index for analytics queries
CREATE INDEX idx_suggestion_log_task
  ON ai_suggestion_log(task_id, created_at);
```
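
For context on how this cache would be read: with pgvector, the Tier 2 category lookup becomes a nearest-neighbour query using the cosine-distance operator `<=>`. The SQL below is an illustrative sketch; the parameter placeholder and the top-5 cutoff are assumptions, not part of the schema.

```sql
-- Illustrative lookup: the 5 categories closest to a product's embedding.
-- $1 is the product embedding (same model and dimension as the cached rows).
SELECT entity_id,
       1 - (embedding <=> $1) AS similarity   -- cosine similarity derived from cosine distance
FROM ai_embedding_cache
WHERE entity_type = 'category'
  AND embedding_model = 'text-embedding-3-small'
ORDER BY embedding <=> $1
LIMIT 5;
```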

### Configuration Table Updates

```sql
-- AI configuration: new key/value table (could also be folded into an existing config table)
CREATE TABLE IF NOT EXISTS ai_config (
  id SERIAL PRIMARY KEY,
  key VARCHAR(100) UNIQUE NOT NULL,
  value JSONB NOT NULL,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Insert default config (idempotent so the migration can be re-run)
INSERT INTO ai_config (key, value) VALUES
('providers', '{
  "groq": {
    "enabled": true,
    "model": "llama-3.3-70b-versatile"
  },
  "openai": {
    "enabled": true,
    "embeddingModel": "text-embedding-3-small"
  },
  "anthropic": {
    "enabled": true,
    "model": "claude-3-5-haiku-20241022"
  }
}'),
('tasks', '{
  "nameSuggestion": {
    "model": "llama-3.3-70b-versatile",
    "temperature": 0.2,
    "maxTokens": 150
  },
  "categorySuggestion": {
    "model": "llama-3.1-8b-instant",
    "temperature": 0.1,
    "maxTokens": 100
  },
  "descriptionEnhance": {
    "model": "claude-3-5-haiku-20241022",
    "temperature": 0.7,
    "maxTokens": 2000,
    "batchSize": 10
  }
}')
ON CONFLICT (key) DO NOTHING;
```
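
At runtime the backend would read this table once (or on change) and hand per-task settings to the task registry. A minimal sketch, assuming a node-postgres `pool` exported from a `../db` module; the module path, caching strategy, and fallback defaults are illustrative, not part of the design:

```javascript
// services/ai/config.js (illustrative; assumes a pg Pool exported from ../db)
const { pool } = require('../db');

let cachedTasks = null;

async function getTaskConfig(taskKey) {
  if (!cachedTasks) {
    const { rows } = await pool.query("SELECT value FROM ai_config WHERE key = 'tasks'");
    cachedTasks = rows[0]?.value ?? {}; // JSONB comes back as a parsed object
  }
  // Fall back to conservative defaults if a task has no explicit config
  return cachedTasks[taskKey] ?? { model: 'llama-3.3-70b-versatile', temperature: 0.2, maxTokens: 200 };
}

module.exports = { getTaskConfig };
```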

---

## Migration Strategy

### Phase 1: Add New System Alongside Old (Week 1-2)

1. Implement `services/ai/` structure
2. Add new `/api/ai/*` routes
3. Keep old `/api/ai-validation/*` routes working
4. Add a feature flag to enable the new system per user (see the sketch below)

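
How the per-user flag is checked is an implementation detail; one possible shape is a small Express middleware that marks the request so route handlers can branch between the old and new validation paths. A minimal sketch, with the flag name and storage (`req.user.features.newAiValidation`) purely hypothetical:

```javascript
// middleware/aiSystemFlag.js (hypothetical flag name and storage)
function useNewAiSystem(req) {
  // Could come from a users table column, a settings service, or an env-based rollout list
  return Boolean(req.user && req.user.features && req.user.features.newAiValidation);
}

function aiSystemFlag(req, res, next) {
  req.useNewAiSystem = useNewAiSystem(req);
  next();
}

module.exports = { aiSystemFlag };
```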

### Phase 2: Frontend Integration (Week 2-3)

1. Add suggestion hooks
2. Integrate `AiValidationCell` for name field first
3. Add `CategorySuggestionDropdown`
4. Add `EnhanceDescriptionsButton` for batch descriptions

### Phase 3: Replace Batch Validation (Week 3-4)

1. Update "AI Validate" button to use new tiered system
2. Remove old giant-prompt approach
3. Keep old endpoint as deprecated fallback
4. Monitor costs and latency

### Phase 4: Cleanup (Week 4+)

1. Remove old `ai-validation.js` route
2. Remove old frontend components
3. Archive old prompts
4. Document new system

---

## Cost Analysis

### Current Costs (GPT-5.2 Reasoning)

| Metric | Value |
|--------|-------|
| Input tokens (est.) | ~25,000 |
| Output tokens (est.) | ~5,000 |
| Cost per run | $0.30-0.50 |
| Runs per day (est.) | 20 |
| **Monthly cost** | **~$200-300** |

### Projected Costs (New System)

| Tier | Model | Cost per call | Calls per product | Cost per product |
|------|-------|---------------|-------------------|------------------|
| Tier 1 | Code | $0 | N/A | $0 |
| Tier 2 | Groq Llama 3.3 70B | ~$0.0003 | 3-5 | ~$0.001 |
| Tier 2 | OpenAI Embeddings | ~$0.00002 | 1 | ~$0.00002 |
| Tier 3 | Claude Haiku | ~$0.001 | 0.1 (batch) | ~$0.0001 |

**Per 50 products:**
- Tier 1: $0
- Tier 2: ~$0.05 (if all fields suggested)
- Tier 3: ~$0.01 (description batch)
- **Total: ~$0.06**

**Monthly projection (20 runs/day × 50 products):**
- Current: $200-300
- New: ~$36 (20 runs/day × ~$0.06/run × 30 days ≈ $36)
- **Savings: 80-90%**

---

## Next Steps

1. **Review this document** - Confirm approach aligns with expectations
2. **Set up Groq account** - Get API key for real-time inference
3. **Implement providers** - Start with Groq + OpenAI
4. **Build embedding system** - Pre-compute category embeddings
5. **Create first task** - Start with name suggestion
6. **Integrate frontend** - Add suggestion badge to name field
7. **Iterate** - Add more tasks based on feedback

---

## Appendix: Key Files Reference

### Email App AI System (Reference)
- `/Users/matt/Dev/email/email-server/services/ai/index.js` - Main entry
- `/Users/matt/Dev/email/email-server/services/ai/providers/groqProvider.js` - Groq implementation
- `/Users/matt/Dev/email/email-server/services/ai/taskRegistry.js` - Task system
- `/Users/matt/Dev/email/email-server/services/ai/workQueue.js` - Concurrency

### Current Inventory AI System
- `/inventory-server/src/routes/ai-validation.js` - Current monolithic implementation
- `/inventory/src/components/product-import/steps/ValidationStep/hooks/useAiValidation/` - Current frontend hooks