
AI Error Handling Guide

Error Types

The AI system classifies errors into these categories:

| Error Type | Cause | User Message | Retryable |
|---|---|---|---|
| `rate_limit` | Too many requests | "Rate limit exceeded. Wait a few minutes." | Yes |
| `quota_exceeded` | Billing limit | "API quota exceeded. Check billing." | No |
| `model_not_found` | Invalid model | "Model not available. Check settings." | No |
| `invalid_api_key` | Bad API key | "Invalid API key. Check settings." | No |
| `context_length` | Prompt too large | "Request too large. Try fewer items." | Yes* |
| `parse_error` | AI returned invalid JSON | "Response parse error. Flagged for review." | Yes |
| `timeout` | Request took too long | "Request timed out. Try again." | Yes |
| `network_error` | Connection issue | "Network error. Check connection." | Yes |
| `content_filter` | Content blocked | "Content filtered. Check input data." | No |
| `server_error` | OpenAI server issue | "Server error. Try again later." | Yes |

*Context length errors can be retried with smaller batches.
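
As a point of reference, the mapping from raw SDK errors to these categories can be sketched as below. The `status` and `code` fields follow the OpenAI SDK's `APIError` shape, but the function name and exact mapping are illustrative, not the actual `classifyAIError` implementation:

```typescript
type AIErrorType =
  | 'rate_limit' | 'quota_exceeded' | 'model_not_found' | 'invalid_api_key'
  | 'context_length' | 'parse_error' | 'timeout' | 'network_error'
  | 'content_filter' | 'server_error'

// Illustrative classifier: error codes take precedence, then HTTP status.
function classifyByStatus(err: { status?: number; code?: string }): AIErrorType {
  if (err.code === 'context_length_exceeded') return 'context_length'
  if (err.code === 'content_filter') return 'content_filter'
  switch (err.status) {
    case 401: return 'invalid_api_key'
    case 404: return 'model_not_found'
    case 429:
      // 429 covers both throttling and exhausted quota; the code disambiguates
      return err.code === 'insufficient_quota' ? 'quota_exceeded' : 'rate_limit'
    default:
      if (err.status && err.status >= 500) return 'server_error'
      return 'network_error'
  }
}
```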

Error Classification

```typescript
import { classifyAIError, shouldRetry, getRetryDelay } from '@/server/services/ai-errors'

try {
  const response = await openai.chat.completions.create(params)
  // ... use response
} catch (error) {
  const classified = classifyAIError(error)

  console.error(`AI Error: ${classified.type} - ${classified.message}`)

  if (shouldRetry(classified.type)) {
    const delay = getRetryDelay(classified.type)
    // Wait and retry
  } else {
    // Fall back to algorithm
  }
}
```

Graceful Degradation

When an AI call fails, the platform handles it automatically, per feature:

AI Assignment

  1. Logs the error
  2. Falls back to algorithmic assignment:
    • Matches by expertise tag overlap
    • Balances workload across jurors
    • Respects constraints (max assignments)
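
A minimal sketch of that fallback scoring, assuming hypothetical `Juror` and `Project` shapes (the real service's types and tie-breaking may differ):

```typescript
interface Juror { id: string; expertiseTags: string[]; currentAssignments: number }
interface Project { id: string; tags: string[] }

// Pick the eligible juror with the most expertise-tag overlap,
// tie-broken by the lighter current workload; respects maxAssignments.
function assignFallback(project: Project, jurors: Juror[], maxAssignments: number): Juror | null {
  const eligible = jurors.filter(j => j.currentAssignments < maxAssignments)
  if (eligible.length === 0) return null
  const overlap = (j: Juror) => j.expertiseTags.filter(t => project.tags.includes(t)).length
  return eligible.sort(
    (a, b) => overlap(b) - overlap(a) || a.currentAssignments - b.currentAssignments
  )[0]
}
```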

AI Filtering

  1. Logs the error
  2. Flags all projects for manual review
  3. Returns error message to admin

Award Eligibility

  1. Logs the error
  2. Returns all projects as "needs manual review"
  3. Admin can apply deterministic rules instead

Mentor Matching

  1. Logs the error
  2. Falls back to keyword-based matching
  3. Uses availability scoring
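
The keyword fallback might look like this sketch; the `Mentor` shape and the multiplicative `availability` weighting are assumptions for illustration:

```typescript
interface Mentor { id: string; bio: string; availability: number } // availability in [0, 1]

// Score mentors by case-insensitive keyword hits in their bio,
// weighted by availability; return the highest-scoring mentor.
function matchMentor(keywords: string[], mentors: Mentor[]): Mentor | null {
  if (mentors.length === 0) return null
  const score = (m: Mentor) => {
    const bio = m.bio.toLowerCase()
    const hits = keywords.filter(k => bio.includes(k.toLowerCase())).length
    return hits * m.availability
  }
  return mentors.reduce((best, m) => (score(m) > score(best) ? m : best))
}
```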

Retry Strategy

| Error Type | Retry Count | Delay |
|---|---|---|
| `rate_limit` | 3 | Exponential (1s, 2s, 4s) |
| `timeout` | 2 | Fixed 5s |
| `network_error` | 3 | Exponential (1s, 2s, 4s) |
| `server_error` | 3 | Exponential (2s, 4s, 8s) |
| `parse_error` | 1 | None |
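
The schedule above can be expressed as a small helper; `retryDelayMs` is an illustrative name, and the real `getRetryDelay` may be structured differently:

```typescript
// Delay before a given retry attempt (1-based), per the table above:
// base delay doubles per attempt for exponential strategies, stays fixed otherwise.
function retryDelayMs(type: string, attempt: number): number {
  switch (type) {
    case 'rate_limit':
    case 'network_error':
      return 1000 * 2 ** (attempt - 1) // 1s, 2s, 4s
    case 'server_error':
      return 2000 * 2 ** (attempt - 1) // 2s, 4s, 8s
    case 'timeout':
      return 5000 // fixed 5s
    default:
      return 0 // parse_error: single retry, no delay
  }
}
```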

Monitoring

Error Logging

All AI errors are logged to:

  1. Console (development)
  2. AIUsageLog table with status: 'ERROR'
  3. AuditLog for security-relevant failures

Checking Errors

```sql
-- Recent AI errors
SELECT
  created_at,
  action,
  model,
  error_message
FROM ai_usage_log
WHERE status = 'ERROR'
ORDER BY created_at DESC
LIMIT 20;

-- Error rate by action
SELECT
  action,
  COUNT(*) FILTER (WHERE status = 'ERROR') AS errors,
  COUNT(*) AS total,
  ROUND(100.0 * COUNT(*) FILTER (WHERE status = 'ERROR') / COUNT(*), 2) AS error_rate
FROM ai_usage_log
GROUP BY action;
```

Troubleshooting

High Error Rate

  1. Check OpenAI status page for outages
  2. Verify API key is valid and not rate-limited
  3. Review error messages in logs
  4. Consider switching to a different model

Consistent Parse Errors

  1. The AI model may be returning malformed JSON
  2. Try a more capable model (gpt-4o instead of gpt-3.5)
  3. Check if prompts are being truncated
  4. Review recent responses in logs

All Requests Failing

  1. Test connection in Settings → AI
  2. Verify API key hasn't been revoked
  3. Check billing status in OpenAI dashboard
  4. Review network connectivity

Slow Responses

  1. Consider using gpt-4o-mini for speed
  2. Reduce batch sizes
  3. Check for rate limiting (429 errors)
  4. Monitor OpenAI latency

Error Response Format

When errors occur, services return structured responses:

```typescript
// AI Assignment error response
{
  success: false,
  suggestions: [],
  error: "Rate limit exceeded. Wait a few minutes and try again.",
  fallbackUsed: true,
}

// AI Filtering error response
{
  projectId: "...",
  meetsCriteria: false,
  confidence: 0,
  reasoning: "AI error: Rate limit exceeded",
  flagForReview: true,
}
```
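
Illustrative TypeScript shapes for these responses; the field types are inferred from the examples above, not copied from the service modules:

```typescript
// Assignment failure: no suggestions, fallback may have produced results elsewhere
interface AssignmentErrorResponse {
  success: false
  suggestions: never[]
  error: string
  fallbackUsed: boolean
}

// Per-project filtering failure: conservative defaults, flagged for a human
interface FilterErrorResponse {
  projectId: string
  meetsCriteria: boolean
  confidence: number // 0 on error
  reasoning: string
  flagForReview: boolean
}
```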

Implementing Custom Error Handling

```typescript
import {
  classifyAIError,
  shouldRetry,
  getRetryDelay,
  getUserFriendlyMessage,
  logAIError,
} from '@/server/services/ai-errors'

async function callAIWithRetry<T>(
  operation: () => Promise<T>,
  serviceName: string,
  maxRetries: number = 3
): Promise<T> {
  let lastError: Error | null = null

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await operation()
    } catch (error) {
      lastError = error as Error
      const classified = classifyAIError(error)
      logAIError(serviceName, 'operation', classified)

      // Give up on non-retryable errors, or once the final attempt has failed
      if (!shouldRetry(classified.type) || attempt === maxRetries) {
        throw new Error(getUserFriendlyMessage(classified.type))
      }

      // Scale the per-type base delay linearly with the attempt number
      const delay = getRetryDelay(classified.type) * attempt
      await new Promise(resolve => setTimeout(resolve, delay))
    }
  }

  // Unreachable: the loop always returns or throws, but TypeScript
  // requires a terminal statement here
  throw lastError ?? new Error('AI operation failed')
}
```

See Also