# AI Error Handling Guide
## Error Types
The AI system classifies errors into these categories:

| Error Type | Cause | User Message | Retryable |
|------------|-------|--------------|-----------|
| `rate_limit` | Too many requests | "Rate limit exceeded. Wait a few minutes." | Yes |
| `quota_exceeded` | Billing limit | "API quota exceeded. Check billing." | No |
| `model_not_found` | Invalid model | "Model not available. Check settings." | No |
| `invalid_api_key` | Bad API key | "Invalid API key. Check settings." | No |
| `context_length` | Prompt too large | "Request too large. Try fewer items." | Yes* |
| `parse_error` | AI returned invalid JSON | "Response parse error. Flagged for review." | Yes |
| `timeout` | Request took too long | "Request timed out. Try again." | Yes |
| `network_error` | Connection issue | "Network error. Check connection." | Yes |
| `content_filter` | Content blocked | "Content filtered. Check input data." | No |
| `server_error` | OpenAI server issue | "Server error. Try again later." | Yes |
*Context length errors can be retried with smaller batches.
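This classification maps onto a small set of shared types. A minimal sketch of what those shapes might look like (the `AIErrorType` and `ClassifiedAIError` names are assumptions for illustration; the actual definitions live in `@/server/services/ai-errors`):
```typescript
// Hypothetical shapes mirroring the table above; the real exports in
// @/server/services/ai-errors may differ.
type AIErrorType =
  | 'rate_limit'
  | 'quota_exceeded'
  | 'model_not_found'
  | 'invalid_api_key'
  | 'context_length'
  | 'parse_error'
  | 'timeout'
  | 'network_error'
  | 'content_filter'
  | 'server_error'

interface ClassifiedAIError {
  type: AIErrorType
  message: string    // user-facing message from the table
  retryable: boolean // whether the caller may retry
}
```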
## Error Classification
```typescript
import { classifyAIError, shouldRetry, getRetryDelay } from '@/server/services/ai-errors'

try {
  const response = await openai.chat.completions.create(params)
} catch (error) {
  const classified = classifyAIError(error)
  console.error(`AI Error: ${classified.type} - ${classified.message}`)
  if (shouldRetry(classified.type)) {
    const delay = getRetryDelay(classified.type)
    // Wait `delay` ms, then retry
  } else {
    // Fall back to algorithm
  }
}
```
## Graceful Degradation
When an AI call fails, the platform degrades gracefully instead of blocking the workflow:
### AI Assignment
1. Logs the error
2. Falls back to algorithmic assignment (see the sketch after this list):
- Matches by expertise tag overlap
- Balances workload across jurors
- Respects constraints (max assignments)
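As a rough illustration of that fallback, here is a minimal sketch of expertise-overlap scoring with a workload tiebreaker; the `Juror` and `Project` shapes are assumptions, not the platform's actual types:
```typescript
// Illustrative fallback scoring; field names are assumptions.
interface Juror {
  id: string
  expertiseTags: string[]
  assignedCount: number
  maxAssignments: number
}

interface Project {
  id: string
  tags: string[]
}

function pickJuror(project: Project, jurors: Juror[]): Juror | undefined {
  return jurors
    // Respect constraints: skip jurors already at capacity.
    .filter(j => j.assignedCount < j.maxAssignments)
    // Score by expertise tag overlap; break ties by lightest workload.
    .sort((a, b) => {
      const overlap = (j: Juror) =>
        j.expertiseTags.filter(t => project.tags.includes(t)).length
      return overlap(b) - overlap(a) || a.assignedCount - b.assignedCount
    })[0]
}
```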
### AI Filtering
1. Logs the error
2. Flags all projects for manual review (see the sketch below)
3. Returns error message to admin
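A minimal sketch of that behavior, reusing the filtering response shape documented under Error Response Format below (the function name is illustrative):
```typescript
// Illustrative fallback: when the AI call fails, every project is
// flagged for manual review instead of being silently filtered.
function filterFallback(projectIds: string[], message: string) {
  return projectIds.map(projectId => ({
    projectId,
    meetsCriteria: false,
    confidence: 0,
    reasoning: `AI error: ${message}`,
    flagForReview: true,
  }))
}
```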
### Award Eligibility
1. Logs the error
2. Returns all projects as "needs manual review"
3. Admin can apply deterministic rules instead (example below)
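As an example of what deterministic rules could look like (the rule set and fields here are hypothetical, not the platform's actual criteria):
```typescript
// Hypothetical deterministic rules an admin might apply when the AI
// check is unavailable; every predicate must pass.
interface EligibilityInput {
  teamSize: number
  submittedAt: Date
}

type EligibilityRule = (p: EligibilityInput) => boolean

const rules: EligibilityRule[] = [
  p => p.teamSize <= 4,                                       // example: team size cap
  p => p.submittedAt <= new Date('2025-06-01T00:00:00Z'),     // example: deadline
]

const isEligible = (p: EligibilityInput) => rules.every(rule => rule(p))
```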
### Mentor Matching
1. Logs the error
2. Falls back to keyword-based matching
3. Uses availability scoring (see the sketch below)
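A rough sketch of that scoring, assuming illustrative `Mentor` fields:
```typescript
// Keyword fallback sketch: overlap between project keywords and a
// mentor's skills, weighted by open availability slots. Field names
// are assumptions for illustration.
interface Mentor {
  id: string
  skills: string[]
  availableSlots: number
}

function mentorScore(projectKeywords: string[], mentor: Mentor): number {
  const overlap = mentor.skills.filter(s =>
    projectKeywords.includes(s.toLowerCase())
  ).length
  // Weight keyword overlap by availability so busy mentors rank lower.
  return overlap * Math.max(mentor.availableSlots, 0)
}
```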
## Retry Strategy
| Error Type | Retry Count | Delay |
|------------|-------------|-------|
| `rate_limit` | 3 | Exponential (1s, 2s, 4s) |
| `timeout` | 2 | Fixed 5s |
| `network_error` | 3 | Exponential (1s, 2s, 4s) |
| `server_error` | 3 | Exponential (2s, 4s, 8s) |
| `parse_error` | 1 | None |
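A minimal sketch of how this table could be encoded; the real `getRetryDelay` in `@/server/services/ai-errors` may differ, and `RETRY_POLICY` is an assumed name:
```typescript
// Base delays per the table above; exponential strategies double the
// base on each attempt (e.g. 1s, 2s, 4s), fixed strategies do not.
const RETRY_POLICY: Record<string, { retries: number; baseMs: number; exponential: boolean }> = {
  rate_limit:    { retries: 3, baseMs: 1000, exponential: true },
  timeout:       { retries: 2, baseMs: 5000, exponential: false },
  network_error: { retries: 3, baseMs: 1000, exponential: true },
  server_error:  { retries: 3, baseMs: 2000, exponential: true },
  parse_error:   { retries: 1, baseMs: 0,    exponential: false },
}

function retryDelayMs(type: string, attempt: number): number {
  const policy = RETRY_POLICY[type]
  if (!policy) return 0
  return policy.exponential ? policy.baseMs * 2 ** (attempt - 1) : policy.baseMs
}
```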
## Monitoring
### Error Logging
All AI errors are logged to:
1. Console (development)
2. `AIUsageLog` table with `status: 'ERROR'`
3. `AuditLog` for security-relevant failures
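Based on the column names used in the queries below, a log entry is shaped roughly like this (an assumption inferred from the SQL, not the actual schema):
```typescript
// Hypothetical entry shape mirroring the ai_usage_log columns queried
// below (created_at, action, model, status, error_message).
interface AIUsageLogEntry {
  action: string
  model: string
  status: 'OK' | 'ERROR'
  errorMessage?: string
  createdAt: Date
}

function buildErrorLogEntry(action: string, model: string, message: string): AIUsageLogEntry {
  return { action, model, status: 'ERROR', errorMessage: message, createdAt: new Date() }
}
```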
### Checking Errors
```sql
-- Recent AI errors
SELECT
  created_at,
  action,
  model,
  error_message
FROM ai_usage_log
WHERE status = 'ERROR'
ORDER BY created_at DESC
LIMIT 20;

-- Error rate by action
SELECT
  action,
  COUNT(*) FILTER (WHERE status = 'ERROR') AS errors,
  COUNT(*) AS total,
  ROUND(100.0 * COUNT(*) FILTER (WHERE status = 'ERROR') / COUNT(*), 2) AS error_rate
FROM ai_usage_log
GROUP BY action;
```
## Troubleshooting
### High Error Rate
1. Check OpenAI status page for outages
2. Verify API key is valid and not rate-limited
3. Review error messages in logs
4. Consider switching to a different model
### Consistent Parse Errors
1. The AI model may be returning malformed JSON
2. Try a more capable model (e.g. gpt-4o instead of gpt-3.5-turbo)
3. Check if prompts are being truncated
4. Review recent responses in logs
### All Requests Failing
1. Test connection in Settings → AI
2. Verify API key hasn't been revoked
3. Check billing status in OpenAI dashboard
4. Review network connectivity
### Slow Responses
1. Consider using gpt-4o-mini for speed
2. Reduce batch sizes
3. Check for rate limiting (429 errors)
4. Monitor OpenAI latency
## Error Response Format
When errors occur, services return structured responses:
```typescript
// AI Assignment error response
{
  success: false,
  suggestions: [],
  error: "Rate limit exceeded. Wait a few minutes and try again.",
  fallbackUsed: true,
}

// AI Filtering error response
{
  projectId: "...",
  meetsCriteria: false,
  confidence: 0,
  reasoning: "AI error: Rate limit exceeded",
  flagForReview: true,
}
```
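Expressed as types, those responses might look like the following (hypothetical interface names; the services may export their own):
```typescript
// Hypothetical interfaces matching the literals above.
interface AssignmentErrorResponse {
  success: false
  suggestions: unknown[] // empty when the call fails
  error: string
  fallbackUsed: boolean
}

interface FilterErrorResult {
  projectId: string
  meetsCriteria: boolean
  confidence: number
  reasoning: string
  flagForReview: boolean
}
```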
## Implementing Custom Error Handling
```typescript
import {
  classifyAIError,
  shouldRetry,
  getRetryDelay,
  getUserFriendlyMessage,
  logAIError,
} from '@/server/services/ai-errors'

async function callAIWithRetry<T>(
  operation: () => Promise<T>,
  serviceName: string,
  maxRetries: number = 3
): Promise<T> {
  let lastError: Error | null = null
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await operation()
    } catch (error) {
      lastError = error as Error
      const classified = classifyAIError(error)
      logAIError(serviceName, 'operation', classified)
      // Give up with a user-friendly message when the error is not
      // retryable or retries are exhausted.
      if (!shouldRetry(classified.type) || attempt === maxRetries) {
        throw new Error(getUserFriendlyMessage(classified.type))
      }
      // Exponential backoff: double the base delay on each attempt,
      // matching the retry strategy table above.
      const delay = getRetryDelay(classified.type) * 2 ** (attempt - 1)
      await new Promise(resolve => setTimeout(resolve, delay))
    }
  }
  throw lastError ?? new Error('AI call failed')
}
```
## See Also
- [AI System Architecture](./ai-system.md)
- [AI Configuration Guide](./ai-configuration.md)
- [AI Services Reference](./ai-services.md)