# AI Error Handling Guide

## Error Types

The AI system classifies errors into these categories:

| Error Type | Cause | User Message | Retryable |
|------------|-------|--------------|-----------|
| `rate_limit` | Too many requests | "Rate limit exceeded. Wait a few minutes." | Yes |
| `quota_exceeded` | Billing limit | "API quota exceeded. Check billing." | No |
| `model_not_found` | Invalid model | "Model not available. Check settings." | No |
| `invalid_api_key` | Bad API key | "Invalid API key. Check settings." | No |
| `context_length` | Prompt too large | "Request too large. Try fewer items." | Yes* |
| `parse_error` | AI returned invalid JSON | "Response parse error. Flagged for review." | Yes |
| `timeout` | Request took too long | "Request timed out. Try again." | Yes |
| `network_error` | Connection issue | "Network error. Check connection." | Yes |
| `content_filter` | Content blocked | "Content filtered. Check input data." | No |
| `server_error` | OpenAI server issue | "Server error. Try again later." | Yes |

\*Context length errors can be retried with smaller batches.
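The footnote above can be sketched as a recursive batch split: on a `context_length` error, halve the batch and retry each half. The `processBatch` callback and the error's `type` field here are illustrative assumptions, not the platform's actual API.

```typescript
// Sketch: retry a batched AI call with smaller batches when the prompt
// is too large. Splits the batch in half and recurses until each piece fits.
async function withBatchSplitting<T, R>(
  items: T[],
  processBatch: (batch: T[]) => Promise<R[]>
): Promise<R[]> {
  try {
    return await processBatch(items)
  } catch (error) {
    // Only split when the batch can still be divided further
    if ((error as { type?: string }).type === 'context_length' && items.length > 1) {
      const mid = Math.ceil(items.length / 2)
      const left = await withBatchSplitting(items.slice(0, mid), processBatch)
      const right = await withBatchSplitting(items.slice(mid), processBatch)
      return [...left, ...right]
    }
    throw error
  }
}
```

Results come back in the original order, so callers can zip them against the input items.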
## Error Classification

```typescript
import { classifyAIError, shouldRetry, getRetryDelay } from '@/server/services/ai-errors'

try {
  const response = await openai.chat.completions.create(params)
  // ... use response
} catch (error) {
  const classified = classifyAIError(error)

  console.error(`AI Error: ${classified.type} - ${classified.message}`)

  if (shouldRetry(classified.type)) {
    const delay = getRetryDelay(classified.type)
    // Wait `delay` ms, then retry the request
  } else {
    // Fall back to the algorithmic path
  }
}
```
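To make the classification concrete, here is an illustrative mapping from OpenAI-style errors (an HTTP status plus a message) to the types in the table above. This is a sketch of the idea, not the actual implementation of `classifyAIError`; the status codes follow OpenAI's documented conventions.

```typescript
// Illustrative classifier: map HTTP status / message text to an error type.
type AIErrorType =
  | 'rate_limit' | 'quota_exceeded' | 'invalid_api_key' | 'context_length'
  | 'timeout' | 'network_error' | 'server_error' | 'unknown'

function classifyByStatus(status: number | undefined, message: string): AIErrorType {
  // 429 covers both rate limits and exhausted quotas; the message disambiguates
  if (status === 429) return message.includes('quota') ? 'quota_exceeded' : 'rate_limit'
  if (status === 401) return 'invalid_api_key'
  if (status !== undefined && status >= 500) return 'server_error'
  // No usable status: fall back to message heuristics
  if (message.includes('context length')) return 'context_length'
  if (message.includes('timeout')) return 'timeout'
  if (message.includes('ECONNREFUSED') || message.includes('fetch failed')) return 'network_error'
  return 'unknown'
}
```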
## Graceful Degradation

When an AI call fails, the platform degrades gracefully:

### AI Assignment

1. Logs the error
2. Falls back to algorithmic assignment:
   - Matches by expertise tag overlap
   - Balances workload across jurors
   - Respects constraints (max assignments)

### AI Filtering

1. Logs the error
2. Flags all projects for manual review
3. Returns an error message to the admin

### Award Eligibility

1. Logs the error
2. Returns all projects as "needs manual review"
3. Admin can apply deterministic rules instead

### Mentor Matching

1. Logs the error
2. Falls back to keyword-based matching
3. Uses availability scoring
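The algorithmic assignment fallback described above could look like the following sketch: score each juror by expertise-tag overlap with the project, prefer the least-loaded juror among equal scores, and skip jurors at their assignment cap. The `Juror` and `Project` shapes are assumptions for illustration, not the real schema.

```typescript
// Sketch of the algorithmic assignment fallback: tag overlap + workload balance.
interface Juror {
  id: string
  expertise: string[]
  assigned: number
  maxAssignments: number
}

interface Project {
  id: string
  tags: string[]
}

function assignFallback(project: Project, jurors: Juror[]): Juror | null {
  const candidates = jurors
    // Respect the max-assignments constraint
    .filter(j => j.assigned < j.maxAssignments)
    .map(j => ({
      juror: j,
      overlap: j.expertise.filter(tag => project.tags.includes(tag)).length,
    }))
    // Highest overlap first; break ties by lowest current workload
    .sort((a, b) => b.overlap - a.overlap || a.juror.assigned - b.juror.assigned)

  return candidates.length > 0 ? candidates[0].juror : null
}
```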
## Retry Strategy

| Error Type | Retry Count | Delay |
|------------|-------------|-------|
| `rate_limit` | 3 | Exponential (1s, 2s, 4s) |
| `timeout` | 2 | Fixed 5s |
| `network_error` | 3 | Exponential (1s, 2s, 4s) |
| `server_error` | 3 | Exponential (2s, 4s, 8s) |
| `parse_error` | 1 | None |
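The delay schedule in the table can be expressed as a small function: exponential backoff doubles a per-type base delay on each attempt, while timeouts use a fixed delay. This is a sketch of the schedule above, not the platform's `getRetryDelay` implementation.

```typescript
// Delay in milliseconds for a given error type and 1-based attempt number.
function retryDelayMs(errorType: string, attempt: number): number {
  switch (errorType) {
    case 'rate_limit':
    case 'network_error':
      return 1000 * 2 ** (attempt - 1) // 1s, 2s, 4s
    case 'server_error':
      return 2000 * 2 ** (attempt - 1) // 2s, 4s, 8s
    case 'timeout':
      return 5000 // fixed 5s
    default:
      return 0 // e.g. parse_error: single retry, no delay
  }
}
```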
## Monitoring

### Error Logging

All AI errors are logged to:

1. Console (development)
2. `AIUsageLog` table with `status: 'ERROR'`
3. `AuditLog` for security-relevant failures

### Checking Errors

```sql
-- Recent AI errors
SELECT
  created_at,
  action,
  model,
  error_message
FROM ai_usage_log
WHERE status = 'ERROR'
ORDER BY created_at DESC
LIMIT 20;

-- Error rate by action
SELECT
  action,
  COUNT(*) FILTER (WHERE status = 'ERROR') AS errors,
  COUNT(*) AS total,
  ROUND(100.0 * COUNT(*) FILTER (WHERE status = 'ERROR') / COUNT(*), 2) AS error_rate
FROM ai_usage_log
GROUP BY action;
```
## Troubleshooting

### High Error Rate

1. Check the OpenAI status page for outages
2. Verify the API key is valid and not rate-limited
3. Review error messages in the logs
4. Consider switching to a different model

### Consistent Parse Errors

1. The AI model may be returning malformed JSON
2. Try a more capable model (gpt-4o instead of gpt-3.5)
3. Check whether prompts are being truncated
4. Review recent responses in the logs

### All Requests Failing

1. Test the connection in Settings → AI
2. Verify the API key hasn't been revoked
3. Check billing status in the OpenAI dashboard
4. Review network connectivity

### Slow Responses

1. Consider using gpt-4o-mini for speed
2. Reduce batch sizes
3. Check for rate limiting (429 errors)
4. Monitor OpenAI latency
## Error Response Format

When errors occur, services return structured responses:

```typescript
// AI Assignment error response
{
  success: false,
  suggestions: [],
  error: "Rate limit exceeded. Wait a few minutes and try again.",
  fallbackUsed: true,
}

// AI Filtering error response
{
  projectId: "...",
  meetsCriteria: false,
  confidence: 0,
  reasoning: "AI error: Rate limit exceeded",
  flagForReview: true,
}
```
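The shapes above could be captured as TypeScript interfaces along these lines; the names here are assumed for illustration and may not match the actual service definitions.

```typescript
// Assumed type definitions for the example responses above.
interface AssignmentErrorResponse {
  success: false
  suggestions: unknown[]
  error: string
  fallbackUsed: boolean
}

interface FilterErrorResponse {
  projectId: string
  meetsCriteria: boolean
  confidence: number
  reasoning: string
  flagForReview: boolean
}
```

Typing the error path the same way as the success path lets callers handle both with a single discriminated check (e.g. on `success` or `flagForReview`).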
## Implementing Custom Error Handling

```typescript
import {
  classifyAIError,
  shouldRetry,
  getRetryDelay,
  getUserFriendlyMessage,
  logAIError,
} from '@/server/services/ai-errors'

async function callAIWithRetry<T>(
  operation: () => Promise<T>,
  serviceName: string,
  maxRetries: number = 3
): Promise<T> {
  let lastError: Error | null = null

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await operation()
    } catch (error) {
      const classified = classifyAIError(error)
      logAIError(serviceName, 'operation', classified)

      // Non-retryable errors and the final attempt surface a user-friendly message
      if (!shouldRetry(classified.type) || attempt === maxRetries) {
        throw new Error(getUserFriendlyMessage(classified.type))
      }

      // Scale the base delay by the attempt number for a simple backoff
      const delay = getRetryDelay(classified.type) * attempt
      await new Promise(resolve => setTimeout(resolve, delay))
      lastError = error as Error
    }
  }

  // Unreachable in practice (the loop always returns or throws);
  // kept to satisfy the compiler's return-path analysis
  throw lastError
}
```
## See Also

- [AI System Architecture](./ai-system.md)
- [AI Configuration Guide](./ai-configuration.md)
- [AI Services Reference](./ai-services.md)