# 502 Error Fixes Implementation

This document outlines the fixes implemented to eliminate 502 errors during authentication, particularly during the initial login redirect.

## Problem Analysis

The 502 errors were caused by:
1. **Authentication flow bottlenecks** - Sequential external API calls to Keycloak without retry logic
2. **Nginx timeout issues** - Generic proxy settings not optimized for auth operations
3. **No connection pooling** - Each request created a new connection to Keycloak
4. **Lack of circuit breaker** - Failed requests could cascade and overwhelm the system
5. **No error resilience** - A single failure caused a complete authentication breakdown

## Solution Overview

### 1. Nginx Configuration Optimizations

**File**: Updated nginx server configuration

**Changes**:
- **Specific auth route handling**: Extended timeouts (60s) for auth callbacks
- **Disabled retries** on auth routes to prevent duplicate authentication requests
- **Custom error pages**: 502.html with auto-retry functionality
- **WebSocket support**: Proper upgrade handling for real-time features
- **Better logging**: Detailed timing information for debugging
- **Security headers**: Standard security best practices

**Key Settings**:
```nginx
# Authentication routes - require special handling
location ~ ^/api/auth/(keycloak/callback|session|refresh) {
    proxy_connect_timeout 30s;
    proxy_send_timeout 60s;
    proxy_read_timeout 60s;
    proxy_buffering off;
    proxy_next_upstream off;  # No retries for auth
}
```

### 2. Keycloak HTTP Client with Circuit Breaker

**File**: `server/utils/keycloak-client.ts`

**Features**:
- **Circuit breaker pattern**: Prevents cascade failures
- **Exponential backoff**: Intelligent retry logic
- **Connection pooling**: Reuses HTTP connections
- **Timeout management**: Configurable timeouts per operation
- **Performance monitoring**: Detailed timing and failure tracking

**Key Implementation**:
```typescript
class KeycloakClient {
  private circuitBreaker: CircuitBreakerState
  private readonly maxFailures = 5
  private readonly resetTimeout = 60000 // 1 minute

  async fetch(url: string, options: any = {}, clientOptions: KeycloakClientOptions = {}) {
    // Circuit breaker check
    // Retry logic with exponential backoff
    // Connection reuse headers
    // Performance timing
  }
}
```
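
The circuit breaker check in the outline above could look roughly like the following. This is a minimal sketch, not the code in `server/utils/keycloak-client.ts`: the `CircuitBreaker` class, its method names, and the injectable clock are assumptions made for illustration, and `lastFailure` is assumed to be an epoch timestamp in milliseconds. Only `maxFailures = 5` and `resetTimeout = 60000` come from the outline.

```typescript
// Hypothetical sketch: the state fields mirror those reported by the health
// endpoint (isOpen, failures, lastFailure); this is not the project's code.
interface CircuitBreakerState {
  isOpen: boolean
  failures: number
  lastFailure: number | null // assumed: epoch milliseconds of the latest failure
}

class CircuitBreaker {
  private state: CircuitBreakerState = { isOpen: false, failures: 0, lastFailure: null }
  private readonly maxFailures: number
  private readonly resetTimeout: number
  private readonly now: () => number

  // The clock is injectable so the reset window can be tested without waiting.
  constructor(maxFailures = 5, resetTimeout = 60000, now: () => number = () => Date.now()) {
    this.maxFailures = maxFailures
    this.resetTimeout = resetTimeout
    this.now = now
  }

  // Returns true when a request may proceed.
  canRequest(): boolean {
    if (!this.state.isOpen) return true
    const elapsed = this.now() - (this.state.lastFailure ?? 0)
    if (elapsed >= this.resetTimeout) {
      // Reset window elapsed: close the breaker and allow traffic again.
      this.state = { isOpen: false, failures: 0, lastFailure: null }
      return true
    }
    return false
  }

  recordSuccess(): void {
    this.state = { isOpen: false, failures: 0, lastFailure: null }
  }

  // Opens the breaker once the failure count reaches maxFailures.
  recordFailure(): void {
    const failures = this.state.failures + 1
    this.state = { isOpen: failures >= this.maxFailures, failures, lastFailure: this.now() }
  }

  status(): CircuitBreakerState {
    return { ...this.state }
  }
}
```

A caller would check `canRequest()` before contacting Keycloak and call `recordSuccess()` or `recordFailure()` afterwards. While the breaker is open, requests fail fast instead of queuing behind a slow upstream, which is what keeps a Keycloak outage from turning into proxy timeouts.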

### 3. Enhanced Authentication Callback

**File**: `server/api/auth/keycloak/callback.ts`

**Improvements**:
- **Uses new Keycloak client** with retry logic
- **Performance timing** for each operation
- **Better error handling** with specific error types
- **Circuit breaker monitoring** for debugging
- **Request ID tracking** for correlation

**Before/After**:
```typescript
// BEFORE: Direct $fetch calls
const tokenResponse = await $fetch('https://auth.portnimara.dev/...', {...})

// AFTER: Resilient client with retries
const tokenResponse = await keycloakClient.exchangeCodeForTokens(code, redirectUri)
```
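
The internals of `exchangeCodeForTokens` are not shown in this document. As a rough illustration, the request it sends is presumably a standard OAuth 2.0 authorization-code exchange, whose form body can be built as below. The `TokenExchangeParams` type and both helper functions are hypothetical, and the token endpoint URL is deliberately left to the caller.

```typescript
// Hypothetical helpers: they build a standard OAuth 2.0 authorization-code
// token request (RFC 6749, section 4.1.3). Not the project's actual code.
interface TokenExchangeParams {
  code: string
  redirectUri: string
  clientId: string
  clientSecret: string
}

// Encodes the fields as application/x-www-form-urlencoded.
function buildTokenRequestBody(params: TokenExchangeParams): string {
  const fields: Array<[string, string]> = [
    ['grant_type', 'authorization_code'],
    ['code', params.code],
    ['redirect_uri', params.redirectUri],
    ['client_id', params.clientId],
    ['client_secret', params.clientSecret],
  ]
  return fields
    .map(([key, value]) => encodeURIComponent(key) + '=' + encodeURIComponent(value))
    .join('&')
}

// Request options for the token endpoint; the URL is supplied by the caller.
function buildTokenRequest(params: TokenExchangeParams) {
  return {
    method: 'POST' as const,
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: buildTokenRequestBody(params),
  }
}
```

Keeping the request construction in one place means the retry and circuit breaker logic can wrap a single call instead of being repeated in every handler.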

### 4. Improved Token Refresh

**File**: `server/api/auth/refresh.ts`

**Changes**:
- **Uses Keycloak client** for retry logic
- **Performance monitoring** with timing
- **Better error handling** for network issues
- **Maintains session state** during failures
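
The retry logic that the refresh handler relies on can be sketched as a small helper with exponential backoff. This is an illustrative sketch under assumptions: `RetryOptions`, `backoffDelay`, and `withRetry` are hypothetical names, and the actual retry counts and delays used by the Keycloak client are not stated in this document.

```typescript
// Hypothetical retry helper; names and values are illustrative assumptions.
interface RetryOptions {
  retries: number // additional attempts after the first one
  baseDelayMs: number // delay before the first retry
  maxDelayMs: number // upper bound for any single delay
}

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelay(attempt: number, opts: RetryOptions): number {
  return Math.min(opts.baseDelayMs * 2 ** attempt, opts.maxDelayMs)
}

// Runs `operation`, retrying on rejection. `sleep` is injectable for tests.
async function withRetry<T>(
  operation: () => Promise<T>,
  opts: RetryOptions,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown
  for (let attempt = 0; attempt <= opts.retries; attempt++) {
    try {
      return await operation()
    } catch (error) {
      lastError = error
      // Only wait if another attempt will follow.
      if (attempt < opts.retries) await sleep(backoffDelay(attempt, opts))
    }
  }
  throw lastError
}
```

Because the last error is rethrown once the attempts are exhausted, the caller can keep the existing session untouched and decide separately whether to surface the failure.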

### 5. Enhanced Login Error Handling

**File**: `pages/login.vue`

**Features**:
- **Specific error messages** for different failure types
- **User-friendly messaging** instead of generic errors
- **Clear next steps** for users

**Error Types**:
- `service_unavailable`: Temporary service issues
- `server_error`: Server-side problems
- `access_denied`: Authorization failures
- `auth_failed`: General authentication failures
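
One way to turn these error types into user-facing text is a typed lookup table. The sketch below is hypothetical: the four codes come from the list above, but the message wording, the helper names, and the idea that the code arrives as a string (for example from a query parameter) are assumptions, not the contents of `pages/login.vue`.

```typescript
// Error codes come from the list above; the message texts are illustrative
// assumptions, not the wording used in pages/login.vue.
type LoginErrorCode = 'service_unavailable' | 'server_error' | 'access_denied' | 'auth_failed'

const LOGIN_ERROR_MESSAGES: Record<LoginErrorCode, string> = {
  service_unavailable: 'The login service is temporarily unavailable. Please try again in a few minutes.',
  server_error: 'Something went wrong on our side. Please try again.',
  access_denied: 'Your account is not authorized to access this application.',
  auth_failed: 'Login failed. Please try again.',
}

function isLoginErrorCode(value: string): value is LoginErrorCode {
  return Object.prototype.hasOwnProperty.call(LOGIN_ERROR_MESSAGES, value)
}

// Maps an error code to a message; unknown codes fall back to the general case.
function loginErrorMessage(code: string | null | undefined): string | null {
  if (!code) return null
  return LOGIN_ERROR_MESSAGES[isLoginErrorCode(code) ? code : 'auth_failed']
}
```

Using `Record<LoginErrorCode, string>` makes the compiler flag any code that is added to the type without a matching message.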

### 6. Application Readiness Checks

**File**: `plugins/00.startup-check.server.ts`

**Features**:
- **Environment validation** at startup
- **Keycloak client initialization** and warmup
- **Circuit breaker status** monitoring
- **Readiness tracking** for health checks

### 7. Enhanced Health Endpoint

**File**: `server/api/health.ts`

**Information**:
- **Application readiness** status
- **Circuit breaker state** for monitoring
- **Authentication configuration** validation
- **Performance metrics** for debugging

## Key Benefits

### 1. **Resilience**
- Circuit breaker prevents cascade failures
- Retry logic handles temporary network issues
- Graceful degradation during service outages

### 2. **Performance**
- Connection pooling reduces overhead
- Optimized timeouts prevent unnecessary delays
- Better resource utilization

### 3. **Monitoring**
- Detailed logging for debugging
- Performance timing for optimization
- Circuit breaker metrics for alerting

### 4. **User Experience**
- Specific error messages
- Auto-retry functionality
- Reduced failed login attempts

## Configuration Requirements

### Environment Variables
```bash
KEYCLOAK_CLIENT_SECRET=your_client_secret
COOKIE_DOMAIN=.portnimara.dev
```
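
A startup check for these variables could be as small as the sketch below. The helper name and the decision to treat blank values as missing are assumptions; the startup plugin's actual validation is not shown in this document.

```typescript
// Variable names come from the list above; the validation helper is hypothetical.
const REQUIRED_ENV_VARS = ['KEYCLOAK_CLIENT_SECRET', 'COOKIE_DOMAIN'] as const

type Env = Record<string, string | undefined>

// Returns the names of required variables that are missing or blank.
function findMissingEnvVars(env: Env): string[] {
  return REQUIRED_ENV_VARS.filter((name) => (env[name] ?? '').trim() === '')
}
```

At startup this would be called with `process.env`, logging the returned names and marking the application as not ready if the list is non-empty.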

### Nginx Configuration
- Apply the optimized nginx configuration
- Create the `/usr/share/nginx/html/502.html` error page
- Ensure the `map` directive is placed in the `http` context

## Monitoring and Debugging

### Health Check
```bash
curl https://client.portnimara.dev/api/health
```

### Circuit Breaker Status
Check the health endpoint for:
```json
{
  "readiness": {
    "keycloakCircuitBreaker": {
      "isOpen": false,
      "failures": 0,
      "lastFailure": null
    }
  }
}
```
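
For automated monitoring, the payload above can be reduced to a single alert level. The sketch below types only the fields shown in the JSON. The `HealthResponse` interface, the function name, and the thresholds are assumptions, and the type of `lastFailure` is left open because this document only shows it as `null`.

```typescript
// Shape taken from the JSON above; the real endpoint may return more fields.
interface HealthResponse {
  readiness: {
    keycloakCircuitBreaker: {
      isOpen: boolean
      failures: number
      lastFailure: string | number | null // exact type is not stated in this document
    }
  }
}

// Hypothetical alerting rule: warn once failures accumulate, alert when open.
function circuitBreakerAlertLevel(health: HealthResponse): 'ok' | 'warn' | 'alert' {
  const breaker = health.readiness.keycloakCircuitBreaker
  if (breaker.isOpen) return 'alert'
  return breaker.failures > 0 ? 'warn' : 'ok'
}
```

A monitoring job could poll `/api/health`, parse the JSON, and page only on `alert`, while `warn` is useful as an early signal that Keycloak calls are starting to fail.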

### Log Monitoring
Look for these log patterns:
- `[KEYCLOAK_CLIENT]` - Client operations and circuit breaker
- `[KEYCLOAK]` - Authentication flow timing
- `[STARTUP]` - Application initialization

## Testing

### Verify the fixes:
1. **Normal login flow** - Should complete without 502 errors
2. **Retry during network issues** - Should recover automatically
3. **Circuit breaker activation** - Should prevent cascade failures
4. **Error handling** - Should show appropriate user messages

### Load testing:
- Multiple concurrent login attempts
- Network latency simulation
- Keycloak service interruption testing

## Rollback Plan

If issues occur:
1. **Revert nginx configuration** to original
2. **Remove new files**: `server/utils/keycloak-client.ts`
3. **Restore original callback handler**
4. **Restart application services**

## Future Improvements

1. **Caching**: Add user info caching to reduce API calls
2. **Metrics**: Implement Prometheus metrics collection
3. **Alerts**: Set up monitoring alerts for circuit breaker
4. **Testing**: Add automated integration tests for auth flow

## Post-Implementation Fixes

After the initial implementation, additional issues were discovered and resolved:

### Issue: Keycloak Client Compatibility
**Problem**: The enhanced `keycloak-client.ts` with custom headers was incompatible with Nitro/Nuxt `$fetch`, causing immediate fetch failures.

**Solution**: Simplified the client by removing problematic headers:
- Removed `Connection: keep-alive` and `Keep-Alive` headers
- Removed custom timeout implementation
- Kept retry logic and circuit breaker functionality

### Issue: Background Task Authentication
**Problem**: Background tasks (like `process-sales-emails`) were failing with 401 errors because they don't have user sessions.

**Solution**: Enhanced `server/utils/auth.ts` to support internal authentication:
- Added support for `x-tag: 094ut234` header for system tasks
- Added localhost detection for internal calls
- Added optional `INTERNAL_API_SECRET` environment variable support
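
The three mechanisms above could be combined as in the following sketch. It is an illustration only: `server/utils/auth.ts` is not shown in this document, so the input shape, the function name, the `x-internal-secret` header name, the list of loopback addresses, and the assumption that any one mechanism is sufficient are all hypothetical.

```typescript
// Hypothetical request shape; server/utils/auth.ts is not shown in this document.
interface InternalAuthInput {
  headers: Record<string, string | undefined> // assumed: lower-cased header names
  remoteAddress?: string
}

const SYSTEM_TAG = '094ut234' // value of the x-tag header listed above
const LOCAL_ADDRESSES = ['127.0.0.1', '::1', '::ffff:127.0.0.1']

// Returns true when the request should be treated as an internal call.
// Assumption: any single mechanism is sufficient.
// `x-internal-secret` is an assumed header name; the source does not name it.
function isInternalRequest(req: InternalAuthInput, internalApiSecret?: string): boolean {
  if (req.headers['x-tag'] === SYSTEM_TAG) return true
  if (req.remoteAddress !== undefined && LOCAL_ADDRESSES.indexOf(req.remoteAddress) !== -1) return true
  if (internalApiSecret && req.headers['x-internal-secret'] === internalApiSecret) return true
  return false
}
```

An auth guard would call this before looking for a user session, so that tasks such as `process-sales-emails` pass while ordinary requests still go through the session check.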

### Issue: Network Diagnostics
**Problem**: Docker networking issues affecting Keycloak connectivity were difficult to diagnose.

**Solution**: Added diagnostic endpoint:
- `/api/debug/test-keycloak-connectivity` - Tests basic connectivity to Keycloak from within the container
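
The core of such a diagnostic can be sketched as a probe that reports whether any HTTP response came back and how long it took. The `ProbeResult` shape, the `Fetcher` type, and the function name are assumptions made for illustration; the real endpoint's output format is not described in this document.

```typescript
// Hypothetical probe; the real endpoint's implementation is not shown here.
interface ProbeResult {
  url: string
  reachable: boolean
  status: number | null
  durationMs: number
  error: string | null
}

// Minimal fetch-like signature so the probe can be exercised without a network.
type Fetcher = (url: string) => Promise<{ status: number }>

async function probeKeycloak(url: string, fetcher: Fetcher): Promise<ProbeResult> {
  const startedAt = Date.now()
  try {
    const response = await fetcher(url)
    // Any HTTP response, even a 4xx, proves the network path works.
    return { url, reachable: true, status: response.status, durationMs: Date.now() - startedAt, error: null }
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error)
    return { url, reachable: false, status: null, durationMs: Date.now() - startedAt, error: message }
  }
}
```

Inside a server route, the fetcher could be `(url) => fetch(url)`. Separating "no response at all" from "responded with an error status" is what distinguishes a Docker networking problem from a Keycloak configuration problem.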

## Updated Files Summary

**New Files**:
- `server/utils/keycloak-client.ts` - Resilient HTTP client (simplified version)
- `server/api/debug/test-keycloak-connectivity.ts` - Connectivity diagnostic tool
- `docs/502-error-fixes-implementation.md` - This documentation

**Modified Files**:
- `server/api/auth/keycloak/callback.ts` - Uses simplified keycloak client
- `server/api/auth/refresh.ts` - Enhanced with retry logic
- `server/utils/auth.ts` - Added internal authentication support
- `pages/login.vue` - Better error message handling
- `plugins/00.startup-check.server.ts` - Enhanced startup checks
- `server/api/health.ts` - Added circuit breaker monitoring

## Testing the Fixes

### 1. Test Keycloak Connectivity
```bash
curl https://client.portnimara.dev/api/debug/test-keycloak-connectivity
```

### 2. Test Background Task Authentication
The `process-sales-emails` task should now work without 401 errors because the `x-tag: 094ut234` header is recognized as internal authentication.

### 3. Test User Authentication Flow
Normal login should work without 502 errors, with the retry logic handling temporary network issues.

## Summary

These changes make the authentication system more resilient. It now handles:
- Temporary network issues
- Service degradation
- High load scenarios
- Background task authentication
- Better monitoring and debugging

With these fixes, the 502 errors during login should no longer occur, and fallback mechanisms and user feedback cover the failures that remain. Background tasks now authenticate through the internal mechanism instead of requiring a user session.