# 502 Error Fixes - Deployment Guide
This guide describes the fixes implemented to resolve the 502 errors in the Port Nimara Client Portal and how to deploy them.
## Overview of Changes
### 1. Health Check System
- **New Files:**
  - `server/api/health.ts` - Health check endpoint
  - `server/utils/health-checker.ts` - Application readiness management
  - `server/utils/service-checks.ts` - Service availability checks
  - `server/plugins/startup-health.ts` - Startup health monitoring
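
This guide does not show how `server/utils/health-checker.ts` tracks readiness. As a rough sketch, assuming that "ready" simply means every required startup check has passed, the idea could look like the following. The class name `ReadinessTracker` and the check names are hypothetical.

```typescript
// Hypothetical sketch: not the actual contents of server/utils/health-checker.ts.
// The app counts as "ready" only once every required check has reported success.
class ReadinessTracker {
  private readonly required: readonly string[];
  private readonly passed = new Set<string>();

  constructor(required: readonly string[]) {
    this.required = required;
  }

  // Record the latest result of a named startup check.
  report(name: string, ok: boolean): void {
    if (ok) {
      this.passed.add(name);
    } else {
      this.passed.delete(name);
    }
  }

  // Names of required checks that have not passed yet.
  pending(): string[] {
    return this.required.filter((name) => !this.passed.has(name));
  }

  isReady(): boolean {
    return this.pending().length === 0;
  }
}

// Example with two assumed checks; the real list may differ.
const readiness = new ReadinessTracker(["nocodb", "minio"]);
readiness.report("nocodb", true);
const readyBeforeMinio = readiness.isReady(); // false: "minio" is still pending
readiness.report("minio", true);
const readyAfterMinio = readiness.isReady(); // true: nothing is pending
```

A health endpoint built on something like this could answer with a non-2xx status until `isReady()` returns true, so that the container health check and the proxy do not treat a half-started app as healthy.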
### 2. Process Management (PM2)
- **New Files:**
  - `ecosystem.config.js` - PM2 configuration with cluster mode
  - `docker-entrypoint.sh` - Custom entrypoint with service checks
- **Modified Files:**
  - `Dockerfile` - Added PM2, health checks, and tini init process
### 3. Nginx Configuration
- **New Files:**
  - `nginx/upstream.conf` - Upstream configuration with health checks
  - `nginx/client.portnimara.dev.conf` - Enhanced Nginx configuration
  - `nginx/error-pages/error-502.html` - Custom 502 error page with auto-retry
### 4. External Service Resilience
- **New Files:**
  - `server/utils/resilient-http.ts` - HTTP client with retry logic and circuit breaker
  - `server/utils/documeso.ts` - Enhanced Documenso client
- **Modified Files:**
  - `server/utils/nocodb.ts` - Updated to use resilient HTTP client
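
The retry logic in `server/utils/resilient-http.ts` is not reproduced in this guide. The sketch below illustrates one common approach, retry with capped exponential backoff, under assumed names and defaults (`backoffDelayMs`, `withRetry`, 200 ms base, 5 s cap, 3 attempts). It is not the project's actual implementation.

```typescript
// Hypothetical sketch of retry with exponential backoff.
// All names and default values here are assumptions for illustration.

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs = 200, capMs = 5000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Run `operation` up to `maxAttempts` times, sleeping between failures.
// `sleep` is injected so the sketch stays runtime-agnostic and easy to test.
async function withRetry<T>(
  operation: () => Promise<T>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Do not sleep after the final failed attempt.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}
```

In a Node.js process, `sleep` could be `(ms) => new Promise((resolve) => setTimeout(resolve, ms))`. A production client would typically also add jitter and retry only requests that are safe to repeat.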
### 5. Memory & Performance Optimization
- **New Files:**
  - `server/plugins/memory-monitor.ts` - Memory usage monitoring
  - `server/utils/cleanup-manager.ts` - Resource cleanup utilities
  - `server/utils/request-queue.ts` - Request queue system
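
As an illustration of what memory monitoring can involve, the following sketch classifies heap usage against a limit. The function name and the 70% / 85% thresholds are assumptions, not values taken from `server/plugins/memory-monitor.ts`.

```typescript
// Hypothetical sketch; the thresholds are illustrative only.
type MemoryLevel = "ok" | "warning" | "critical";

const GIB = 1024 ** 3;

// Classify heap usage relative to a limit, for example the 8 GiB implied by
// --max-old-space-size=8192.
function classifyHeapUsage(
  heapUsedBytes: number,
  limitBytes: number,
  warnRatio = 0.7,
  criticalRatio = 0.85,
): MemoryLevel {
  const ratio = heapUsedBytes / limitBytes;
  if (ratio >= criticalRatio) return "critical";
  if (ratio >= warnRatio) return "warning";
  return "ok";
}

const levelAt2GiB = classifyHeapUsage(2 * GIB, 8 * GIB); // "ok" (25%)
const levelAt6GiB = classifyHeapUsage(6 * GIB, 8 * GIB); // "warning" (75%)
const levelAt7GiB = classifyHeapUsage(7 * GIB, 8 * GIB); // "critical" (87.5%)
```

In Node.js the current figure is available as `process.memoryUsage().heapUsed`. A monitor along these lines might log at the warning level and trigger cleanup at the critical level.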
## Deployment Steps
### Step 1: Build Docker Image
```bash
# Build the new Docker image with PM2 support
docker build -t port-nimara-client-portal:resilient .
```
### Step 2: Update Nginx Configuration
Copy the new Nginx configuration files to your server, then test and reload Nginx:
```bash
# Copy upstream configuration
sudo cp nginx/upstream.conf /etc/nginx/conf.d/
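# Back up the current site configuration (the rollback plan restores this file)
sudo cp /etc/nginx/sites-available/client.portnimara.dev /etc/nginx/sites-available/client.portnimara.dev.backup
# Copy site configuration
sudo cp nginx/client.portnimara.dev.conf /etc/nginx/sites-available/client.portnimara.dev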
# Create error pages directory
sudo mkdir -p /etc/nginx/error-pages
sudo cp nginx/error-pages/error-502.html /etc/nginx/error-pages/
# Test Nginx configuration
sudo nginx -t
# Reload Nginx
sudo nginx -s reload
```
### Step 3: Update Environment Variables
Ensure all required environment variables are set:
```bash
# Required for health checks
NUXT_NOCODB_URL=your-nocodb-url
NUXT_NOCODB_TOKEN=your-nocodb-token
NUXT_MINIO_ACCESS_KEY=your-minio-key
NUXT_MINIO_SECRET_KEY=your-minio-secret
NUXT_DOCUMENSO_API_KEY=your-documenso-key
NUXT_DOCUMENSO_BASE_URL=https://signatures.portnimara.dev
# Optional - for enabling garbage collection monitoring
NODE_OPTIONS="--max-old-space-size=8192 --expose-gc"
```
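
A missing variable is easier to diagnose if the app reports it at startup rather than failing on the first request. The sketch below shows one way to check for that. The function name `findMissingEnvVars` is hypothetical, and the variable names are the ones listed above.

```typescript
// Hypothetical startup check; only the variable names come from this guide.
const REQUIRED_ENV_VARS = [
  "NUXT_NOCODB_URL",
  "NUXT_NOCODB_TOKEN",
  "NUXT_MINIO_ACCESS_KEY",
  "NUXT_MINIO_SECRET_KEY",
  "NUXT_DOCUMENSO_API_KEY",
  "NUXT_DOCUMENSO_BASE_URL",
] as const;

// Return the required variables that are unset or blank in `env`.
function findMissingEnvVars(
  env: Record<string, string | undefined>,
): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Example: one variable set, one blank, the rest absent.
const missing = findMissingEnvVars({
  NUXT_NOCODB_URL: "https://nocodb.example.invalid",
  NUXT_NOCODB_TOKEN: "",
});
// `missing` lists every required name except NUXT_NOCODB_URL.
```

At startup this could be called with `process.env`, logging the returned names and exiting if the list is not empty.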
### Step 4: Deploy with Docker Compose
Create or update your `docker-compose.yml`:
```yaml
version: '3.8'
services:
  client-portal:
    image: port-nimara-client-portal:resilient
    ports:
      - "3028:3000"
    environment:
      - NODE_ENV=production
      - NUXT_NOCODB_URL=${NUXT_NOCODB_URL}
      - NUXT_NOCODB_TOKEN=${NUXT_NOCODB_TOKEN}
      - NUXT_MINIO_ACCESS_KEY=${NUXT_MINIO_ACCESS_KEY}
      - NUXT_MINIO_SECRET_KEY=${NUXT_MINIO_SECRET_KEY}
      - NUXT_DOCUMENSO_API_KEY=${NUXT_DOCUMENSO_API_KEY}
      - NUXT_DOCUMENSO_BASE_URL=${NUXT_DOCUMENSO_BASE_URL}
    volumes:
      - ./logs:/app/logs
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/api/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
    restart: unless-stopped
    networks:
      - port-nimara-network
networks:
  port-nimara-network:
    external: true
```
Deploy:
```bash
docker-compose up -d
```
### Step 5: Verify Deployment
1. Check health endpoint:
```bash
curl https://client.portnimara.dev/api/health
```
2. Monitor PM2 processes:
```bash
docker exec -it <container-id> pm2 list
docker exec -it <container-id> pm2 monit
```
3. Check logs:
```bash
# PM2 logs
docker exec -it <container-id> pm2 logs
# Container logs
docker logs -f <container-id>
```
## Monitoring & Maintenance
### Health Check Monitoring
The health endpoint (`/api/health`) reports:
- Service availability (NocoDB, Directus, MinIO, Documenso)
- Memory usage
- Response times
- Circuit breaker status
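
The exact response shape of `/api/health` is not documented here. As a sketch of how per-service results could be folded into one overall status, assuming each check carries a hypothetical `critical` flag, consider:

```typescript
// Hypothetical sketch of aggregating per-service results into one report.
// The field names and the notion of "critical" services are assumptions.
interface ServiceCheck {
  name: string;
  ok: boolean;
  critical: boolean; // assumed: only some services are required to serve traffic
  responseTimeMs: number;
}

type OverallStatus = "healthy" | "degraded" | "unhealthy";

interface HealthSummary {
  status: OverallStatus;
  httpStatus: number;
  failing: string[];
}

function summarizeHealth(checks: ServiceCheck[]): HealthSummary {
  const failing = checks.filter((check) => !check.ok);
  const criticalDown = failing.some((check) => check.critical);
  let status: OverallStatus = "healthy";
  if (criticalDown) {
    status = "unhealthy";
  } else if (failing.length > 0) {
    status = "degraded";
  }
  return {
    status,
    // 503 only when a critical dependency is down.
    httpStatus: criticalDown ? 503 : 200,
    failing: failing.map((check) => check.name),
  };
}

const summary = summarizeHealth([
  { name: "nocodb", ok: true, critical: true, responseTimeMs: 42 },
  { name: "documenso", ok: false, critical: false, responseTimeMs: 5000 },
]);
// summary.status is "degraded" and summary.httpStatus is 200
```

The HTTP status matters because `curl -f`, as used in the Docker health check, exits with an error on responses of 400 and above. Returning 503 only for critical failures would keep a degraded but usable container marked healthy.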
### PM2 Commands
```bash
# Inside the container
pm2 list # List all processes
pm2 monit # Real-time monitoring
pm2 reload all # Zero-downtime reload
pm2 logs # View logs
pm2 flush # Clear logs
```
### Troubleshooting
1. **If 502 errors persist:**
   - Check `/api/health` to identify failing services
   - Review PM2 logs for crash information
   - Verify all environment variables are set
   - Check Nginx error logs: `sudo tail -f /var/log/nginx/error.log`
2. **High memory usage:**
   - Monitor memory via the health endpoint
   - PM2 restarts a process automatically once it reaches the 7 GB memory limit
   - Check the logs for signs of memory leaks
3. **Slow responses:**
   - Check circuit breaker status in the health endpoint
   - Review queue status for backlogs
   - Monitor external service response times
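
When reading the circuit breaker status, it helps to keep the usual three-state model in mind: closed (requests flow), open (requests are rejected immediately), and half-open (a trial request is allowed after a cool-down). The sketch below illustrates that model with assumed thresholds. It is not the implementation in `server/utils/resilient-http.ts`.

```typescript
// Hypothetical sketch of a three-state circuit breaker.
// Threshold and timeout defaults are illustrative assumptions.
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: BreakerState = "closed";
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;

  constructor(failureThreshold = 5, resetTimeoutMs = 30000) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
  }

  // State at time `now`; an open breaker turns half-open after the timeout.
  getState(now: number = Date.now()): BreakerState {
    if (this.state === "open" && now - this.openedAt >= this.resetTimeoutMs) {
      this.state = "half-open";
    }
    return this.state;
  }

  // Whether a request may be attempted right now.
  canRequest(now: number = Date.now()): boolean {
    return this.getState(now) !== "open";
  }

  // A success closes the breaker and clears the failure count.
  recordSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  // A failure opens the breaker once the threshold is reached.
  // A failed trial call in the half-open state reopens it immediately.
  recordFailure(now: number = Date.now()): void {
    this.failures += 1;
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = now;
    }
  }
}
```

Under this model, a breaker that stays open for a service points at that service as the likely cause of slow or failing responses, while a breaker that alternates between open and half-open suggests the service is recovering intermittently.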
## Benefits of These Changes
1. **Automatic Recovery:** PM2 restarts crashed processes, and cluster mode keeps the remaining workers serving requests in the meantime
2. **Graceful Degradation:** Circuit breakers prevent cascading failures
3. **Better Error Handling:** Retry logic handles temporary failures
4. **Improved Visibility:** Health checks and monitoring show which service is failing and how much memory is in use
5. **User Experience:** Custom 502 page with auto-retry improves UX
6. **Resource Management:** Memory monitoring and cleanup prevent leaks
## Rollback Plan
If issues arise:
1. Keep the previous Docker image tagged
2. Revert Nginx configuration:
```bash
sudo cp /etc/nginx/sites-available/client.portnimara.dev.backup /etc/nginx/sites-available/client.portnimara.dev
sudo nginx -s reload
```
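3. Deploy the previous Docker image. Point `image:` in `docker-compose.yml` back at the previous tag first, otherwise the same image is started again:
```bash
docker-compose down
docker-compose up -d
```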