# Authentication Session Timeout Fix - Deployment Guide

## Overview

This document provides step-by-step instructions for deploying the authentication session timeout fixes that resolve the 2-minute logout issue.

## Problem Summary

Users were experiencing unexpected logouts after exactly 2 minutes when navigating between pages. This was caused by:

1. **Timing Race Condition**: Authentication middleware cache expiry (2 minutes) and auth refresh plugin periodic validation (2 minutes) occurring simultaneously
2. **No Request Deduplication**: Multiple concurrent session checks causing conflicts
3. **Insufficient Error Handling**: Network errors triggering immediate logouts
4. **No Grace Periods**: Transient issues causing permanent session loss

## Solution Overview

### Core Changes

1. **Session Manager Utility** (`server/utils/session-manager.ts`)
   - Centralized session management with request deduplication
   - Promise caching for in-flight requests
   - Network error grace periods
   - Comprehensive logging and statistics

2. **Authentication Middleware** (`middleware/authentication.ts`)
   - Changed cache expiry from 2 to 3 minutes with jitter
   - Integrated SessionManager for deduplication
   - Enhanced error handling and user feedback

3. **Auth Refresh Plugin** (`plugins/01.auth-refresh.client.ts`)
   - Added random offset to prevent simultaneous validation
   - Improved concurrent validation prevention
   - Better error handling for network issues

4. **Session API** (`server/api/auth/session.ts`)
   - Enhanced logging with request IDs
   - Detailed error categorization
   - Performance timing measurements

5. **Keycloak Client** (`server/utils/keycloak-client.ts`)
   - Better error type distinction
   - Increased retry attempts for token refresh
   - Improved timeout handling

6. **Refresh API** (`server/api/auth/refresh.ts`)
   - Enhanced error handling with request IDs
   - Grace period support for transient failures
   - Selective session clearing based on error type

## Pre-deployment Checklist

- [ ] **Code Review**: All changes reviewed and approved
- [ ] **Environment Variables**: Verify all required environment variables are set
- [ ] **Dependencies**: Confirm no new dependencies are required
- [ ] **Backup**: Create backup of current production code
- [ ] **Monitoring**: Ensure authentication logs are being captured
- [ ] **Testing**: Verify fixes work in staging environment (if available)

## Deployment Steps

### Step 1: Deploy Session Manager Utility

1. Deploy `server/utils/session-manager.ts`
2. Verify no TypeScript compilation errors
3. Check server logs for any startup issues

### Step 2: Update Authentication Middleware

1. Deploy updated `middleware/authentication.ts`
2. Monitor for any middleware errors in logs
3. Verify new timing configuration is active

### Step 3: Update Auth Refresh Plugin

1. Deploy updated `plugins/01.auth-refresh.client.ts`
2. Check browser console for any client-side errors
3. Verify random offset is working (check logs)

### Step 4: Update Session API

1. Deploy updated `server/api/auth/session.ts`
2. Monitor API endpoint logs for request IDs
3. Verify enhanced error messages are working

### Step 5: Update Keycloak Client

1. Deploy updated `server/utils/keycloak-client.ts`
2. Check for any Keycloak communication errors
3. Verify retry logic is functioning

### Step 6: Update Refresh API

1. Deploy updated `server/api/auth/refresh.ts`
2. Monitor token refresh operations
3. Verify graceful error handling

## Post-deployment Verification

### Immediate Verification (0-5 minutes)

1. **No Deployment Errors**

   ```bash
   # Check server logs
   tail -f /var/log/application.log | grep -E "(ERROR|FATAL)"

   # Check for any 500 errors
   curl -I https://your-domain.com/api/health
   ```

2. **Login Flow Test**
   - Navigate to login page
   - Complete authentication
   - Verify successful redirect to dashboard

3. **Session API Test**

   ```bash
   # Test session endpoint
   curl -X GET https://your-domain.com/api/auth/session \
     -H "Cookie: nuxt-oidc-auth="
   ```

### Short-term Verification (5-15 minutes)

1. **Navigation Test**
   - Stay logged in for 5+ minutes
   - Navigate between different pages
   - Verify no unexpected logouts

2. **Log Analysis**

   ```bash
   # Check for new session manager logs
   grep "SESSION_MANAGER" /var/log/application.log

   # Verify timing desynchronization
   grep "Using cached session" /var/log/application.log
   ```

### Long-term Verification (15+ minutes)

1. **2-Minute Boundary Test**
   - Stay logged in for exactly 2 minutes
   - Navigate to a new page
   - Verify user remains authenticated

2. **3-Minute Cache Test**
   - Stay on same page for 3+ minutes
   - Navigate to new page
   - Verify session is refreshed, not lost

3. **Network Error Simulation**
   - Temporarily block network access
   - Verify graceful degradation
   - Restore network and verify recovery

## Monitoring and Alerts

### Key Metrics to Monitor

1. **Authentication Errors**

   ```bash
   # Monitor auth failure rate
   grep -c "AUTH_ERROR" /var/log/application.log
   ```

2. **Session Manager Performance**

   ```bash
   # Check session check durations
   grep "Session check completed" /var/log/application.log
   ```

3. **Cache Hit Rate**

   ```bash
   # Monitor cache effectiveness
   grep "Using cached session" /var/log/application.log | wc -l
   ```

### Alert Thresholds

- **Auth Error Rate**: > 5% of total auth checks
- **Session Check Duration**: > 2 seconds average
- **Cache Miss Rate**: > 80% (indicates caching issues)

## Rollback Procedures

### Immediate Rollback (if critical issues)

1. **Stop Application**

   ```bash
   systemctl stop your-application
   ```

2. **Restore Previous Code**

   ```bash
   git checkout previous-stable-tag
   npm install
   npm run build
   ```

3. **Restart Application**

   ```bash
   systemctl start your-application
   ```

4. **Verify Rollback**
   - Test login functionality
   - Check error logs
   - Verify user sessions work

### Partial Rollback (if specific component issues)

1. **Identify Problem Component**
   - Check which specific file is causing issues
   - Review recent error logs

2. **Rollback Specific Files**

   ```bash
   git checkout HEAD~1 -- middleware/authentication.ts
   # or
   git checkout HEAD~1 -- server/utils/session-manager.ts
   ```

3. **Rebuild and Test**

   ```bash
   npm run build
   systemctl restart your-application
   ```

## Troubleshooting

### Common Issues

1. **Users Still Getting Logged Out at 2 Minutes**
   - Check if SessionManager is being used
   - Verify cache expiry changes are active
   - Look for timing synchronization issues

2. **Session Check Errors**
   - Check network connectivity to Keycloak
   - Verify environment variables are set
   - Check Keycloak circuit breaker status

3. **Performance Issues**
   - Monitor session check durations
   - Check cache hit rates
   - Verify request deduplication is working

### Debug Commands

```bash
# Check session manager cache stats
curl https://your-domain.com/api/debug/session-cache-stats

# Monitor real-time auth logs
tail -f /var/log/application.log | grep -E "(SESSION|AUTH_REFRESH|MIDDLEWARE)"

# Check Keycloak connectivity
curl https://your-domain.com/api/debug/test-keycloak-connectivity
```

## Success Criteria

The deployment is considered successful when:

1. **No 2-Minute Logouts**: Users can navigate freely after 2 minutes
2. **Improved Error Handling**: Network issues don't cause immediate logouts
3. **Better Performance**: Session checks complete faster due to caching
4. **Enhanced Logging**: Detailed logs help with debugging future issues
5. **Graceful Degradation**: System handles transient failures elegantly

## Contact Information

For issues or questions regarding this deployment:

- **Technical Lead**: [Your Name]
- **Emergency Contact**: [Emergency Number]
- **Documentation**: This file and related docs in `/docs/` directory

## Appendix

### Environment Variables Required

```env
KEYCLOAK_CLIENT_SECRET=your-secret-key
COOKIE_DOMAIN=.portnimara.dev
```

### Log Examples

Successful session check:

```
[SESSION_MANAGER:abc123] Session check completed: {"authenticated":true,"reason":null,"fromCache":false}
```

Cache hit:

```
[SESSION_MANAGER:def456] Using cached session (age: 45 seconds)
```

Network error with grace period:

```
[SESSION_MANAGER:ghi789] Using cached result due to network error
```

### Performance Benchmarks

- **Session Check Duration**: < 500ms average
- **Cache Hit Rate**: > 70%
- **Authentication Success Rate**: > 99%
- **Network Error Recovery**: < 5 seconds
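
### Illustrative Sketch: Request Deduplication with Jittered Cache Expiry

The request deduplication, promise caching, and jittered 3-minute cache expiry listed under Core Changes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the contents of `server/utils/session-manager.ts`: the names `createSessionChecker`, `FetchSession`, and `CheckerOptions`, and the 10-second jitter window, are hypothetical.

```typescript
// Hypothetical sketch: names and the jitter window are assumptions,
// not taken from the real session manager.
type SessionResult = { authenticated: boolean; fromCache: boolean };
type FetchSession = () => Promise<{ authenticated: boolean }>;

interface CheckerOptions {
  baseTtlMs?: number;   // base cache lifetime (the guide uses 3 minutes)
  jitterMs?: number;    // random extra lifetime so expiries do not line up
  now?: () => number;   // injectable clock, useful in tests
  random?: () => number; // injectable randomness, useful in tests
}

function createSessionChecker(fetchSession: FetchSession, options: CheckerOptions = {}) {
  const baseTtlMs = options.baseTtlMs ?? 3 * 60 * 1000;
  const jitterMs = options.jitterMs ?? 10 * 1000;
  const now = options.now ?? (() => Date.now());
  const random = options.random ?? (() => Math.random());

  let inFlight: Promise<SessionResult> | null = null;
  let cached: { authenticated: boolean; expiresAt: number } | null = null;

  return function check(): Promise<SessionResult> {
    // Serve from cache while the jittered expiry has not passed.
    const entry = cached;
    if (entry !== null && now() < entry.expiresAt) {
      return Promise.resolve({ authenticated: entry.authenticated, fromCache: true });
    }

    // Deduplicate: concurrent callers share the one in-flight promise.
    if (inFlight === null) {
      inFlight = Promise.resolve()
        .then(() => fetchSession())
        .then((session): SessionResult => {
          const jitter = Math.floor(random() * jitterMs);
          cached = {
            authenticated: session.authenticated,
            expiresAt: now() + baseTtlMs + jitter,
          };
          return { authenticated: session.authenticated, fromCache: false };
        })
        .finally(() => {
          // Clear the slot on success or failure so later calls can retry.
          inFlight = null;
        });
    }
    return inFlight;
  };
}
```

Two callers that invoke `check()` at the same moment receive the same promise, so only one upstream session request is made. Failed requests are not cached here; the network-error grace period described above would be an additional layer on top of this sketch.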
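
### Illustrative Sketch: Estimating Cache Hit Rate from Logs

The cache hit rate benchmark and the cache miss alert threshold can be checked against captured log lines with a small helper like the one below. It is a sketch that assumes the log format shown under Log Examples, with one `Using cached session` line per cache hit and one `Session check completed` line containing `"fromCache":false` per upstream check. The function names `cacheHitRate` and `exceedsMissRateThreshold` are hypothetical.

```typescript
// Hypothetical helpers; they assume the log format from the Log Examples section.
function cacheHitRate(logLines: string[]): number | null {
  let hits = 0;
  let misses = 0;

  for (const line of logLines) {
    // Only session manager lines are relevant.
    if (!line.includes("[SESSION_MANAGER:")) continue;

    if (line.includes("Using cached session")) {
      hits += 1;
    } else if (
      line.includes("Session check completed") &&
      line.includes('"fromCache":false')
    ) {
      misses += 1;
    }
    // Other lines (e.g. network-error fallbacks) are ignored in this sketch.
  }

  const total = hits + misses;
  return total === 0 ? null : hits / total;
}

// True when the miss rate is above the alert threshold (80% in this guide).
function exceedsMissRateThreshold(hitRate: number | null, threshold = 0.8): boolean {
  return hitRate !== null && 1 - hitRate > threshold;
}
```

A `null` result means no relevant lines were found, which is reported separately from a 0% hit rate so that an empty log does not trigger the alert.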