# Authentication Session Timeout Fix - Deployment Guide

## Overview

This document provides step-by-step instructions for deploying the authentication session timeout fixes that resolve the 2-minute logout issue.

## Problem Summary

Users were unexpectedly logged out after exactly 2 minutes when navigating between pages. Four issues contributed:

1. **Timing Race Condition**: The authentication middleware cache expiry (2 minutes) and the auth refresh plugin's periodic validation (2 minutes) fired at the same time
2. **No Request Deduplication**: Multiple concurrent session checks conflicted with each other
3. **Insufficient Error Handling**: Network errors triggered immediate logouts
4. **No Grace Periods**: Transient issues caused permanent session loss

## Solution Overview

### Core Changes

1. **Session Manager Utility** (`server/utils/session-manager.ts`)
   - Centralized session management with request deduplication
   - Promise caching for in-flight requests
   - Network error grace periods
   - Comprehensive logging and statistics

2. **Authentication Middleware** (`middleware/authentication.ts`)
   - Changed cache expiry from 2 to 3 minutes with jitter
   - Integrated SessionManager for deduplication
   - Enhanced error handling and user feedback

3. **Auth Refresh Plugin** (`plugins/01.auth-refresh.client.ts`)
   - Added random offset to prevent simultaneous validation
   - Improved concurrent validation prevention
   - Better error handling for network issues

4. **Session API** (`server/api/auth/session.ts`)
   - Enhanced logging with request IDs
   - Detailed error categorization
   - Performance timing measurements

5. **Keycloak Client** (`server/utils/keycloak-client.ts`)
   - Better error type distinction
   - Increased retry attempts for token refresh
   - Improved timeout handling

6. **Refresh API** (`server/api/auth/refresh.ts`)
   - Enhanced error handling with request IDs
   - Grace period support for transient failures
   - Selective session clearing based on error type

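The request deduplication and promise caching listed for the Session Manager can be pictured roughly as follows. This is a minimal sketch, not the contents of `server/utils/session-manager.ts`: the names `dedupedSessionCheck`, `inFlight`, and `SessionResult` are hypothetical, and the real utility may key or expire entries differently.

```typescript
// Hypothetical sketch: these names are illustrative, not taken from session-manager.ts.
type SessionResult = { authenticated: boolean; reason: string | null };

// Session checks that have started but not yet settled, keyed by session identifier.
const inFlight = new Map<string, Promise<SessionResult>>();

// Returns the pending promise for `key` if one exists; otherwise starts `check`
// and removes the entry once it settles so later calls trigger a fresh check.
function dedupedSessionCheck(
  key: string,
  check: () => Promise<SessionResult>,
): Promise<SessionResult> {
  const pending = inFlight.get(key);
  if (pending) return pending;

  const promise = check().finally(() => {
    inFlight.delete(key);
  });
  inFlight.set(key, promise);
  return promise;
}
```

Two callers that ask about the same session while a check is pending receive the same promise, so only one request reaches the identity provider.
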
## Pre-deployment Checklist

- [ ] **Code Review**: All changes reviewed and approved
- [ ] **Environment Variables**: Verify all required environment variables are set
- [ ] **Dependencies**: Confirm no new dependencies are required
- [ ] **Backup**: Create backup of current production code
- [ ] **Monitoring**: Ensure authentication logs are being captured
- [ ] **Testing**: Verify fixes work in staging environment (if available)

## Deployment Steps

### Step 1: Deploy Session Manager Utility

1. Deploy `server/utils/session-manager.ts`
2. Verify no TypeScript compilation errors
3. Check server logs for any startup issues

### Step 2: Update Authentication Middleware

1. Deploy updated `middleware/authentication.ts`
2. Monitor for any middleware errors in logs
3. Verify new timing configuration is active

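When verifying the timing configuration, it can help to have a mental model of what "3 minutes with jitter" means. The sketch below is an assumption about how such an expiry could be computed: the 3-minute base comes from this guide, while the 10-second jitter window and the function names are illustrative only.

```typescript
// Assumed values: the 3-minute base is stated in this guide; the jitter window is illustrative.
const BASE_EXPIRY_MS = 3 * 60 * 1000;
const JITTER_MS = 10 * 1000;

// Returns an expiry spread around `base` by up to `jitter` milliseconds in either
// direction. `random` is injectable so the calculation can be tested deterministically.
function jitteredExpiryMs(
  base: number = BASE_EXPIRY_MS,
  jitter: number = JITTER_MS,
  random: () => number = Math.random,
): number {
  const offset = (random() * 2 - 1) * jitter;
  return Math.round(base + offset);
}

// A cache entry is fresh while its age is below the expiry chosen for it.
function isCacheEntryFresh(
  cachedAt: number,
  expiryMs: number,
  now: number = Date.now(),
): boolean {
  return now - cachedAt < expiryMs;
}
```

Because each cache entry gets a slightly different lifetime, cache expiry no longer lines up with the plugin's fixed 2-minute validation interval.
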
### Step 3: Update Auth Refresh Plugin

1. Deploy updated `plugins/01.auth-refresh.client.ts`
2. Check browser console for any client-side errors
3. Verify random offset is working (check logs)

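For reference while checking the plugin, the two behaviours it adds (a random start offset and a guard against overlapping validations) might look like the sketch below. It is hypothetical: the 2-minute interval is taken from this guide, but the 30-second maximum offset and both function names are assumptions rather than the plugin's actual code.

```typescript
// Hypothetical sketch; the offset window and names are assumptions.
const VALIDATION_INTERVAL_MS = 2 * 60 * 1000;
const MAX_OFFSET_MS = 30 * 1000;

// Delay before the first validation: the fixed interval plus a random offset,
// so the plugin does not fire in lockstep with the middleware cache expiry.
function firstValidationDelayMs(random: () => number = Math.random): number {
  return VALIDATION_INTERVAL_MS + Math.floor(random() * MAX_OFFSET_MS);
}

// Wraps an async validator so that a call made while another is still running
// is skipped. The returned function resolves to true if it ran the validator.
function createGuardedValidator(validate: () => Promise<void>) {
  let running = false;
  return async function run(): Promise<boolean> {
    if (running) return false; // skipped: a validation is already in progress
    running = true;
    try {
      await validate();
      return true;
    } finally {
      running = false;
    }
  };
}
```
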
### Step 4: Update Session API

1. Deploy updated `server/api/auth/session.ts`
2. Monitor API endpoint logs for request IDs
3. Verify enhanced error messages are working

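To know what to look for in the endpoint logs, the sketch below shows one way request-ID-tagged logging with timing could be produced. It is illustrative only: `createRequestLogger` is a hypothetical helper, and the `[PREFIX:id] message` shape is modelled on the log examples in the appendix of this guide, not on the actual `session.ts` code.

```typescript
// Hypothetical helper; the "[PREFIX:id] message" shape follows this guide's log examples.
type LogSink = (line: string) => void;

function createRequestLogger(
  prefix: string,
  sink: LogSink = console.log,
  now: () => number = Date.now,
) {
  // Short random identifier so all lines from one request can be correlated.
  const requestId = Math.random().toString(36).slice(2, 8).padEnd(6, "0");
  const startedAt = now();
  return {
    requestId,
    // Writes one tagged line, appending `data` as JSON when provided.
    log(message: string, data?: unknown): void {
      const suffix = data === undefined ? "" : `: ${JSON.stringify(data)}`;
      sink(`[${prefix}:${requestId}] ${message}${suffix}`);
    },
    // Milliseconds since the logger was created, for timing measurements.
    elapsedMs(): number {
      return now() - startedAt;
    },
  };
}
```

Every line emitted for a request carries the same ID, which is what makes `grep` on a single ID useful during verification.
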
### Step 5: Update Keycloak Client

1. Deploy updated `server/utils/keycloak-client.ts`
2. Check for any Keycloak communication errors
3. Verify retry logic is functioning

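When verifying the retry logic, the expected behaviour is that transient failures are retried with increasing delays while permanent failures fail fast. The sketch below illustrates that pattern under stated assumptions: the attempt count, delays, error kinds, and the names `withRetry`, `backoffDelayMs`, and `isRetryable` are hypothetical and not taken from `keycloak-client.ts`.

```typescript
// Hypothetical sketch; attempt counts, delays, and error kinds are assumptions.
type FailureKind = "network" | "timeout" | "invalid_token";
interface ClassifiedError {
  kind: FailureKind;
  message: string;
}

// Only transient failures are worth retrying.
function isRetryable(error: unknown): boolean {
  const kind = (error as Partial<ClassifiedError> | null)?.kind;
  return kind === "network" || kind === "timeout";
}

// Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 250, maxMs = 4000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Stop on permanent failures or when the attempt budget is used up.
      if (!isRetryable(error) || attempt === maxAttempts - 1) break;
      await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```
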
### Step 6: Update Refresh API

1. Deploy updated `server/api/auth/refresh.ts`
2. Monitor token refresh operations
3. Verify graceful error handling

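"Graceful error handling" here means the session is cleared only when the error shows it is truly invalid. The following sketch shows one possible decision table. The error categories and the name `decideOnRefreshError` are assumptions made for illustration; the real `refresh.ts` may classify errors differently.

```typescript
// Hypothetical categories; the real refresh.ts may classify errors differently.
type RefreshErrorKind =
  | "invalid_grant"
  | "session_expired"
  | "network"
  | "timeout"
  | "server_error";

type RefreshDecision = { clearSession: boolean; retryable: boolean };

// Permanent authentication failures clear the session; transient failures keep it
// so the user is not logged out by a brief outage.
function decideOnRefreshError(kind: RefreshErrorKind): RefreshDecision {
  switch (kind) {
    case "invalid_grant":
    case "session_expired":
      return { clearSession: true, retryable: false };
    case "network":
    case "timeout":
    case "server_error":
      return { clearSession: false, retryable: true };
  }
}
```
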
## Post-deployment Verification

### Immediate Verification (0-5 minutes)

1. **No Deployment Errors**
   ```bash
   # Check server logs
   tail -f /var/log/application.log | grep -E "(ERROR|FATAL)"

   # Check for any 500 errors
   curl -I https://your-domain.com/api/health
   ```

2. **Login Flow Test**
   - Navigate to login page
   - Complete authentication
   - Verify successful redirect to dashboard

3. **Session API Test**
   ```bash
   # Test session endpoint
   curl -X GET https://your-domain.com/api/auth/session \
     -H "Cookie: nuxt-oidc-auth=<session-cookie>"
   ```

### Short-term Verification (5-15 minutes)

1. **Navigation Test**
   - Stay logged in for 5+ minutes
   - Navigate between different pages
   - Verify no unexpected logouts

2. **Log Analysis**
   ```bash
   # Check for new session manager logs
   grep "SESSION_MANAGER" /var/log/application.log

   # Verify timing desynchronization
   grep "Using cached session" /var/log/application.log
   ```

### Long-term Verification (15+ minutes)

1. **2-Minute Boundary Test**
   - Stay logged in for exactly 2 minutes
   - Navigate to a new page
   - Verify user remains authenticated

2. **3-Minute Cache Test**
   - Stay on same page for 3+ minutes
   - Navigate to new page
   - Verify session is refreshed, not lost

3. **Network Error Simulation**
   - Temporarily block network access
   - Verify graceful degradation
   - Restore network and verify recovery

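For the network error simulation, "graceful degradation" is expected to mean that a recent successful session check is trusted for a limited time while the network is unavailable. A minimal sketch of that grace-period decision follows. The 5-minute window, the reason strings, and the name `resolveAfterNetworkError` are assumptions, not values from the deployed code.

```typescript
// Hypothetical sketch; the grace window and reason strings are assumed values.
interface CachedSession {
  authenticated: boolean;
  checkedAt: number; // epoch milliseconds of the last successful check
}

const GRACE_PERIOD_MS = 5 * 60 * 1000;

// On a network error, fall back to the last known result if it is recent enough;
// otherwise report the session as unauthenticated.
function resolveAfterNetworkError(
  cached: CachedSession | undefined,
  now: number = Date.now(),
  graceMs: number = GRACE_PERIOD_MS,
): { authenticated: boolean; reason: string } {
  if (cached && cached.authenticated && now - cached.checkedAt <= graceMs) {
    return { authenticated: true, reason: "NETWORK_ERROR_GRACE_PERIOD" };
  }
  return { authenticated: false, reason: "NETWORK_ERROR" };
}
```

During the test, a user who authenticated shortly before the network was blocked should stay logged in, while a user with no recent successful check should not be silently treated as authenticated.
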
## Monitoring and Alerts

### Key Metrics to Monitor

1. **Authentication Errors**
   ```bash
   # Count auth error log lines (compare against total auth checks to get a rate)
   grep -c "AUTH_ERROR" /var/log/application.log
   ```

2. **Session Manager Performance**
   ```bash
   # Check session check durations
   grep "Session check completed" /var/log/application.log
   ```

3. **Cache Hit Rate**
   ```bash
   # Count cache hits to monitor cache effectiveness
   grep -c "Using cached session" /var/log/application.log
   ```

### Alert Thresholds

- **Auth Error Rate**: > 5% of total auth checks
- **Session Check Duration**: > 2 seconds average
- **Cache Miss Rate**: > 80% (indicates caching issues)

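The thresholds above can be expressed as a small check over collected counters. The sketch below is one way to do that; the `AuthMetrics` shape, the alert names, and `evaluateAlerts` are hypothetical, and only the three threshold values come from this guide.

```typescript
// Hypothetical metrics shape; only the threshold values come from this guide.
interface AuthMetrics {
  authChecks: number;
  authErrors: number;
  avgSessionCheckMs: number;
  cacheHits: number;
  cacheMisses: number;
}

// Returns the names of all thresholds that are currently exceeded.
function evaluateAlerts(metrics: AuthMetrics): string[] {
  const alerts: string[] = [];
  const errorRate = metrics.authChecks > 0 ? metrics.authErrors / metrics.authChecks : 0;
  const lookups = metrics.cacheHits + metrics.cacheMisses;
  const missRate = lookups > 0 ? metrics.cacheMisses / lookups : 0;

  if (errorRate > 0.05) alerts.push("auth_error_rate"); // > 5% of auth checks
  if (metrics.avgSessionCheckMs > 2000) alerts.push("session_check_duration"); // > 2 s average
  if (missRate > 0.8) alerts.push("cache_miss_rate"); // > 80% misses
  return alerts;
}
```
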
## Rollback Procedures

### Immediate Rollback (if critical issues)

1. **Stop Application**
   ```bash
   systemctl stop your-application
   ```

2. **Restore Previous Code**
   ```bash
   git checkout previous-stable-tag
   npm install
   npm run build
   ```

3. **Restart Application**
   ```bash
   systemctl start your-application
   ```

4. **Verify Rollback**
   - Test login functionality
   - Check error logs
   - Verify user sessions work

### Partial Rollback (if specific component issues)

1. **Identify Problem Component**
   - Check which specific file is causing issues
   - Review recent error logs

2. **Rollback Specific Files**
   ```bash
   # HEAD~1 assumes the fix landed in the latest commit; adjust the ref otherwise
   git checkout HEAD~1 -- middleware/authentication.ts
   # or
   git checkout HEAD~1 -- server/utils/session-manager.ts
   ```

3. **Rebuild and Test**
   ```bash
   npm run build
   systemctl restart your-application
   ```

## Troubleshooting

### Common Issues

1. **Users Still Getting Logged Out at 2 Minutes**
   - Check if SessionManager is being used
   - Verify cache expiry changes are active
   - Look for timing synchronization issues

2. **Session Check Errors**
   - Check network connectivity to Keycloak
   - Verify environment variables are set
   - Check Keycloak circuit breaker status

3. **Performance Issues**
   - Monitor session check durations
   - Check cache hit rates
   - Verify request deduplication is working

### Debug Commands

```bash
# Check session manager cache stats
curl https://your-domain.com/api/debug/session-cache-stats

# Monitor real-time auth logs
tail -f /var/log/application.log | grep -E "(SESSION|AUTH_REFRESH|MIDDLEWARE)"

# Check Keycloak connectivity
curl https://your-domain.com/api/debug/test-keycloak-connectivity
```

## Success Criteria

The deployment is considered successful when:

1. **No 2-Minute Logouts**: Users can navigate freely after 2 minutes
2. **Improved Error Handling**: Network issues don't cause immediate logouts
3. **Better Performance**: Session checks complete faster due to caching
4. **Enhanced Logging**: Detailed logs help with debugging future issues
5. **Graceful Degradation**: System handles transient failures elegantly

## Contact Information

For issues or questions regarding this deployment:

- **Technical Lead**: [Your Name]
- **Emergency Contact**: [Emergency Number]
- **Documentation**: This file and related docs in `/docs/` directory

## Appendix

### Environment Variables Required

```env
KEYCLOAK_CLIENT_SECRET=your-secret-key
COOKIE_DOMAIN=.portnimara.dev
```

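A startup check can catch a missing variable before users hit an authentication error. The sketch below shows one way to do this. The two variable names come from the block above, while `missingEnvVars` and `REQUIRED_ENV_VARS` are hypothetical names introduced for illustration.

```typescript
// The variable names come from this guide; the helper itself is a hypothetical sketch.
const REQUIRED_ENV_VARS: readonly string[] = ["KEYCLOAK_CLIENT_SECRET", "COOKIE_DOMAIN"];

// Returns the names of required variables that are unset or blank in `env`.
function missingEnvVars(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_ENV_VARS,
): string[] {
  return required.filter((name) => (env[name] ?? "").trim() === "");
}
```

In a Node.js server this could be called as `missingEnvVars(process.env)` during startup, failing fast if the returned list is not empty.
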
### Log Examples

Successful session check:
```
[SESSION_MANAGER:abc123] Session check completed: {"authenticated":true,"reason":null,"fromCache":false}
```

Cache hit:
```
[SESSION_MANAGER:def456] Using cached session (age: 45 seconds)
```

Network error with grace period:
```
[SESSION_MANAGER:ghi789] Using cached result due to network error
```

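Lines in the formats shown above can be tallied with a short script when `grep` counts are not enough. This is a sketch under the assumption that real log lines contain exactly these message prefixes, possibly preceded by a timestamp; `summarizeSessionLogs` and `LogSummary` are hypothetical names.

```typescript
// Hypothetical sketch; assumes log lines use the message prefixes shown in this guide.
const SESSION_LINE_PATTERN = /\[SESSION_MANAGER:([^\]]+)\] (.*)$/;

interface LogSummary {
  cacheHits: number;
  completedChecks: number;
  networkFallbacks: number;
}

// Counts the three kinds of session manager messages; unrelated lines are ignored.
function summarizeSessionLogs(lines: string[]): LogSummary {
  const summary: LogSummary = { cacheHits: 0, completedChecks: 0, networkFallbacks: 0 };
  for (const line of lines) {
    const match = SESSION_LINE_PATTERN.exec(line);
    if (!match) continue;
    const message = match[2] ?? "";
    if (message.startsWith("Using cached session")) {
      summary.cacheHits++;
    } else if (message.startsWith("Session check completed")) {
      summary.completedChecks++;
    } else if (message.startsWith("Using cached result due to network error")) {
      summary.networkFallbacks++;
    }
  }
  return summary;
}
```
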
### Performance Benchmarks

- **Session Check Duration**: < 500ms average
- **Cache Hit Rate**: > 70%
- **Authentication Success Rate**: > 99%
- **Network Error Recovery**: < 5 seconds