Refactor authentication to use centralized session manager

Extract session management logic from middleware into reusable SessionManager utility to improve reliability, reduce code duplication, and prevent thundering herd issues with jittered cache expiry.
2025-07-11 14:43:50 -04:00
parent bf2361050f
commit c6f81a6686
8 changed files with 1051 additions and 139 deletions

@@ -0,0 +1,310 @@
# Authentication Session Timeout Fix - Deployment Guide
## Overview
This document provides step-by-step instructions for deploying the authentication session timeout fixes that resolve the 2-minute logout issue.
## Problem Summary
Users were experiencing unexpected logouts after exactly 2 minutes when navigating between pages. This was caused by:
1. **Timing Race Condition**: The authentication middleware cache expiry and the auth refresh plugin's periodic validation both ran on a 2-minute interval, so the two checks fired at the same moment
2. **No Request Deduplication**: Multiple concurrent session checks causing conflicts
3. **Insufficient Error Handling**: Network errors triggering immediate logouts
4. **No Grace Periods**: Transient issues causing permanent session loss
## Solution Overview
### Core Changes
1. **Session Manager Utility** (`server/utils/session-manager.ts`)
- Centralized session management with request deduplication
- Promise caching for in-flight requests
- Network error grace periods
- Comprehensive logging and statistics
2. **Authentication Middleware** (`middleware/authentication.ts`)
- Changed cache expiry from 2 to 3 minutes with jitter
- Integrated SessionManager for deduplication
- Enhanced error handling and user feedback
3. **Auth Refresh Plugin** (`plugins/01.auth-refresh.client.ts`)
- Added random offset to prevent simultaneous validation
- Improved concurrent validation prevention
- Better error handling for network issues
4. **Session API** (`server/api/auth/session.ts`)
- Enhanced logging with request IDs
- Detailed error categorization
- Performance timing measurements
5. **Keycloak Client** (`server/utils/keycloak-client.ts`)
- Better error type distinction
- Increased retry attempts for token refresh
- Improved timeout handling
6. **Refresh API** (`server/api/auth/refresh.ts`)
- Enhanced error handling with request IDs
- Grace period support for transient failures
- Selective session clearing based on error type
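The request deduplication and promise caching listed for the Session Manager can be pictured with a minimal sketch. This is not the contents of `server/utils/session-manager.ts`: the class name `InFlightDeduplicator` and its `run` and `pendingCount` methods are hypothetical, chosen only to illustrate how concurrent session checks for the same key could share one in-flight promise.

```typescript
// Hypothetical sketch of in-flight request deduplication.
// It is NOT the real SessionManager; all names here are assumptions.
class InFlightDeduplicator<T> {
  // One pending promise per key (for example, per session ID).
  private readonly inFlight = new Map<string, Promise<T>>();

  run(key: string, task: () => Promise<T>): Promise<T> {
    const pending = this.inFlight.get(key);
    if (pending) {
      // A check for this key is already running: share its promise.
      return pending;
    }

    // Forget the entry once the task settles, so later calls start fresh.
    const forget = (): void => {
      this.inFlight.delete(key);
    };

    const promise: Promise<T> = task().then(
      (value) => {
        forget();
        return value;
      },
      (error) => {
        forget();
        throw error;
      },
    );
    this.inFlight.set(key, promise);
    return promise;
  }

  // Number of checks currently in flight.
  pendingCount(): number {
    return this.inFlight.size;
  }
}
```

With this shape, two callers that ask for the same key while the first check is still running receive the same promise, so only one upstream session check is issued.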
## Pre-deployment Checklist
- [ ] **Code Review**: All changes reviewed and approved
- [ ] **Environment Variables**: Verify all required environment variables are set
- [ ] **Dependencies**: Confirm no new dependencies are required
- [ ] **Backup**: Create backup of current production code
- [ ] **Monitoring**: Ensure authentication logs are being captured
- [ ] **Testing**: Verify fixes work in staging environment (if available)
## Deployment Steps
### Step 1: Deploy Session Manager Utility
1. Deploy `server/utils/session-manager.ts`
2. Verify no TypeScript compilation errors
3. Check server logs for any startup issues
### Step 2: Update Authentication Middleware
1. Deploy updated `middleware/authentication.ts`
2. Monitor for any middleware errors in logs
3. Verify new timing configuration is active
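To make "new timing configuration" concrete, here is a minimal sketch of a jittered cache expiry. Only the 3-minute base comes from this guide; the jitter width, the constant names, and both helper functions are assumptions, and the actual values in `middleware/authentication.ts` may differ.

```typescript
// Base expiry from the guide (3 minutes). The jitter width is an assumed value.
const BASE_CACHE_EXPIRY_MS = 3 * 60 * 1000;
const MAX_JITTER_MS = 10 * 1000; // hypothetical

// Returns an expiry in [baseMs, baseMs + maxJitterMs).
// `random` is injectable so the behavior can be checked deterministically.
function jitteredExpiryMs(
  baseMs: number,
  maxJitterMs: number,
  random: () => number = Math.random,
): number {
  return baseMs + Math.floor(random() * maxJitterMs);
}

// True once the cached entry is at least `expiryMs` old.
function isCacheExpired(cachedAtMs: number, expiryMs: number, nowMs: number): boolean {
  return nowMs - cachedAtMs >= expiryMs;
}
```

Because each cache entry gets its own offset, the expiry no longer coincides with a fixed 2-minute validation timer, and entries created together do not all expire together.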
### Step 3: Update Auth Refresh Plugin
1. Deploy updated `plugins/01.auth-refresh.client.ts`
2. Check browser console for any client-side errors
3. Verify random offset is working (check logs)
### Step 4: Update Session API
1. Deploy updated `server/api/auth/session.ts`
2. Monitor API endpoint logs for request IDs
3. Verify enhanced error messages are working
### Step 5: Update Keycloak Client
1. Deploy updated `server/utils/keycloak-client.ts`
2. Check for any Keycloak communication errors
3. Verify retry logic is functioning
### Step 6: Update Refresh API
1. Deploy updated `server/api/auth/refresh.ts`
2. Monitor token refresh operations
3. Verify graceful error handling
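When verifying graceful error handling, it can help to have a mental model of "selective session clearing based on error type". The sketch below is one possible classification, not the logic in `server/api/auth/refresh.ts`: the type names, the `RefreshFailure` shape, and the mapping from status codes to categories are all assumptions. The only external fact it relies on is that OAuth 2.0 token endpoints report an unusable refresh token with the `invalid_grant` error code.

```typescript
// Hypothetical error categories; the real refresh API may use different ones.
type RefreshErrorKind = "network" | "server" | "invalid_session" | "unknown";

interface RefreshFailure {
  status?: number; // HTTP status; undefined when no response was received
  oauthError?: string; // OAuth 2.0 "error" field from the token endpoint, if any
}

function classifyRefreshFailure(failure: RefreshFailure): RefreshErrorKind {
  if (failure.status === undefined) return "network"; // request never completed
  if (failure.status >= 500) return "server"; // identity provider is unhealthy
  if (failure.status === 401 || failure.oauthError === "invalid_grant") {
    return "invalid_session"; // the token itself was rejected
  }
  return "unknown";
}

// Only a rejected token justifies logging the user out; transient
// failures keep the session so a later retry can succeed.
function shouldClearSession(kind: RefreshErrorKind): boolean {
  return kind === "invalid_session";
}
```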
## Post-deployment Verification
### Immediate Verification (0-5 minutes)
1. **No Deployment Errors**
```bash
# Check server logs
tail -f /var/log/application.log | grep -E "(ERROR|FATAL)"
# Check for any 500 errors
curl -I https://your-domain.com/api/health
```
2. **Login Flow Test**
- Navigate to login page
- Complete authentication
- Verify successful redirect to dashboard
3. **Session API Test**
```bash
# Test session endpoint
curl -X GET https://your-domain.com/api/auth/session \
-H "Cookie: nuxt-oidc-auth=<session-cookie>"
```
### Short-term Verification (5-15 minutes)
1. **Navigation Test**
- Stay logged in for 5+ minutes
- Navigate between different pages
- Verify no unexpected logouts
2. **Log Analysis**
```bash
# Check for new session manager logs
grep "SESSION_MANAGER" /var/log/application.log
# Verify timing desynchronization
grep "Using cached session" /var/log/application.log
```
### Long-term Verification (15+ minutes)
1. **2-Minute Boundary Test**
- Stay logged in for exactly 2 minutes
- Navigate to a new page
- Verify user remains authenticated
2. **3-Minute Cache Test**
- Stay on same page for 3+ minutes
- Navigate to new page
- Verify session is refreshed, not lost
3. **Network Error Simulation**
- Temporarily block network access
- Verify graceful degradation
- Restore network and verify recovery
## Monitoring and Alerts
### Key Metrics to Monitor
1. **Authentication Errors**
```bash
# Count auth failures (divide by total auth checks to get a rate)
grep -c "AUTH_ERROR" /var/log/application.log
```
2. **Session Manager Performance**
```bash
# Check session check durations
grep "Session check completed" /var/log/application.log
```
3. **Cache Hit Rate**
```bash
# Count cache hits (compare against total session checks to get a hit rate)
grep -c "Using cached session" /var/log/application.log
```
### Alert Thresholds
- **Auth Error Rate**: > 5% of total auth checks
- **Session Check Duration**: > 2 seconds average
- **Cache Miss Rate**: > 80% (indicates caching issues)
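The three thresholds above can be expressed as a small check over aggregated counters. This is a sketch under assumptions: the `AuthMetrics` shape, the alert names, and the idea that these counters are available in one place are all hypothetical; only the threshold values (5%, 2 seconds, 80%) come from this guide.

```typescript
// Hypothetical aggregated counters for one monitoring window.
interface AuthMetrics {
  authChecks: number;
  authErrors: number;
  cacheHits: number;
  cacheMisses: number;
  totalCheckDurationMs: number;
}

type Alert = "auth_error_rate" | "session_check_duration" | "cache_miss_rate";

// Safe division: an empty window yields 0 instead of NaN.
function ratio(part: number, total: number): number {
  return total > 0 ? part / total : 0;
}

function evaluateAlerts(m: AuthMetrics): Alert[] {
  const alerts: Alert[] = [];
  // Auth error rate above 5% of total auth checks.
  if (ratio(m.authErrors, m.authChecks) > 0.05) alerts.push("auth_error_rate");
  // Average session check duration above 2 seconds.
  if (ratio(m.totalCheckDurationMs, m.authChecks) > 2000) alerts.push("session_check_duration");
  // Cache miss rate above 80% of cache lookups.
  if (ratio(m.cacheMisses, m.cacheHits + m.cacheMisses) > 0.8) alerts.push("cache_miss_rate");
  return alerts;
}
```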
## Rollback Procedures
### Immediate Rollback (if critical issues)
1. **Stop Application**
```bash
systemctl stop your-application
```
2. **Restore Previous Code**
```bash
git checkout previous-stable-tag
npm install
npm run build
```
3. **Restart Application**
```bash
systemctl start your-application
```
4. **Verify Rollback**
- Test login functionality
- Check error logs
- Verify user sessions work
### Partial Rollback (if specific component issues)
1. **Identify Problem Component**
- Check which specific file is causing issues
- Review recent error logs
2. **Rollback Specific Files**
```bash
git checkout HEAD~1 -- middleware/authentication.ts
# or
git checkout HEAD~1 -- server/utils/session-manager.ts
```
3. **Rebuild and Test**
```bash
npm run build
systemctl restart your-application
```
## Troubleshooting
### Common Issues
1. **Users Still Getting Logged Out at 2 Minutes**
- Check if SessionManager is being used
- Verify cache expiry changes are active
- Look for timing synchronization issues
2. **Session Check Errors**
- Check network connectivity to Keycloak
- Verify environment variables are set
- Check Keycloak circuit breaker status
3. **Performance Issues**
- Monitor session check durations
- Check cache hit rates
- Verify request deduplication is working
### Debug Commands
```bash
# Check session manager cache stats
curl https://your-domain.com/api/debug/session-cache-stats
# Monitor real-time auth logs
tail -f /var/log/application.log | grep -E "(SESSION|AUTH_REFRESH|MIDDLEWARE)"
# Check Keycloak connectivity
curl https://your-domain.com/api/debug/test-keycloak-connectivity
```
## Success Criteria
The deployment is considered successful when:
1. **No 2-Minute Logouts**: Users remain authenticated when navigating after the 2-minute mark
2. **Improved Error Handling**: Network issues don't cause immediate logouts
3. **Better Performance**: Session checks complete faster due to caching (target: under 500ms on average, see Performance Benchmarks)
4. **Enhanced Logging**: Detailed logs help with debugging future issues
5. **Graceful Degradation**: System handles transient failures without ending the session
## Contact Information
For issues or questions regarding this deployment:
- **Technical Lead**: [Your Name]
- **Emergency Contact**: [Emergency Number]
- **Documentation**: This file and related docs in `/docs/` directory
## Appendix
### Environment Variables Required
```env
KEYCLOAK_CLIENT_SECRET=your-secret-key
COOKIE_DOMAIN=.portnimara.dev
```
### Log Examples
Successful session check:
```
[SESSION_MANAGER:abc123] Session check completed: {"authenticated":true,"reason":null,"fromCache":false}
```
Cache hit:
```
[SESSION_MANAGER:def456] Using cached session (age: 45 seconds)
```
Network error with grace period:
```
[SESSION_MANAGER:ghi789] Using cached result due to network error
```
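The last log line corresponds to the network error grace period. A minimal sketch of that decision follows. The result fields `authenticated`, `reason`, and `fromCache` mirror the log example above; everything else (the `CachedSession` shape, the function name, the 5-minute grace value, and the reason strings) is an assumption and may not match `server/utils/session-manager.ts`.

```typescript
// Result shape mirrors the fields visible in the log example.
interface SessionCheckResult {
  authenticated: boolean;
  reason: string | null;
  fromCache: boolean;
}

// Hypothetical cache entry: last known state and when it was checked.
interface CachedSession {
  authenticated: boolean;
  checkedAtMs: number;
}

const NETWORK_GRACE_PERIOD_MS = 5 * 60 * 1000; // assumed value

// Called when a session check failed because of a network error.
function resolveAfterNetworkError(
  cached: CachedSession | undefined,
  nowMs: number,
  graceMs: number = NETWORK_GRACE_PERIOD_MS,
): SessionCheckResult {
  if (cached !== undefined && nowMs - cached.checkedAtMs <= graceMs) {
    // Within the grace period: trust the last known state instead of logging out.
    return { authenticated: cached.authenticated, reason: "NETWORK_ERROR_GRACE", fromCache: true };
  }
  // No usable cache entry: report the failure to the caller.
  return { authenticated: false, reason: "NETWORK_ERROR", fromCache: false };
}
```

The point of the design is that a transient failure only ends the session once the last successful check is older than the grace window.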
### Performance Benchmarks
- **Session Check Duration**: < 500ms average
- **Cache Hit Rate**: > 70%
- **Authentication Success Rate**: > 99%
- **Network Error Recovery**: < 5 seconds