# Authentication Session Timeout Fix - Deployment Guide
## Overview
This document provides step-by-step instructions for deploying the authentication session timeout fixes that resolve the 2-minute logout issue.
## Problem Summary
Users were experiencing unexpected logouts after exactly 2 minutes when navigating between pages. This was caused by:
1. **Timing Race Condition**: The authentication middleware cache expiry and the auth refresh plugin's periodic validation both ran on a 2-minute interval, so they fired at the same moment
2. **No Request Deduplication**: Multiple concurrent session checks ran independently and conflicted with one another
3. **Insufficient Error Handling**: Network errors triggered an immediate logout
4. **No Grace Periods**: Transient issues caused permanent session loss
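
To see why two 2-minute timers collide, and why a longer interval alone would not be enough, the following idealised sketch lists the moments at which two fixed-period timers fire together. It is illustrative arithmetic only: `coincidingTicks` is a hypothetical helper, not part of the codebase, and real timers drift instead of firing on exact millisecond boundaries.

```typescript
// Hypothetical helper: returns every tick of timer A (up to horizonMs)
// that lands exactly on a tick of timer B.
function coincidingTicks(periodAMs: number, periodBMs: number, horizonMs: number): number[] {
  const hits: number[] = [];
  for (let t = periodAMs; t <= horizonMs; t += periodAMs) {
    if (t % periodBMs === 0) hits.push(t);
  }
  return hits;
}

const TWO_MIN_MS = 2 * 60 * 1000;
const THREE_MIN_MS = 3 * 60 * 1000;

// Old behaviour: both timers at 2 minutes, so every tick coincides.
const oldCollisions = coincidingTicks(TWO_MIN_MS, TWO_MIN_MS, 12 * 60 * 1000);

// A fixed 3-minute expiry still meets the 2-minute validation every 6 minutes.
const fixedCollisions = coincidingTicks(THREE_MIN_MS, TWO_MIN_MS, 12 * 60 * 1000);
```

Under these assumptions a fixed 3-minute expiry still lines up with a 2-minute validation every 6 minutes, which is the case that jitter and a random offset are meant to break up.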
## Solution Overview
### Core Changes
1. **Session Manager Utility** (`server/utils/session-manager.ts`)
- Centralized session management with request deduplication
- Promise caching for in-flight requests
- Network error grace periods
- Comprehensive logging and statistics
2. **Authentication Middleware** (`middleware/authentication.ts`)
- Changed cache expiry from 2 to 3 minutes with jitter
- Integrated SessionManager for deduplication
- Enhanced error handling and user feedback
3. **Auth Refresh Plugin** (`plugins/01.auth-refresh.client.ts`)
- Added random offset to prevent simultaneous validation
- Improved concurrent validation prevention
- Better error handling for network issues
4. **Session API** (`server/api/auth/session.ts`)
- Enhanced logging with request IDs
- Detailed error categorization
- Performance timing measurements
5. **Keycloak Client** (`server/utils/keycloak-client.ts`)
- Better error type distinction
- Increased retry attempts for token refresh
- Improved timeout handling
6. **Refresh API** (`server/api/auth/refresh.ts`)
- Enhanced error handling with request IDs
- Grace period support for transient failures
- Selective session clearing based on error type
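
As a rough illustration of request deduplication with promise caching, the sketch below shares one in-flight validation among concurrent callers. It is a minimal sketch under assumptions, not the contents of `session-manager.ts`: the class name `InFlightSessionChecks`, the `check` method, and the `SessionCheckResult` shape are hypothetical.

```typescript
// Hypothetical sketch; names are illustrative, not taken from session-manager.ts.
type SessionCheckResult = { authenticated: boolean; reason: string | null };

class InFlightSessionChecks {
  // At most one pending validation per session key.
  private readonly pending = new Map<string, Promise<SessionCheckResult>>();

  // Deliberately not `async`: concurrent callers get the same promise object back.
  check(key: string, validate: () => Promise<SessionCheckResult>): Promise<SessionCheckResult> {
    const existing = this.pending.get(key);
    if (existing) return existing;

    const started = validate();
    this.pending.set(key, started);
    // Forget the entry once it settles, whether it resolved or rejected.
    const forget = (): void => {
      this.pending.delete(key);
    };
    started.then(forget, forget);
    return started;
  }
}
```

Two page navigations that trigger a session check at the same moment would then result in a single upstream validation, and both callers see the same outcome.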
## Pre-deployment Checklist
- [ ] **Code Review**: All changes reviewed and approved
- [ ] **Environment Variables**: Verify all required environment variables are set
- [ ] **Dependencies**: Confirm no new dependencies are required
- [ ] **Backup**: Create backup of current production code
- [ ] **Monitoring**: Ensure authentication logs are being captured
- [ ] **Testing**: Verify fixes work in staging environment (if available)
## Deployment Steps
### Step 1: Deploy Session Manager Utility
1. Deploy `server/utils/session-manager.ts`
2. Verify no TypeScript compilation errors
3. Check server logs for any startup issues
### Step 2: Update Authentication Middleware
1. Deploy updated `middleware/authentication.ts`
2. Monitor for any middleware errors in logs
3. Verify new timing configuration is active
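
When verifying the timing configuration, it may help to have a picture of what a jittered cache expiry looks like. The sketch below is an assumption-laden illustration: the 3-minute base comes from this guide, but the jitter size (`MAX_JITTER_MS`, here 10 seconds) and the function names are hypothetical and may differ from `middleware/authentication.ts`.

```typescript
const BASE_TTL_MS = 3 * 60 * 1000; // 3-minute cache expiry, as described in this guide
const MAX_JITTER_MS = 10 * 1000; // assumed jitter range; the real value may differ

// Returns the base TTL shifted by a random offset in [-maxJitterMs, +maxJitterMs).
// `random` is injectable so the behaviour can be checked deterministically.
function jitteredTtlMs(
  baseMs: number,
  maxJitterMs: number,
  random: () => number = Math.random,
): number {
  const offset = (random() * 2 - 1) * maxJitterMs;
  return Math.round(baseMs + offset);
}

// A cached session is usable while its age is below the TTL chosen for it.
function isCacheFresh(cachedAtMs: number, ttlMs: number, nowMs: number): boolean {
  return nowMs - cachedAtMs < ttlMs;
}

const ttlForThisEntry = jitteredTtlMs(BASE_TTL_MS, MAX_JITTER_MS);
```

With these assumed values every cache entry expires somewhere between 2 minutes 50 seconds and 3 minutes 10 seconds, so expiry no longer lands on a fixed boundary.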
### Step 3: Update Auth Refresh Plugin
1. Deploy updated `plugins/01.auth-refresh.client.ts`
2. Check browser console for any client-side errors
3. Verify random offset is working (check logs)
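
For orientation while checking the logs, the random offset and the concurrent-validation guard could be as small as the sketch below. The 2-minute interval is the plugin's validation period described earlier; the 30-second maximum offset, `nextValidationDelayMs`, and `ValidationGate` are hypothetical and only illustrate the idea.

```typescript
const VALIDATION_INTERVAL_MS = 2 * 60 * 1000; // periodic validation interval
const MAX_OFFSET_MS = 30 * 1000; // assumed upper bound for the random offset

// Delay before the next validation: the fixed interval plus a random offset,
// so the plugin does not validate on the same boundary as the middleware cache.
function nextValidationDelayMs(
  intervalMs: number,
  maxOffsetMs: number,
  random: () => number = Math.random,
): number {
  return intervalMs + Math.floor(random() * maxOffsetMs);
}

// Minimal guard that lets only one validation run at a time.
class ValidationGate {
  private busy = false;

  // Returns false when a validation is already running; the caller should skip.
  tryEnter(): boolean {
    if (this.busy) return false;
    this.busy = true;
    return true;
  }

  // Must be called when the validation finishes, including on failure.
  leave(): void {
    this.busy = false;
  }
}
```

A caller would wrap its validation in `if (gate.tryEnter()) { try { ... } finally { gate.leave(); } }` and schedule the next run with `nextValidationDelayMs`.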
### Step 4: Update Session API
1. Deploy updated `server/api/auth/session.ts`
2. Monitor API endpoint logs for request IDs
3. Verify enhanced error messages are working
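
The request IDs and timing measurements to look for in the logs follow the `[SCOPE:requestId] message` shape shown in the appendix. The sketch below shows one way such lines could be produced; `newRequestId`, `formatLogLine`, and `startTimer` are hypothetical helpers, and the actual implementation in `server/api/auth/session.ts` may differ.

```typescript
const ID_ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";

// Short random ID used to correlate all log lines of one request.
function newRequestId(length: number = 6, random: () => number = Math.random): string {
  let id = "";
  for (let i = 0; i < length; i++) {
    id += ID_ALPHABET.charAt(Math.floor(random() * ID_ALPHABET.length));
  }
  return id;
}

// Produces lines in the "[SCOPE:requestId] message" shape used in the appendix.
function formatLogLine(scope: string, requestId: string, message: string): string {
  return `[${scope}:${requestId}] ${message}`;
}

// Returns a function that reports the milliseconds elapsed since startTimer was called.
function startTimer(now: () => number = Date.now): () => number {
  const startedAt = now();
  return () => now() - startedAt;
}
```

A handler would create one ID per incoming request, call `startTimer()` at entry, and include both the ID and the elapsed time in its final log line.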
### Step 5: Update Keycloak Client
1. Deploy updated `server/utils/keycloak-client.ts`
2. Check for any Keycloak communication errors
3. Verify retry logic is functioning
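
To make "retry logic" and "error type distinction" concrete, the sketch below retries only failures that look transient and gives up at once on rejected tokens. It is a sketch under stated assumptions, not the code in `keycloak-client.ts`: the names, the attempt count, and the status-code mapping are hypothetical. The mapping assumes that a missing status means a network failure and that a 4xx response other than 408 or 429 (for example the 400 `invalid_grant` response OAuth 2.0 token endpoints return for an expired refresh token) means the token was rejected.

```typescript
type FailureKind = "transient" | "auth_rejected";
type Attempt<T> = { ok: true; value: T } | { ok: false; status: number | null };

// Assumed mapping: no status (network error), 408, 429 and 5xx are worth retrying.
function classifyFailure(status: number | null): FailureKind {
  if (status === null) return "transient";
  if (status === 408 || status === 429 || status >= 500) return "transient";
  return "auth_rejected";
}

// Retries transient failures with exponential backoff (base, 2x base, 4x base, ...).
// `sleep` is passed in so callers can supply a timer-based or a test implementation.
async function retryTransient<T>(
  attempt: () => Promise<Attempt<T>>,
  maxAttempts: number,
  baseDelayMs: number,
  sleep: (ms: number) => Promise<void>,
): Promise<Attempt<T>> {
  let last = await attempt();
  for (let n = 1; n < maxAttempts; n++) {
    if (last.ok || classifyFailure(last.status) !== "transient") break;
    await sleep(baseDelayMs * Math.pow(2, n - 1));
    last = await attempt();
  }
  return last;
}
```

In production code `sleep` would typically be `(ms) => new Promise((resolve) => setTimeout(resolve, ms))`.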
### Step 6: Update Refresh API
1. Deploy updated `server/api/auth/refresh.ts`
2. Monitor token refresh operations
3. Verify graceful error handling
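
"Graceful error handling" here means that a failed refresh only ends the session when the failure is definitive. The sketch below shows one possible decision rule combining selective clearing with a grace period. The outcome names, `decideSessionAction`, and the 60-second grace period are hypothetical; the real values in `server/api/auth/refresh.ts` may differ.

```typescript
type RefreshOutcome = "ok" | "transient_failure" | "token_rejected";
type SessionAction = "keep" | "keep_with_grace" | "clear";

const GRACE_PERIOD_MS = 60 * 1000; // assumed grace period for transient failures

// Decides what to do with the session after a refresh attempt.
// lastValidAtMs is the time of the last successful validation.
function decideSessionAction(
  outcome: RefreshOutcome,
  lastValidAtMs: number,
  nowMs: number,
  graceMs: number = GRACE_PERIOD_MS,
): SessionAction {
  if (outcome === "ok") return "keep";
  // A rejected token cannot recover on its own, so the session is cleared.
  if (outcome === "token_rejected") return "clear";
  // Transient failure: tolerate it while the last good validation is recent enough.
  return nowMs - lastValidAtMs <= graceMs ? "keep_with_grace" : "clear";
}
```

Under this rule a brief network outage keeps the user logged in, while an expired or revoked token still clears the session immediately.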
## Post-deployment Verification
### Immediate Verification (0-5 minutes)
1. **No Deployment Errors**
```bash
# Check server logs
tail -f /var/log/application.log | grep -E "(ERROR|FATAL)"
# Check for any 500 errors
curl -I https://your-domain.com/api/health
```
2. **Login Flow Test**
- Navigate to login page
- Complete authentication
- Verify successful redirect to dashboard
3. **Session API Test**
```bash
# Test session endpoint
curl https://your-domain.com/api/auth/session \
  -H "Cookie: nuxt-oidc-auth=<session-cookie>"
```
### Short-term Verification (5-15 minutes)
1. **Navigation Test**
- Stay logged in for 5+ minutes
- Navigate between different pages
- Verify no unexpected logouts
2. **Log Analysis**
```bash
# Check for new session manager logs
grep "SESSION_MANAGER" /var/log/application.log
# Confirm cache hits; logged ages between 2 and 3 minutes indicate the new expiry is active
grep "Using cached session" /var/log/application.log
```
### Long-term Verification (15+ minutes)
1. **2-Minute Boundary Test**
- Stay logged in for exactly 2 minutes
- Navigate to a new page
- Verify user remains authenticated
2. **3-Minute Cache Test**
- Stay on same page for 3+ minutes
- Navigate to new page
- Verify session is refreshed, not lost
3. **Network Error Simulation**
- Temporarily block network access
- Verify graceful degradation
- Restore network and verify recovery
## Monitoring and Alerts
### Key Metrics to Monitor
1. **Authentication Errors**
```bash
# Count auth failures (compare against the total number of auth checks to get the rate)
grep -c "AUTH_ERROR" /var/log/application.log
```
2. **Session Manager Performance**
```bash
# Check session check durations
grep "Session check completed" /var/log/application.log
```
3. **Cache Hit Rate**
```bash
# Count cache hits (compare against the total number of session checks to get the hit rate)
grep -c "Using cached session" /var/log/application.log
```
### Alert Thresholds
- **Auth Error Rate**: > 5% of total auth checks
- **Session Check Duration**: > 2 seconds average
- **Cache Miss Rate**: > 80% (indicates caching issues)
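
The thresholds above are rates and averages, whereas the `grep` commands produce raw counts. The sketch below shows how counts could be turned into alert decisions. `AuthCounters` and `evaluateAlerts` are hypothetical, and the miss-rate formula assumes every session check is either a cache hit or a cache miss.

```typescript
interface AuthCounters {
  authChecks: number; // total session/auth checks in the window
  authErrors: number; // e.g. count of "AUTH_ERROR" lines
  cacheHits: number; // e.g. count of "Using cached session" lines
  totalCheckDurationMs: number; // sum of session check durations
}

type AlertName = "auth_error_rate" | "session_check_duration" | "cache_miss_rate";

// Applies the thresholds listed above: >5% errors, >2 s average, >80% cache misses.
function evaluateAlerts(c: AuthCounters): AlertName[] {
  const alerts: AlertName[] = [];
  if (c.authChecks === 0) return alerts; // nothing to evaluate yet

  const errorRate = c.authErrors / c.authChecks;
  const averageMs = c.totalCheckDurationMs / c.authChecks;
  const missRate = 1 - c.cacheHits / c.authChecks;

  if (errorRate > 0.05) alerts.push("auth_error_rate");
  if (averageMs > 2000) alerts.push("session_check_duration");
  if (missRate > 0.8) alerts.push("cache_miss_rate");
  return alerts;
}
```

For example, 1000 checks with 60 errors and 100 cache hits would raise both the error-rate and the cache-miss alerts.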
## Rollback Procedures
### Immediate Rollback (if critical issues)
1. **Stop Application**
```bash
systemctl stop your-application
```
2. **Restore Previous Code**
```bash
git checkout previous-stable-tag
npm install
npm run build
```
3. **Restart Application**
```bash
systemctl start your-application
```
4. **Verify Rollback**
- Test login functionality
- Check error logs
- Verify user sessions work
### Partial Rollback (if specific component issues)
1. **Identify Problem Component**
- Check which specific file is causing issues
- Review recent error logs
2. **Rollback Specific Files**
```bash
# Restore one file from the previous commit; adjust the ref if the fix spans several commits
git checkout HEAD~1 -- middleware/authentication.ts
# or
git checkout HEAD~1 -- server/utils/session-manager.ts
```
3. **Rebuild and Test**
```bash
npm run build
systemctl restart your-application
```
## Troubleshooting
### Common Issues
1. **Users Still Getting Logged Out at 2 Minutes**
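- Confirm the middleware is calling SessionManager (look for `SESSION_MANAGER` entries in the logs)
- Verify the 3-minute cache expiry is active
- Check whether the middleware cache expiry and the plugin's periodic validation still fire at the same time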
2. **Session Check Errors**
- Check network connectivity to Keycloak
- Verify environment variables are set
- Check Keycloak circuit breaker status
3. **Performance Issues**
- Monitor session check durations
- Check cache hit rates
- Verify request deduplication is working
### Debug Commands
```bash
# Check session manager cache stats
curl https://your-domain.com/api/debug/session-cache-stats
# Monitor real-time auth logs
tail -f /var/log/application.log | grep -E "(SESSION|AUTH_REFRESH|MIDDLEWARE)"
# Check Keycloak connectivity
curl https://your-domain.com/api/debug/test-keycloak-connectivity
```
## Success Criteria
The deployment is considered successful when:
1. **No 2-Minute Logouts**: Users stay logged in when navigating after the 2-minute mark
2. **Improved Error Handling**: Network issues don't cause immediate logouts
3. **Better Performance**: Session checks complete faster due to caching
4. **Enhanced Logging**: Detailed logs help with debugging future issues
5. **Graceful Degradation**: The system handles transient failures without ending the session
## Contact Information
For issues or questions regarding this deployment:
- **Technical Lead**: [Your Name]
- **Emergency Contact**: [Emergency Number]
- **Documentation**: This file and related docs in the `/docs/` directory
## Appendix
### Environment Variables Required
```env
KEYCLOAK_CLIENT_SECRET=your-secret-key
COOKIE_DOMAIN=.portnimara.dev
```
### Log Examples
Successful session check:
```
[SESSION_MANAGER:abc123] Session check completed: {"authenticated":true,"reason":null,"fromCache":false}
```
Cache hit:
```
[SESSION_MANAGER:def456] Using cached session (age: 45 seconds)
```
Network error with grace period:
```
[SESSION_MANAGER:ghi789] Using cached result due to network error
```
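
When analysing logs in bulk, lines in the format above can be split into scope, request ID, and message. The sketch below is based only on the three example lines in this appendix; `parseLogLine` and `cachedSessionAgeSeconds` are hypothetical helpers, and real log lines may carry extra prefixes such as timestamps.

```typescript
interface ParsedLogLine {
  scope: string;
  requestId: string;
  message: string;
}

// Splits "[SCOPE:requestId] message" into its parts; returns null for other lines.
function parseLogLine(line: string): ParsedLogLine | null {
  const match = /^\[([A-Z_]+):([^\]]+)\] (.*)$/.exec(line);
  if (!match) return null;
  const [, scope = "", requestId = "", message = ""] = match;
  return { scope, requestId, message };
}

// Extracts the age from "Using cached session (age: N seconds)"; null for other messages.
function cachedSessionAgeSeconds(message: string): number | null {
  const match = /^Using cached session \(age: (\d+) seconds\)$/.exec(message);
  if (!match) return null;
  const [, seconds = ""] = match;
  return Number(seconds);
}
```

Grouping parsed lines by `requestId` then gives the full trace of a single session check.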
### Performance Benchmarks
- **Session Check Duration**: < 500ms average
- **Cache Hit Rate**: > 70%
- **Authentication Success Rate**: > 99%
- **Network Error Recovery**: < 5 seconds