Authentication Session Timeout Fix - Deployment Guide

Overview

This document provides step-by-step instructions for deploying the authentication session timeout fixes that resolve the 2-minute logout issue.

Problem Summary

Users were experiencing unexpected logouts after exactly 2 minutes when navigating between pages. This was caused by:

  1. Timing Race Condition: The authentication middleware cache expiry and the auth refresh plugin's periodic validation both ran on a 2-minute schedule, so they fired at the same time
  2. No Request Deduplication: Multiple concurrent session checks causing conflicts
  3. Insufficient Error Handling: Network errors triggering immediate logouts
  4. No Grace Periods: Transient issues causing permanent session loss
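
To make the first cause concrete: two timers that share a 2-minute period tend to line up on every cycle. The sketch below shows one way a jittered expiry could pull the two schedules apart. The 3-minute base matches the fix described in this guide; the 30-second jitter width and the function names are illustrative assumptions, not the project's actual values.

```typescript
// Hypothetical helpers: names and the 30 s jitter width are assumptions.
const BASE_TTL_MS = 3 * 60 * 1000; // 3-minute base expiry (from this guide)
const MAX_JITTER_MS = 30 * 1000;   // assumed jitter width

// Returns a TTL in [baseMs, baseMs + maxJitterMs). `rand` is injectable for testing.
function jitteredTtlMs(
  baseMs: number = BASE_TTL_MS,
  maxJitterMs: number = MAX_JITTER_MS,
  rand: () => number = Math.random,
): number {
  return baseMs + Math.floor(rand() * maxJitterMs);
}

// A cache entry is stale once its age reaches the TTL chosen when it was stored.
function isExpired(storedAtMs: number, ttlMs: number, nowMs: number): boolean {
  return nowMs - storedAtMs >= ttlMs;
}
```

Because each entry draws its own TTL, the expiry no longer coincides with a validation that runs on a fixed interval.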

Solution Overview

Core Changes

  1. Session Manager Utility (server/utils/session-manager.ts)

    • Centralized session management with request deduplication
    • Promise caching for in-flight requests
    • Network error grace periods
    • Comprehensive logging and statistics
  2. Authentication Middleware (middleware/authentication.ts)

    • Changed cache expiry from 2 to 3 minutes with jitter
    • Integrated SessionManager for deduplication
    • Enhanced error handling and user feedback
  3. Auth Refresh Plugin (plugins/01.auth-refresh.client.ts)

    • Added random offset to prevent simultaneous validation
    • Improved concurrent validation prevention
    • Better error handling for network issues
  4. Session API (server/api/auth/session.ts)

    • Enhanced logging with request IDs
    • Detailed error categorization
    • Performance timing measurements
  5. Keycloak Client (server/utils/keycloak-client.ts)

    • Better error type distinction
    • Increased retry attempts for token refresh
    • Improved timeout handling
  6. Refresh API (server/api/auth/refresh.ts)

    • Enhanced error handling with request IDs
    • Grace period support for transient failures
    • Selective session clearing based on error type
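
The request deduplication and promise caching listed under the Session Manager Utility can be sketched roughly as follows. This is a minimal illustration under assumed names (`InFlightCache`, `SessionResult`); the actual contents of server/utils/session-manager.ts are not shown in this guide and may differ.

```typescript
// Hypothetical sketch; the real session manager may be structured differently.
interface SessionResult {
  authenticated: boolean;
  reason: string | null;
}

class InFlightCache<T> {
  private readonly pending = new Map<string, Promise<T>>();

  // Runs `task` once per key; callers that arrive while it is pending share its promise.
  run(key: string, task: () => Promise<T>): Promise<T> {
    const existing = this.pending.get(key);
    if (existing !== undefined) return existing;

    const promise = task();
    // Clear the slot on success or failure so later checks start a fresh request.
    const cleanup = (): void => {
      this.pending.delete(key);
    };
    promise.then(cleanup, cleanup);

    this.pending.set(key, promise);
    return promise;
  }
}
```

With this shape, concurrent session checks for the same session key result in a single upstream call, which is the conflict described in the Problem Summary.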

Pre-deployment Checklist

  • Code Review: All changes reviewed and approved
  • Environment Variables: Verify all required environment variables are set
  • Dependencies: Confirm no new dependencies are required
  • Backup: Create backup of current production code
  • Monitoring: Ensure authentication logs are being captured
  • Testing: Verify fixes work in staging environment (if available)

Deployment Steps

Step 1: Deploy Session Manager Utility

  1. Deploy server/utils/session-manager.ts
  2. Verify no TypeScript compilation errors
  3. Check server logs for any startup issues

Step 2: Update Authentication Middleware

  1. Deploy updated middleware/authentication.ts
  2. Monitor for any middleware errors in logs
  3. Verify new timing configuration is active

Step 3: Update Auth Refresh Plugin

  1. Deploy updated plugins/01.auth-refresh.client.ts
  2. Check browser console for any client-side errors
  3. Verify random offset is working (check browser console logs)

Step 4: Update Session API

  1. Deploy updated server/api/auth/session.ts
  2. Monitor API endpoint logs for request IDs
  3. Verify enhanced error messages are working
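
When checking the logs in this step, it helps to know what a request-ID-tagged entry might look like. The sketch below is an assumption modeled on the `[SESSION_MANAGER:abc123]` format in the Log Examples appendix; the `SESSION_API` tag and all function names are hypothetical.

```typescript
// Hypothetical logger; the prefix format mirrors the "[SESSION_MANAGER:abc123]" log examples.
interface RequestLogger {
  requestId: string;
  format(message: string): string;
  elapsedMs(): number;
}

function createRequestLogger(
  scope: string,
  now: () => number = Date.now,
  // Short random id for correlating log lines; not guaranteed to be unique.
  makeId: () => string = () => Math.random().toString(36).slice(2, 8),
): RequestLogger {
  const requestId = makeId();
  const startedAt = now();
  return {
    requestId,
    format: (message: string): string => `[${scope}:${requestId}] ${message}`,
    // Time since the logger was created, for performance timing measurements.
    elapsedMs: (): number => now() - startedAt,
  };
}
```

A handler could create one logger per request and emit `log.format("Session check completed")` together with `log.elapsedMs()` when it finishes.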

Step 5: Update Keycloak Client

  1. Deploy updated server/utils/keycloak-client.ts
  2. Check for any Keycloak communication errors
  3. Verify retry logic is functioning
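
For reference while verifying the retry behavior, the sketch below shows one common way to combine error type distinction with retries. The error categories, the attempt count of 3, and the backoff values are assumptions; this guide does not state what server/utils/keycloak-client.ts actually uses.

```typescript
// Hypothetical classification; the real client may distinguish errors differently.
type ErrorKind = "transient" | "permanent";

const TRANSIENT_CODES = ["ECONNRESET", "ETIMEDOUT", "ECONNREFUSED"];

// Network failures and 5xx responses are retryable; anything else is treated as final.
function classifyError(err: unknown): ErrorKind {
  if (typeof err !== "object" || err === null) return "permanent";
  const { status, code } = err as { status?: number; code?: string };
  if (code !== undefined && TRANSIENT_CODES.indexOf(code) !== -1) return "transient";
  if (typeof status === "number" && status >= 500) return "transient";
  return "permanent";
}

// Exponential backoff: 200 ms, 400 ms, 800 ms, ... capped at capMs (assumed values).
function backoffDelayMs(attempt: number, baseMs = 200, capMs = 2000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts = 3, // assumed; the guide only says retries were increased
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      // Stop immediately on permanent errors or when attempts are exhausted.
      if (classifyError(err) === "permanent" || attempt === maxAttempts - 1) break;
      await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

Retrying only transient failures keeps a rejected refresh token (typically a 4xx response) from being retried pointlessly, while a brief network drop gets another chance.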

Step 6: Update Refresh API

  1. Deploy updated server/api/auth/refresh.ts
  2. Monitor token refresh operations
  3. Verify graceful error handling

Post-deployment Verification

Immediate Verification (0-5 minutes)

  1. No Deployment Errors

    # Check server logs
    tail -f /var/log/application.log | grep -E "(ERROR|FATAL)"
    
    # Confirm the health endpoint responds without a 5xx status
    curl -I https://your-domain.com/api/health
    
  2. Login Flow Test

    • Navigate to login page
    • Complete authentication
    • Verify successful redirect to dashboard
  3. Session API Test

    # Test session endpoint
    curl -X GET https://your-domain.com/api/auth/session \
      -H "Cookie: nuxt-oidc-auth=<session-cookie>"
    

Short-term Verification (5-15 minutes)

  1. Navigation Test

    • Stay logged in for 5+ minutes
    • Navigate between different pages
    • Verify no unexpected logouts
  2. Log Analysis

    # Check for new session manager logs
    grep "SESSION_MANAGER" /var/log/application.log
    
    # Verify timing desynchronization
    grep "Using cached session" /var/log/application.log
    

Long-term Verification (15+ minutes)

  1. 2-Minute Boundary Test

    • Stay logged in for exactly 2 minutes
    • Navigate to a new page
    • Verify user remains authenticated
  2. 3-Minute Cache Test

    • Stay on same page for 3+ minutes
    • Navigate to new page
    • Verify session is refreshed, not lost
  3. Network Error Simulation

    • Temporarily block network access
    • Verify graceful degradation
    • Restore network and verify recovery
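
The graceful degradation expected in the network error simulation can be pictured as a small decision function. This is a sketch under assumptions: the 5-minute grace window, the type names, and the function name are hypothetical, and the guide does not specify the real grace period.

```typescript
// Hypothetical types and values; the guide does not state the real grace period.
interface CachedSession {
  authenticated: boolean;
  cachedAtMs: number;
}

type Decision =
  | { action: "use-cache"; authenticated: boolean }
  | { action: "logout" };

const GRACE_PERIOD_MS = 5 * 60 * 1000; // assumed grace window

function onSessionCheckFailure(
  isNetworkError: boolean,
  cached: CachedSession | undefined,
  nowMs: number,
  graceMs: number = GRACE_PERIOD_MS,
): Decision {
  // Only network failures qualify, and only while a recent authenticated result exists.
  if (
    isNetworkError &&
    cached !== undefined &&
    cached.authenticated &&
    nowMs - cached.cachedAtMs <= graceMs
  ) {
    return { action: "use-cache", authenticated: true };
  }
  // Any other failure, or a stale or missing cache entry, ends the session.
  return { action: "logout" };
}
```

During the simulation, the first branch corresponds to the "Using cached result due to network error" log line shown in the appendix.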

Monitoring and Alerts

Key Metrics to Monitor

  1. Authentication Errors

    # Count auth failures (divide by total auth checks to get the failure rate)
    grep -c "AUTH_ERROR" /var/log/application.log
    
  2. Session Manager Performance

    # Check session check durations
    grep "Session check completed" /var/log/application.log
    
  3. Cache Hit Rate

    # Count cache hits
    grep -c "Using cached session" /var/log/application.log
    

Alert Thresholds

  • Auth Error Rate: > 5% of total auth checks
  • Session Check Duration: > 2 seconds average
  • Cache Miss Rate: > 80% (indicates caching issues)
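
The thresholds above can be checked mechanically once the counts are available. The sketch below assumes the counters have already been collected (for example from the grep commands in this section); the `AuthMetrics` shape and the alert names are hypothetical.

```typescript
// Hypothetical metrics shape; the counters would come from log analysis.
interface AuthMetrics {
  totalChecks: number;
  authErrors: number;
  cacheHits: number;
  totalDurationMs: number;
}

// Returns the names of the thresholds from this guide that are currently exceeded.
function evaluateAlerts(m: AuthMetrics): string[] {
  const alerts: string[] = [];
  if (m.totalChecks === 0) return alerts; // no traffic, nothing to evaluate

  const errorRate = m.authErrors / m.totalChecks;
  const avgDurationMs = m.totalDurationMs / m.totalChecks;
  const missRate = 1 - m.cacheHits / m.totalChecks;

  if (errorRate > 0.05) alerts.push("auth-error-rate");            // > 5% of auth checks
  if (avgDurationMs > 2000) alerts.push("session-check-duration"); // > 2 seconds average
  if (missRate > 0.8) alerts.push("cache-miss-rate");              // > 80% misses
  return alerts;
}
```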

Rollback Procedures

Immediate Rollback (if critical issues)

  1. Stop Application

    systemctl stop your-application
    
  2. Restore Previous Code

    git checkout previous-stable-tag
    npm install
    npm run build
    
  3. Restart Application

    systemctl start your-application
    
  4. Verify Rollback

    • Test login functionality
    • Check error logs
    • Verify user sessions work

Partial Rollback (if specific component issues)

  1. Identify Problem Component

    • Check which specific file is causing issues
    • Review recent error logs
  2. Rollback Specific Files

    # HEAD~1 assumes the fix is the most recent commit; otherwise use the last known-good commit
    git checkout HEAD~1 -- middleware/authentication.ts
    # or
    git checkout HEAD~1 -- server/utils/session-manager.ts
    
  3. Rebuild and Test

    npm run build
    systemctl restart your-application
    

Troubleshooting

Common Issues

  1. Users Still Getting Logged Out at 2 Minutes

    • Check if SessionManager is being used
    • Verify cache expiry changes are active
    • Look for timing synchronization issues
  2. Session Check Errors

    • Check network connectivity to Keycloak
    • Verify environment variables are set
    • Check Keycloak circuit breaker status
  3. Performance Issues

    • Monitor session check durations
    • Check cache hit rates
    • Verify request deduplication is working

Debug Commands

# Check session manager cache stats
curl https://your-domain.com/api/debug/session-cache-stats

# Monitor real-time auth logs
tail -f /var/log/application.log | grep -E "(SESSION|AUTH_REFRESH|MIDDLEWARE)"

# Check Keycloak connectivity
curl https://your-domain.com/api/debug/test-keycloak-connectivity

Success Criteria

The deployment is considered successful when:

  1. No 2-Minute Logouts: Users stay authenticated when navigating after the 2-minute mark
  2. Improved Error Handling: Network issues don't cause immediate logouts
  3. Better Performance: Cached session checks keep the average check duration under the 500 ms benchmark
  4. Enhanced Logging: Request IDs and error categories appear in the logs for future debugging
  5. Graceful Degradation: Transient failures are absorbed by retries and grace periods instead of ending the session

Contact Information

For issues or questions regarding this deployment:

  • Technical Lead: [Your Name]
  • Emergency Contact: [Emergency Number]
  • Documentation: This file and related docs in /docs/ directory

Appendix

Environment Variables Required

KEYCLOAK_CLIENT_SECRET=your-secret-key
COOKIE_DOMAIN=.portnimara.dev
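
A startup check along these lines could support the "Environment Variables" item in the pre-deployment checklist. It is a sketch only: the function name is hypothetical, and the list contains just the two variables named above.

```typescript
// The two variables listed in this appendix; extend as needed.
const REQUIRED_ENV: readonly string[] = ["KEYCLOAK_CLIENT_SECRET", "COOKIE_DOMAIN"];

// Returns the names of required variables that are unset or blank.
function missingEnv(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_ENV,
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node.js process this would typically be called as `missingEnv(process.env)`, failing the deployment if the returned list is non-empty.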

Log Examples

Successful session check:

[SESSION_MANAGER:abc123] Session check completed: {"authenticated":true,"reason":null,"fromCache":false}

Cache hit:

[SESSION_MANAGER:def456] Using cached session (age: 45 seconds)

Network error with grace period:

[SESSION_MANAGER:ghi789] Using cached result due to network error
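
For automated log analysis, lines in the format shown above can be split into tag, request ID, message, and an optional JSON payload. The parser below is a sketch that assumes every entry follows the `[TAG:id] message` shape of these three examples; other log formats in the application are not covered.

```typescript
interface ParsedLogLine {
  tag: string;
  requestId: string;
  message: string;
  payload?: unknown;
}

// Matches "[TAG:requestId] rest of line", as in the examples above.
const LOG_LINE = /^\[([A-Z_]+):([^\]]+)\]\s+(.*)$/;

function parseLogLine(line: string): ParsedLogLine | null {
  const match = LOG_LINE.exec(line);
  if (match === null) return null;
  const tag = match[1];
  const requestId = match[2];
  const rest = match[3];

  // Entries such as "Session check completed: {...}" carry a trailing JSON object.
  const jsonStart = rest.indexOf("{");
  if (jsonStart === -1) return { tag, requestId, message: rest };
  try {
    const payload: unknown = JSON.parse(rest.slice(jsonStart));
    const message = rest.slice(0, jsonStart).replace(/:\s*$/, "");
    return { tag, requestId, message, payload };
  } catch {
    // Not valid JSON after all; keep the whole remainder as the message.
    return { tag, requestId, message: rest };
  }
}
```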

Performance Benchmarks

  • Session Check Duration: < 500 ms average
  • Cache Hit Rate: > 70%
  • Authentication Success Rate: > 99%
  • Network Error Recovery: < 5 seconds