# Round Type: FILTERING — AI Screening & Eligibility

## Overview

The **FILTERING** round type (Round 2 in the typical flow) performs automated screening of applications to identify eligible projects, detect duplicates, and flag edge cases for admin review. It replaces the current `FILTER` stage with enhanced features: rule-based filtering, AI-powered screening, duplicate detection, and a manual override system.

### Purpose

1. **Automated Eligibility Checks** — Field-based rules (age, category, country, etc.) and document checks (required files)
2. **AI Screening** — GPT-powered rubric evaluation with confidence banding
3. **Duplicate Detection** — Cross-application similarity checking to catch multiple submissions from the same applicant
4. **Manual Review Queue** — Flagged projects go to the admin dashboard for a final decision
5. **Admin Override** — Any automated decision can be manually reversed, with an audit trail

### Key Features

| Feature | Description |
|---------|-------------|
| **Multi-Rule Engine** | Field-based, document-check, and AI rules run in sequence |
| **Confidence Banding** | AI results split into auto-pass (high), manual-review (medium), auto-reject (low) |
| **Duplicate Detection** | Built-in email-based duplicate check (always flags for review) |
| **Manual Override** | Admin can approve/reject any flagged project with a reason |
| **Batch Processing** | AI screening runs in configurable batches for performance |
| **Progress Tracking** | FilteringJob model tracks long-running jobs |
| **Audit Trail** | All decisions logged in DecisionAuditLog |

---

## Current System

### Stage Model

```prisma
model Stage {
  id            String    @id
  trackId       String
  stageType     StageType // FILTER
  name          String
  slug          String
  status        StageStatus
  sortOrder     Int
  configJson    Json?     // Generic blob — hard to know what's configurable
  windowOpenAt  DateTime?
  windowCloseAt DateTime?

  filteringRules   FilteringRule[]
  filteringResults FilteringResult[]
  filteringJobs    FilteringJob[]
}

enum StageType {
  INTAKE
  FILTER // <-- Current filtering stage
  EVALUATION
  SELECTION
  LIVE_FINAL
  RESULTS
}
```

### FilteringRule Model

```prisma
model FilteringRule {
  id         String  @id
  stageId    String
  name       String
  ruleType   FilteringRuleType
  configJson Json    @db.JsonB // Type-specific config
  priority   Int     @default(0)
  isActive   Boolean @default(true)

  stage Stage @relation(fields: [stageId], references: [id], onDelete: Cascade)
}

enum FilteringRuleType {
  FIELD_BASED    // Field checks (category, country, age, etc.)
  DOCUMENT_CHECK // File existence/type checks
  AI_SCREENING   // GPT rubric evaluation
}
```

**Rule configJson shapes:**

```typescript
// FIELD_BASED
{
  conditions: [
    { field: "competitionCategory", operator: "equals", value: "STARTUP" },
    { field: "foundedAt", operator: "older_than_years", value: 5 }
  ],
  logic: "AND" | "OR",
  action: "PASS" | "REJECT" | "FLAG"
}

// DOCUMENT_CHECK
{
  requiredFileTypes: ["pdf", "docx"],
  minFileCount: 2,
  action: "FLAG"
}

// AI_SCREENING
{
  criteriaText: "Project must demonstrate clear ocean conservation impact",
  action: "FLAG",
  batchSize: 20,
  parallelBatches: 1
}
```

### FilteringResult Model

```prisma
model FilteringResult {
  id              String @id
  stageId         String
  projectId       String
  outcome         FilteringOutcome // PASSED | FILTERED_OUT | FLAGGED
  ruleResultsJson Json?  @db.JsonB // Per-rule results
  aiScreeningJson Json?  @db.JsonB // AI screening details

  // Admin override
  overriddenBy   String?
  overriddenAt   DateTime?
  overrideReason String?  @db.Text
  finalOutcome   FilteringOutcome?

  stage            Stage   @relation(fields: [stageId], references: [id])
  project          Project @relation(fields: [projectId], references: [id])
  overriddenByUser User?   @relation("FilteringOverriddenBy", fields: [overriddenBy], references: [id])

  @@unique([stageId, projectId])
}

enum FilteringOutcome {
  PASSED       // Auto-advance to next round
  FILTERED_OUT // Auto-reject
  FLAGGED      // Manual review required
}
```
### FilteringJob Model

```prisma
model FilteringJob {
  id             String @id
  stageId        String
  status         FilteringJobStatus @default(PENDING)
  totalProjects  Int @default(0)
  totalBatches   Int @default(0)
  currentBatch   Int @default(0)
  processedCount Int @default(0)
  passedCount    Int @default(0)
  filteredCount  Int @default(0)
  flaggedCount   Int @default(0)
  errorMessage   String? @db.Text
  startedAt      DateTime?
  completedAt    DateTime?

  stage Stage @relation(fields: [stageId], references: [id])
}

enum FilteringJobStatus {
  PENDING
  RUNNING
  COMPLETED
  FAILED
}
```

### AI Screening Flow

```typescript
// src/server/services/ai-filtering.ts
export async function executeAIScreening(
  config: AIScreeningConfig,
  projects: ProjectForFiltering[],
  userId?: string,
  entityId?: string,
  onProgress?: ProgressCallback
): Promise<…>
```

**AI Screening Steps:**

1. **Anonymization** — Strip PII before sending to OpenAI (see `anonymization.ts`)
2. **Batch Processing** — Group projects into configurable batch sizes (default 20)
3. **GPT Evaluation** — Send to OpenAI with rubric criteria
4. **Result Parsing** — Parse the JSON response with confidence scores
5. **Confidence Banding** — Split into auto-pass/manual-review/auto-reject buckets
6. **Logging** — Track token usage in AIUsageLog

**Confidence Thresholds:**

```typescript
const AI_CONFIDENCE_THRESHOLD_PASS = 0.75   // Auto-pass if ≥ 0.75 and meetsAllCriteria
const AI_CONFIDENCE_THRESHOLD_REJECT = 0.25 // Auto-reject if ≤ 0.25 and !meetsAllCriteria
// Between 0.25 and 0.75 → FLAGGED for manual review
```

### Duplicate Detection

```typescript
// Current implementation in stage-filtering.ts (lines 264-289)
// Groups projects by submittedByEmail to detect duplicates.
// Duplicates are ALWAYS flagged (never auto-rejected).
const duplicateProjectIds = new Set<string>()
const emailToProjects = new Map<string, Array<{ id: string; title: string }>>()

for (const project of projects) {
  const email = (project.submittedByEmail ?? '').toLowerCase().trim()
  if (!email) continue
  if (!emailToProjects.has(email)) emailToProjects.set(email, [])
  emailToProjects.get(email)!.push({ id: project.id, title: project.title })
}

// If any email has > 1 project, all siblings are flagged
emailToProjects.forEach((group, _email) => {
  if (group.length <= 1) return
  for (const p of group) {
    duplicateProjectIds.add(p.id)
  }
})
```

**Duplicate Metadata Stored:**

```json
{
  "isDuplicate": true,
  "siblingProjectIds": ["proj-2", "proj-3"],
  "duplicateNote": "This project shares a submitter email with 2 other project(s)."
}
```

### Filtering Execution Flow

```typescript
// src/server/services/stage-filtering.ts
export async function runStageFiltering(
  stageId: string,
  actorId: string,
  prisma: PrismaClient
): Promise<…>
```

**Execution Steps:**

1. Load all projects in PENDING/IN_PROGRESS state for this stage
2. Create a FilteringJob for progress tracking
3. Load active FilteringRule records (ordered by priority)
4. **Run duplicate detection** (built-in, always runs first)
5. **Run deterministic rules** (FIELD_BASED, DOCUMENT_CHECK)
   - If any REJECT rule fails → outcome = FILTERED_OUT
   - If any FLAG rule fails → outcome = FLAGGED
6. **Run AI screening** (if enabled and deterministic rules passed, or if duplicate)
   - Batch process with configurable size
   - Band by confidence
   - Attach duplicate metadata
7. **Save a FilteringResult** for each project
8. Update FilteringJob counts (passed/rejected/flagged)
9. Log the decision audit
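The "batch process with configurable size" step can be sketched as a simple chunking helper. This is illustrative only; the real service may batch differently:

```typescript
// Split an array into consecutive batches of at most `size` items.
// Hypothetical helper, not the codebase's actual batching code.
export function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error('batch size must be >= 1')
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}
```

With the default batch size of 20, a run over 185 projects would produce 10 batches (nine full, one of five).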
---

## Redesigned Filtering Round

### Round Model Changes

```prisma
model Round {
  id            String @id @default(cuid())
  competitionId String
  name          String    // "AI Screening & Eligibility Check"
  slug          String    // "filtering"
  roundType     RoundType // FILTERING (renamed from FILTER)
  status        RoundStatus @default(ROUND_DRAFT)
  sortOrder     Int @default(0)

  // Time windows
  windowOpenAt  DateTime?
  windowCloseAt DateTime?

  // Round-type-specific configuration (validated by Zod)
  configJson Json? @db.JsonB

  // Relations
  competition        Competition         @relation(fields: [competitionId], references: [id])
  projectRoundStates ProjectRoundState[]
  filteringRules     FilteringRule[]
  filteringResults   FilteringResult[]
  filteringJobs      FilteringJob[]
  advancementRules   AdvancementRule[]
}

enum RoundType {
  INTAKE
  FILTERING    // Renamed from FILTER for clarity
  EVALUATION
  SUBMISSION   // New: multi-round submissions
  MENTORING    // New: mentor workspace
  LIVE_FINAL
  CONFIRMATION // New: winner agreement
}
```

### FilteringConfig Type (Zod-Validated)

```typescript
// src/types/round-configs.ts
export type FilteringConfig = {
  // Rule engine
  rules: FilterRuleDef[] // Configured rules (can be empty to skip deterministic filtering)

  // AI screening
  aiScreeningEnabled: boolean
  aiRubricPrompt: string // Custom rubric for AI (plain-language criteria)
  aiConfidenceThresholds: {
    high: number   // Above this = auto-pass (default: 0.85)
    medium: number // Above this = flag for review (default: 0.6)
    low: number    // Below this = auto-reject (default: 0.4)
  }
  aiBatchSize: number       // Projects per AI batch (default: 20, max: 50)
  aiParallelBatches: number // Concurrent batches (default: 1, max: 10)

  // Duplicate detection
  duplicateDetectionEnabled: boolean
  duplicateThreshold: number // Email similarity threshold (0-1, default: 1.0 = exact match)
  duplicateAction: 'FLAG' | 'AUTO_REJECT' // Default: FLAG (always recommend FLAG)

  // Advancement behavior
  autoAdvancePassingProjects: boolean // Auto-advance PASSED projects to next round
  manualReviewRequired: boolean       // All results require admin approval before advance

  // Eligibility criteria (structured)
  eligibilityCriteria: EligibilityCriteria[]

  // Category-specific rules
  categorySpecificRules: {
    STARTUP?: CategoryRuleSet
    BUSINESS_CONCEPT?: CategoryRuleSet
  }
}

export type FilterRuleDef = {
  id?: string // Optional — for editing existing rules
  name: string
  ruleType: 'FIELD_CHECK' | 'DOCUMENT_CHECK' | 'AI_SCORE' | 'DUPLICATE' | 'CUSTOM'
  config: FilterRuleConfig
  priority: number // Lower = run first
  isActive: boolean
  action: 'PASS' | 'REJECT' | 'FLAG'
}

export type FilterRuleConfig =
  | FieldCheckConfig
  | DocumentCheckConfig
  | AIScoreConfig
  | CustomConfig

export type FieldCheckConfig = {
  conditions: FieldCondition[]
  logic: 'AND' | 'OR'
}

export type FieldCondition = {
  field: 'competitionCategory' | 'foundedAt' | 'country' | 'geographicZone' |
         'tags' | 'oceanIssue' | 'wantsMentorship' | 'institution'
  operator: 'equals' | 'not_equals' | 'contains' | 'in' | 'not_in' | 'is_empty' |
            'greater_than' | 'less_than' | 'older_than_years' | 'newer_than_years'
  value: string | number | string[] | boolean
}

export type DocumentCheckConfig = {
  requiredFileTypes?: string[] // e.g., ['pdf', 'docx']
  minFileCount?: number
  maxFileCount?: number
  minTotalSizeMB?: number
  maxTotalSizeMB?: number
}

export type AIScoreConfig = {
  criteriaText: string    // Plain-language rubric
  minScore: number        // Minimum AI score to pass (0-10)
  weightInOverall: number // Weight if combining multiple AI rules (0-1)
}

export type CustomConfig = {
  // For future extension — custom JS/Python evaluation
  scriptUrl?: string
  functionName?: string
  parameters?: Record<string, unknown>
}

export type EligibilityCriteria = {
  name: string
  description: string
  required: boolean
  checkType: 'field' | 'document' | 'ai' | 'custom'
  checkConfig: FilterRuleConfig
}

export type CategoryRuleSet = {
  minAge?: number // Years since founded
  maxAge?: number
  requiredTags?: string[]
  excludedCountries?: string[]
  requiredDocuments?: string[]
}
```

### Zod Schema for FilteringConfig

```typescript
// src/lib/round-config-schemas.ts
import { z } from 'zod'

export const FieldConditionSchema = z.object({
  field: z.enum([
    'competitionCategory', 'foundedAt', 'country', 'geographicZone',
    'tags', 'oceanIssue', 'wantsMentorship', 'institution'
  ]),
  operator: z.enum([
    'equals', 'not_equals', 'contains', 'in', 'not_in', 'is_empty',
    'greater_than', 'less_than', 'older_than_years', 'newer_than_years'
  ]),
  value: z.union([
    z.string(), z.number(), z.array(z.string()), z.boolean()
  ])
})

export const FieldCheckConfigSchema = z.object({
  conditions: z.array(FieldConditionSchema),
  logic: z.enum(['AND', 'OR'])
})

export const DocumentCheckConfigSchema = z.object({
  requiredFileTypes: z.array(z.string()).optional(),
  minFileCount: z.number().int().min(0).optional(),
  maxFileCount: z.number().int().min(0).optional(),
  minTotalSizeMB: z.number().min(0).optional(),
  maxTotalSizeMB: z.number().min(0).optional()
})

export const AIScoreConfigSchema = z.object({
  criteriaText: z.string().min(10).max(5000),
  minScore: z.number().min(0).max(10),
  weightInOverall: z.number().min(0).max(1).default(1.0)
})

export const CustomConfigSchema = z.object({
  scriptUrl: z.string().url().optional(),
  functionName: z.string().optional(),
  parameters: z.record(z.unknown()).optional()
})

export const FilterRuleDefSchema = z.object({
  id: z.string().optional(),
  name: z.string().min(1).max(255),
  ruleType: z.enum(['FIELD_CHECK', 'DOCUMENT_CHECK', 'AI_SCORE', 'DUPLICATE', 'CUSTOM']),
  config: z.union([
    FieldCheckConfigSchema,
    DocumentCheckConfigSchema,
    AIScoreConfigSchema,
    CustomConfigSchema
  ]),
  priority: z.number().int().min(0).default(0),
  isActive: z.boolean().default(true),
  action: z.enum(['PASS', 'REJECT', 'FLAG'])
})

export const CategoryRuleSetSchema = z.object({
  minAge: z.number().int().min(0).optional(),
  maxAge: z.number().int().min(0).optional(),
  requiredTags: z.array(z.string()).optional(),
  excludedCountries: z.array(z.string()).optional(),
  requiredDocuments: z.array(z.string()).optional()
})

export const FilteringConfigSchema = z.object({
  rules: z.array(FilterRuleDefSchema).default([]),
  aiScreeningEnabled: z.boolean().default(false),
  aiRubricPrompt: z.string().min(0).max(10000).default(''),
  aiConfidenceThresholds: z.object({
    high: z.number().min(0).max(1).default(0.85),
    medium: z.number().min(0).max(1).default(0.6),
    low: z.number().min(0).max(1).default(0.4)
  }).default({ high: 0.85, medium: 0.6, low: 0.4 }),
  aiBatchSize: z.number().int().min(1).max(50).default(20),
  aiParallelBatches: z.number().int().min(1).max(10).default(1),
  duplicateDetectionEnabled: z.boolean().default(true),
  duplicateThreshold: z.number().min(0).max(1).default(1.0),
  duplicateAction: z.enum(['FLAG', 'AUTO_REJECT']).default('FLAG'),
  autoAdvancePassingProjects: z.boolean().default(false),
  manualReviewRequired: z.boolean().default(true),
  eligibilityCriteria: z.array(z.object({
    name: z.string(),
    description: z.string(),
    required: z.boolean(),
    checkType: z.enum(['field', 'document', 'ai', 'custom']),
    checkConfig: z.union([
      FieldCheckConfigSchema,
      DocumentCheckConfigSchema,
      AIScoreConfigSchema,
      CustomConfigSchema
    ])
  })).default([]),
  categorySpecificRules: z.object({
    STARTUP: CategoryRuleSetSchema.optional(),
    BUSINESS_CONCEPT: CategoryRuleSetSchema.optional()
  }).default({})
})

export type FilteringConfig = z.infer<typeof FilteringConfigSchema>
```
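For reference, here is a complete `configJson` document that the schema above accepts, with illustrative values:

```json
{
  "rules": [],
  "aiScreeningEnabled": true,
  "aiRubricPrompt": "Project must demonstrate clear ocean conservation impact.",
  "aiConfidenceThresholds": { "high": 0.85, "medium": 0.6, "low": 0.4 },
  "aiBatchSize": 20,
  "aiParallelBatches": 1,
  "duplicateDetectionEnabled": true,
  "duplicateThreshold": 1.0,
  "duplicateAction": "FLAG",
  "autoAdvancePassingProjects": false,
  "manualReviewRequired": true,
  "eligibilityCriteria": [],
  "categorySpecificRules": {}
}
```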
---

## Filtering Rule Engine

### Rule Evaluation Order

```
1. Built-in Duplicate Detection (if enabled)
   ↓
2. FIELD_CHECK rules (sorted by priority ascending)
   ↓
3. DOCUMENT_CHECK rules (sorted by priority ascending)
   ↓
4. AI_SCORE rules (if aiScreeningEnabled) — batch processed
   ↓
5. CUSTOM rules (future extension)
   ↓
6. Determine final outcome: PASSED | FILTERED_OUT | FLAGGED
```

### Rule Types in Detail

#### 1. FIELD_CHECK

**Purpose:** Validate project metadata fields against conditions.

**Operators:**

| Operator | Description | Example |
|----------|-------------|---------|
| `equals` | Field equals value | `competitionCategory equals "STARTUP"` |
| `not_equals` | Field does not equal value | `country not_equals "France"` |
| `contains` | Field contains substring (case-insensitive) | `tags contains "conservation"` |
| `in` | Field value is in array | `country in ["Monaco", "France", "Italy"]` |
| `not_in` | Field value is not in array | `oceanIssue not_in ["OTHER"]` |
| `is_empty` | Field is null, empty string, or empty array | `institution is_empty` |
| `greater_than` | Numeric comparison | `teamMemberCount greater_than 2` |
| `less_than` | Numeric comparison | `fundingGoal less_than 100000` |
| `older_than_years` | Date comparison (foundedAt) | `foundedAt older_than_years 5` |
| `newer_than_years` | Date comparison (foundedAt) | `foundedAt newer_than_years 2` |

**Example Rule:**

```json
{
  "name": "Startups Must Be < 5 Years Old",
  "ruleType": "FIELD_CHECK",
  "config": {
    "conditions": [
      { "field": "competitionCategory", "operator": "equals", "value": "STARTUP" },
      { "field": "foundedAt", "operator": "older_than_years", "value": 5 }
    ],
    "logic": "AND"
  },
  "priority": 10,
  "isActive": true,
  "action": "REJECT"
}
```

(Note the operator: since `REJECT` fires when the conditions are met, the rule must match startups founded *more* than 5 years ago.)

**Logic:**
- `AND`: All conditions must be true
- `OR`: At least one condition must be true

**Action:**
- `PASS`: If conditions are met, mark as passed (continue to next rule)
- `REJECT`: If conditions are met, auto-reject (short-circuit)
- `FLAG`: If conditions are met, flag for manual review
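A minimal evaluator for the operator table above (a sketch, not the codebase's actual `evaluateRule`; field access and the date math are deliberately simplified):

```typescript
type Condition = {
  field: string
  operator: string
  value: string | number | string[] | boolean
}

const MS_PER_YEAR = 365.25 * 24 * 3600 * 1000

// Evaluate one FIELD_CHECK condition against a flat project record.
// Sketch only; the real engine also combines conditions via logic: 'AND' | 'OR'.
export function evaluateCondition(
  project: Record<string, unknown>,
  cond: Condition
): boolean {
  const actual = project[cond.field]
  switch (cond.operator) {
    case 'equals': return actual === cond.value
    case 'not_equals': return actual !== cond.value
    case 'contains':
      return String(actual ?? '').toLowerCase().includes(String(cond.value).toLowerCase())
    case 'in': return Array.isArray(cond.value) && cond.value.includes(String(actual))
    case 'not_in': return Array.isArray(cond.value) && !cond.value.includes(String(actual))
    case 'is_empty':
      return actual == null || actual === '' || (Array.isArray(actual) && actual.length === 0)
    case 'greater_than': return Number(actual) > Number(cond.value)
    case 'less_than': return Number(actual) < Number(cond.value)
    case 'older_than_years':
      return (Date.now() - new Date(String(actual)).getTime()) / MS_PER_YEAR > Number(cond.value)
    case 'newer_than_years':
      return (Date.now() - new Date(String(actual)).getTime()) / MS_PER_YEAR < Number(cond.value)
    default: return false // unknown operator: treat as not matched
  }
}
```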
#### 2. DOCUMENT_CHECK

**Purpose:** Verify file uploads meet requirements.

**Checks:**

```typescript
type DocumentCheckConfig = {
  requiredFileTypes?: string[] // e.g., ['pdf', 'docx'] — must have at least one of each
  minFileCount?: number        // Minimum number of files
  maxFileCount?: number        // Maximum number of files
  minTotalSizeMB?: number      // Minimum total upload size
  maxTotalSizeMB?: number      // Maximum total upload size
}
```

**Example Rule:**

```json
{
  "name": "Must Upload Executive Summary + Business Plan",
  "ruleType": "DOCUMENT_CHECK",
  "config": {
    "requiredFileTypes": ["pdf"],
    "minFileCount": 2
  },
  "priority": 20,
  "isActive": true,
  "action": "FLAG"
}
```

#### 3. AI_SCORE

**Purpose:** GPT-powered rubric evaluation.

**Config:**

```typescript
type AIScoreConfig = {
  criteriaText: string    // Plain-language rubric
  minScore: number        // Minimum score to pass (0-10)
  weightInOverall: number // Weight if combining multiple AI rules
}
```

**Example Rule:**

```json
{
  "name": "AI: Ocean Impact Assessment",
  "ruleType": "AI_SCORE",
  "config": {
    "criteriaText": "Project must demonstrate measurable ocean conservation impact with clear metrics and a realistic timeline. Reject spam or unrelated projects.",
    "minScore": 6.0,
    "weightInOverall": 1.0
  },
  "priority": 30,
  "isActive": true,
  "action": "FLAG"
}
```

**AI Evaluation Flow:**

1. Anonymize project data (strip PII)
2. Batch projects (configurable batch size)
3. Send to OpenAI with the rubric
4. Parse the response:
   ```json
   {
     "projects": [
       {
         "project_id": "anon-123",
         "meets_criteria": true,
         "confidence": 0.82,
         "reasoning": "Clear ocean conservation focus, realistic metrics",
         "quality_score": 7,
         "spam_risk": false
       }
     ]
   }
   ```
5. Band by confidence thresholds
6. Store in `aiScreeningJson` on the FilteringResult
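Step 4 deserves a defensive guard, since model output can be malformed. A sketch of such a parser (the function name is illustrative, not the service's actual API; field names follow the response format above):

```typescript
type AIProjectResult = {
  project_id: string
  meets_criteria: boolean
  confidence: number
  reasoning: string
  quality_score: number
  spam_risk: boolean
}

// Parse raw model output; throw on structural problems so the caller
// can flag the whole batch for manual review instead of guessing.
export function parseAIScreeningResponse(raw: string): AIProjectResult[] {
  let parsed: unknown
  try {
    parsed = JSON.parse(raw)
  } catch {
    throw new Error('AI response is not valid JSON')
  }
  const projects = (parsed as { projects?: unknown }).projects
  if (!Array.isArray(projects)) throw new Error('AI response missing "projects" array')
  // Keep only well-formed entries; drop anything with missing/mistyped keys.
  return projects.filter((p): p is AIProjectResult =>
    typeof p === 'object' && p !== null &&
    typeof (p as AIProjectResult).project_id === 'string' &&
    typeof (p as AIProjectResult).meets_criteria === 'boolean' &&
    typeof (p as AIProjectResult).confidence === 'number'
  )
}
```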
#### 4. DUPLICATE

**Purpose:** Detect multiple submissions from the same applicant.

**Built-in Rule:**
- Always runs first if `duplicateDetectionEnabled: true`
- Groups projects by `submittedByEmail`
- Flags all projects in duplicate groups
- Never auto-rejects duplicates (an admin must decide which to keep)

**Duplicate Metadata:**

```json
{
  "isDuplicate": true,
  "siblingProjectIds": ["proj-2", "proj-3"],
  "duplicateNote": "This project shares a submitter email with 2 other project(s). Admin must review and decide which to keep.",
  "similarityScore": 1.0
}
```

**Future Enhancement: Semantic Similarity**

```typescript
duplicateThreshold: number // 0-1 (e.g., 0.8 = 80% similar text triggers a duplicate flag)
```

Use text embeddings to detect duplicates beyond exact email matches (compare titles and descriptions).

#### 5. CUSTOM (Future Extension)

**Purpose:** Run custom evaluation scripts (JS/Python).

**Config:**

```typescript
type CustomConfig = {
  scriptUrl?: string    // URL to hosted script
  functionName?: string // Function to call
  parameters?: Record<string, unknown>
}
```

**Example Use Cases:**
- External API call to verify company registration
- Custom formula combining multiple fields
- Integration with third-party data sources

---

## Rule Combination Logic

### How Rules Are Combined

```typescript
// Pseudocode for rule evaluation
let finalOutcome: 'PASSED' | 'FILTERED_OUT' | 'FLAGGED' = 'PASSED'
let hasFailed = false
let hasFlagged = false

// Run rules in priority order
for (const rule of rules.sort((a, b) => a.priority - b.priority)) {
  const result = evaluateRule(rule, project)
  if (!result.passed) {
    if (rule.action === 'REJECT') {
      hasFailed = true
      break // Short-circuit — no need to run remaining rules
    } else if (rule.action === 'FLAG') {
      hasFlagged = true
      // Continue to next rule
    }
  }
}

// Determine final outcome
if (hasFailed) {
  finalOutcome = 'FILTERED_OUT'
} else if (hasFlagged) {
  finalOutcome = 'FLAGGED'
} else {
  finalOutcome = 'PASSED'
}

// Override: duplicates are always flagged (never auto-rejected)
if (isDuplicate && finalOutcome === 'FILTERED_OUT') {
  finalOutcome = 'FLAGGED'
}
```
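The same logic can be made concrete as a pure function over per-rule results, folding in the duplicate override (a sketch; `RuleOutcome` is an illustrative type, not one from the codebase):

```typescript
type RuleOutcome = { passed: boolean; action: 'PASS' | 'REJECT' | 'FLAG' }

// Fold per-rule results (already sorted by priority) into a final outcome.
// A failed REJECT rule short-circuits; duplicates are downgraded
// from FILTERED_OUT to FLAGGED so an admin always decides.
export function combineRuleOutcomes(
  results: RuleOutcome[],
  isDuplicate: boolean
): 'PASSED' | 'FILTERED_OUT' | 'FLAGGED' {
  let hasFlagged = false
  for (const r of results) {
    if (r.passed) continue
    if (r.action === 'REJECT') {
      return isDuplicate ? 'FLAGGED' : 'FILTERED_OUT'
    }
    if (r.action === 'FLAG') hasFlagged = true
  }
  return hasFlagged ? 'FLAGGED' : 'PASSED'
}
```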
### Weighted Scoring (Advanced)

For multiple AI rules or field checks, admins can configure weighted scoring:

```typescript
type WeightedScoringConfig = {
  enabled: boolean
  rules: Array<{
    ruleId: string
    weight: number // 0-1
  }>
  passingThreshold: number // Combined weighted score needed to pass (0-10)
}
```

**Example:**

```json
{
  "enabled": true,
  "rules": [
    { "ruleId": "ai-ocean-impact", "weight": 0.6 },
    { "ruleId": "ai-innovation-score", "weight": 0.4 }
  ],
  "passingThreshold": 7.0
}
```

Combined score = (7.5 × 0.6) + (8.0 × 0.4) = 4.5 + 3.2 = 7.7 → PASSED
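The worked example corresponds to a small helper (a sketch, assuming all rule scores share the 0-10 scale):

```typescript
// Combine per-rule scores by weight and compare against the passing threshold.
export function weightedOutcome(
  scores: Array<{ score: number; weight: number }>,
  passingThreshold: number
): { combined: number; passed: boolean } {
  const combined = scores.reduce((sum, s) => sum + s.score * s.weight, 0)
  return { combined, passed: combined >= passingThreshold }
}
```

Note that weights are not renormalized here; if they do not sum to 1, the combined score leaves the 0-10 scale.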
---

## AI Screening Pipeline

### Step-by-Step Flow

```
1. Load Projects
   ↓
2. Anonymize Data (strip PII)
   ↓
3. Batch Projects (configurable size: 1-50, default 20)
   ↓
4. Parallel Processing (configurable: 1-10 concurrent batches)
   ↓
5. OpenAI API Call (GPT-4o or configured model)
   ↓
6. Parse JSON Response
   ↓
7. Map Anonymous IDs → Real Project IDs
   ↓
8. Band by Confidence Threshold
   ↓
9. Store Results in FilteringResult
   ↓
10. Log Token Usage (AIUsageLog)
```

### Anonymization

```typescript
// src/server/services/anonymization.ts
export function anonymizeProjectsForAI(
  projects: ProjectWithRelations[],
  purpose: 'FILTERING' | 'ASSIGNMENT' | 'SUMMARY'
): { anonymized: AnonymizedProjectForAI[]; mappings: ProjectAIMapping[] }
```

**What's Stripped:**
- Team member names
- Submitter email
- Submitter name
- Personal identifiers in metadata
- File paths (only file types are retained)

**What's Kept:**
- Project title (if generic)
- Description
- Category (STARTUP/BUSINESS_CONCEPT)
- Country
- Tags
- Ocean issue
- Founded date (year only)

**Validation:**

```typescript
export function validateAnonymizedProjects(
  anonymized: AnonymizedProjectForAI[]
): boolean
```

Checks for PII patterns:
- Email addresses (`/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/i`)
- Phone numbers
- Full names (heuristic)
- URLs with query params

**GDPR Compliance:**
- All AI calls must pass the `validateAnonymizedProjects()` check
- If PII is detected, the call throws an error, logs the failure, and flags all affected projects for manual review

### OpenAI Prompt Structure

**System Prompt:**

```
Project screening assistant. Evaluate against criteria, return JSON.
Format: {"projects": [{project_id, meets_criteria: bool, confidence: 0-1, reasoning: str, quality_score: 1-10, spam_risk: bool}]}
Be objective. Base evaluation only on provided data. No personal identifiers in reasoning.
```

**User Prompt:**

```
CRITERIA:
{aiRubricPrompt}

PROJECTS:
[{project_id, title, description, category, tags, ...}]

Evaluate and return JSON.
```

**Response Format:**

```json
{
  "projects": [
    {
      "project_id": "anon-001",
      "meets_criteria": true,
      "confidence": 0.82,
      "reasoning": "Clear ocean conservation focus with measurable impact metrics. Realistic timeline. Strong innovation.",
      "quality_score": 8,
      "spam_risk": false
    },
    {
      "project_id": "anon-002",
      "meets_criteria": false,
      "confidence": 0.91,
      "reasoning": "Generic description, no specific ocean impact. Appears to be spam or off-topic.",
      "quality_score": 2,
      "spam_risk": true
    }
  ]
}
```

### Confidence Banding

```typescript
function bandByConfidence(
  aiScreeningData: { confidence: number; meetsAllCriteria: boolean }
): { outcome: 'PASSED' | 'FILTERED_OUT' | 'FLAGGED'; confidence: number }
```

**Default Thresholds:**

| Confidence | Meets Criteria | Outcome | Action |
|------------|----------------|---------|--------|
| ≥ 0.85 | true | PASSED | Auto-advance |
| 0.60-0.84 | true | FLAGGED | Manual review |
| 0.40-0.59 | any | FLAGGED | Manual review |
| ≤ 0.39 | false | FILTERED_OUT | Auto-reject |

**Admin Override:** Admins can customize the thresholds in `FilteringConfig.aiConfidenceThresholds`.
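One consistent reading of the threshold table as code (a sketch; any case not covered by the auto-pass or auto-reject bands falls through to FLAGGED, matching the current system's treatment of mid-confidence results):

```typescript
type Thresholds = { high: number; medium: number; low: number }

// Band a single AI result by confidence and criteria fit.
// Defaults mirror the table: >= 0.85 auto-pass, < 0.40 auto-reject.
export function bandResult(
  confidence: number,
  meetsAllCriteria: boolean,
  t: Thresholds = { high: 0.85, medium: 0.6, low: 0.4 }
): 'PASSED' | 'FILTERED_OUT' | 'FLAGGED' {
  if (meetsAllCriteria && confidence >= t.high) return 'PASSED'
  if (!meetsAllCriteria && confidence < t.low) return 'FILTERED_OUT'
  return 'FLAGGED' // everything else needs a human
}
```

A confidently negative result (e.g. 0.91 confidence, criteria not met) is still only auto-rejected below the `low` band under this reading; the table routes it to manual review.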
---

## Duplicate Detection

### Current Implementation

```typescript
// Built-in email-based duplicate detection
const emailToProjects = new Map<string, Array<{ id: string; title: string }>>()

for (const project of projects) {
  const email = (project.submittedByEmail ?? '').toLowerCase().trim()
  if (!email) continue
  if (!emailToProjects.has(email)) emailToProjects.set(email, [])
  emailToProjects.get(email)!.push({ id: project.id, title: project.title })
}

// Flag all projects in groups of size > 1
emailToProjects.forEach((group) => {
  if (group.length <= 1) return
  for (const p of group) {
    duplicateProjectIds.add(p.id)
  }
})
```

### Enhanced Detection (Future)

**Text Similarity:**

```typescript
import { cosineSimilarity } from '@/lib/text-similarity'

function detectDuplicatesByText(
  projects: Project[],
  threshold: number = 0.8
): Set<string>
```

**Algorithm:**
1. Generate text embeddings for title + description
2. Compute pairwise cosine similarity
3. Flag projects with similarity ≥ threshold
4. Group into duplicate clusters

**Example:**
- Project A: "Ocean cleanup robot using AI"
- Project B: "AI-powered ocean cleaning robot"
- Similarity: 0.92 → Flagged as duplicates
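The email grouping above, repackaged as a self-contained function that returns the flagged ID set (a sketch mirroring the logic shown; the function name is illustrative):

```typescript
type SubmittedProject = { id: string; submittedByEmail: string | null }

// Flag every project whose normalized submitter email appears more than once.
// Projects without an email are never flagged as duplicates.
export function findDuplicateProjectIds(projects: SubmittedProject[]): Set<string> {
  const byEmail = new Map<string, string[]>()
  for (const p of projects) {
    const email = (p.submittedByEmail ?? '').toLowerCase().trim()
    if (!email) continue
    const group = byEmail.get(email) ?? []
    group.push(p.id)
    byEmail.set(email, group)
  }
  const duplicates = new Set<string>()
  for (const group of byEmail.values()) {
    if (group.length > 1) for (const id of group) duplicates.add(id)
  }
  return duplicates
}
```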
### Duplicate Metadata

```json
{
  "isDuplicate": true,
  "siblingProjectIds": ["proj-2", "proj-3"],
  "duplicateNote": "This project shares a submitter email with 2 other project(s). Admin must review and decide which to keep.",
  "similarityScore": 1.0,
  "detectionMethod": "email" | "text_similarity"
}
```

### Admin Duplicate Review UI

```
┌───────────────────────────────────────────────────────────────
│ Duplicate Group: applicant@example.com
│
│   Project 1: "Ocean Cleanup Robot"
│   Submitted: 2026-02-01 10:30 AM
│   Category: STARTUP
│   AI Score: 7.5/10
│   [✓ Keep This] [✗ Reject] [View Details]
│
│   Project 2: "AI-Powered Ocean Cleaner"
│   Submitted: 2026-02-05 2:15 PM
│   Category: STARTUP
│   AI Score: 6.8/10
│   [✓ Keep This] [✗ Reject] [View Details]
│
│ Recommendation: Keep Project 1 (higher AI score, earlier submission)
│
│ [Approve Recommendation] [Manual Decision]
└───────────────────────────────────────────────────────────────
```

---

## Admin Experience

### Filtering Dashboard

```
┌───────────────────────────────────────────────────────────────
│ Round 2: AI Screening & Eligibility Check
│
│ Status: Completed ●   Last Run: 2026-02-10 3:45 PM
│
│ Results Summary
│   ✓ Passed:       142 projects (auto-advance enabled)
│   ✗ Filtered Out:  28 projects
│   ⚠ Flagged:       15 projects (manual review required)
│   ────────────────────────────────────────────────
│   Total: 185 projects processed
│
│   AI Usage: 12,450 tokens ($0.15)
│   Processing Time: 2m 34s
│
│ Manual Review Queue (15)                        [Sort ▼]
│
│   ⚠ Ocean Cleanup Initiative
│     Category: STARTUP
│     Reason: Duplicate submission (2 projects)
│     AI Score: 7.2/10 (confidence: 0.65)
│     Failed Rules:
│       • Duplicate Detection: EMAIL_MATCH
│     [View Details] [✓ Approve] [✗ Reject]
│
│   ⚠ Blue Carbon Project
│     Category: BUSINESS_CONCEPT
│     Reason: AI confidence medium (0.58)
│     AI Score: 5.5/10
│     AI Reasoning: "Project description is vague and lacks
│     specific impact metrics. Needs clarification."
│     [View Details] [✓ Approve] [✗ Reject]
│
│   ... 13 more flagged projects
│
│   [Batch Approve All] [Export Queue]
│
│ [Re-run Filtering] [Configure Rules] [View Logs]
└───────────────────────────────────────────────────────────────
```
### Rule Configuration UI

```
┌───────────────────────────────────────────────────────────────
│ Filtering Rules Configuration
│
│ Active Rules (5)                            [+ Add Rule]
│
│   ≡ Rule 1: Startups Must Be < 5 Years Old
│     Type: FIELD_CHECK      Action: REJECT
│     Priority: 10                           [Edit] [✗]
│
│   ≡ Rule 2: Must Upload Executive Summary
│     Type: DOCUMENT_CHECK   Action: FLAG
│     Priority: 20                           [Edit] [✗]
│
│   ≡ Rule 3: AI Ocean Impact Assessment
│     Type: AI_SCORE         Action: FLAG
│     Priority: 30                           [Edit] [✗]
│     Rubric: "Project must demonstrate measurable..."
│     Min Score: 6.0/10
│
│   ... 2 more rules
│
│ AI Settings
│   AI Screening:     [✓ Enabled]
│   Batch Size:       [20] projects (1-50)
│   Parallel Batches: [2] (1-10)
│
│   Confidence Thresholds:
│     High (auto-pass):  [0.85]
│     Medium (review):   [0.60]
│     Low (auto-reject): [0.40]
│
│ Duplicate Detection
│   Email-based:          [✓ Enabled]
│   Text similarity:      [ ] Disabled (future)
│   Similarity threshold: [0.80] (0-1)
│   Action on duplicates: [FLAG] (recommended)
│
│ [Save Configuration] [Test Rules] [Cancel]
└───────────────────────────────────────────────────────────────
```

### Manual Override Controls

```
┌───────────────────────────────────────────────────────────────
│ Manual Override: Ocean Cleanup Initiative
│
│ Current Outcome: ⚠ FLAGGED
│
│ Project Details
│   Title:     Ocean Cleanup Initiative
│   Category:  STARTUP
│   Submitted: 2026-02-01 10:30 AM
│   Applicant: applicant@example.com
│   Description: [View Full Description]
│   Files: executive-summary.pdf, business-plan.docx
│
│ Filtering Results
│   ✓ Rule 1: Startups < 5 Years Old       PASSED
│   ✓ Rule 2: Upload Executive Summary     PASSED
│   ✗ Rule 3: Duplicate Detection          FLAGGED
│       → Reason: 2 projects from applicant@example.com
│       → Sibling: "AI-Powered Ocean Cleaner" (proj-2)
│   ⚠ Rule 4: AI Ocean Impact              FLAGGED
│       → AI Score: 7.2/10
│       → Confidence: 0.65 (medium)
│       → Reasoning: "Clear ocean focus but needs more specific
│         impact metrics. Potential duplicate."
│
│ Override Decision
│   New Outcome: ○ Approve (PASSED)  ○ Reject (FILTERED_OUT)
│
│   Reason (required):
│   ┌────────────────────────────────────────────────────
│   │ Reviewed duplicate group — this is the stronger
│   │ submission. AI score above threshold. Approved to
│   │ advance.
│   └────────────────────────────────────────────────────
│
│   [Submit Override] [Cancel]
└───────────────────────────────────────────────────────────────
```
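The override form maps to a small, testable core. A sketch that builds the `FilteringResult` update payload (the function name is illustrative, not the service's actual API; persisting it via Prisma and appending the DecisionAuditLog entry are left to the caller):

```typescript
type OverrideInput = {
  outcome: 'PASSED' | 'FILTERED_OUT'
  reason: string
  actorId: string
  now?: Date
}

// Build the FilteringResult override fields shown in the model above.
// The caller would persist this with prisma.filteringResult.update(...)
// and write a DecisionAuditLog entry in the same transaction.
export function buildOverrideUpdate(input: OverrideInput) {
  if (input.reason.trim().length < 10) {
    throw new Error('Override reason must be at least 10 characters')
  }
  return {
    finalOutcome: input.outcome,
    overriddenBy: input.actorId,
    overriddenAt: input.now ?? new Date(),
    overrideReason: input.reason.trim()
  }
}
```

The 10-character minimum mirrors the `reason: z.string().min(10)` constraint on the tRPC procedures.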
│  │ └──────────────────────────────────────────────────────┘ │  │
│  │                                                          │  │
│  │ [Submit Override]  [Cancel]                              │  │
│  └──────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────┘
```

---

## API Changes

### New tRPC Procedures

```typescript
// src/server/routers/filtering.ts
export const filteringRouter = router({
  // Run filtering for a round
  runFiltering: adminProcedure
    .input(z.object({ roundId: z.string() }))
    .mutation(async ({ ctx, input }) => {
      return runRoundFiltering(input.roundId, ctx.user.id, ctx.prisma)
    }),

  // Get filtering job status
  getJob: adminProcedure
    .input(z.object({ jobId: z.string() }))
    .query(async ({ ctx, input }) => {
      return ctx.prisma.filteringJob.findUnique({
        where: { id: input.jobId },
        include: { round: { select: { name: true } } }
      })
    }),

  // Get manual review queue
  getManualQueue: adminProcedure
    .input(z.object({ roundId: z.string() }))
    .query(async ({ ctx, input }) => {
      return getManualQueue(input.roundId, ctx.prisma)
    }),

  // Resolve manual decision
  resolveDecision: adminProcedure
    .input(z.object({
      filteringResultId: z.string(),
      outcome: z.enum(['PASSED', 'FILTERED_OUT']),
      reason: z.string().min(10).max(1000)
    }))
    .mutation(async ({ ctx, input }) => {
      return resolveManualDecision(
        input.filteringResultId,
        input.outcome,
        input.reason,
        ctx.user.id,
        ctx.prisma
      )
    }),

  // Batch override
  batchResolve: adminProcedure
    .input(z.object({
      filteringResultIds: z.array(z.string()),
      outcome: z.enum(['PASSED', 'FILTERED_OUT']),
      reason: z.string().min(10).max(1000)
    }))
    .mutation(async ({ ctx, input }) => {
      for (const id of input.filteringResultIds) {
        await resolveManualDecision(id, input.outcome, input.reason, ctx.user.id, ctx.prisma)
      }
    }),

  // Export results
  exportResults: adminProcedure
    .input(z.object({ roundId: z.string() }))
    .query(async ({ ctx, input }) => {
      // Return CSV-ready data
    }),

  // Configure filtering rules
  configureRules: adminProcedure
    .input(z.object({ roundId: z.string(), rules: z.array(FilterRuleDefSchema) }))
    .mutation(async ({ ctx, input }) => {
      // Delete existing rules, create new ones
    }),

  // Update round config
  updateConfig: adminProcedure
    .input(z.object({ roundId: z.string(), config: FilteringConfigSchema }))
    .mutation(async ({ ctx, input }) => {
      await ctx.prisma.round.update({
        where: { id: input.roundId },
        data: { configJson: input.config as any }
      })
    })
})
```

---

## Service Functions

### Core Service Signatures

```typescript
// src/server/services/round-filtering.ts

export async function runRoundFiltering(
  roundId: string,
  actorId: string,
  prisma: PrismaClient
): Promise<FilteringJobResult>

export async function getManualQueue(
  roundId: string,
  prisma: PrismaClient
): Promise<ManualQueueItem[]>

export async function resolveManualDecision(
  filteringResultId: string,
  outcome: 'PASSED' | 'FILTERED_OUT',
  reason: string,
  actorId: string,
  prisma: PrismaClient
): Promise<void>

export async function advanceFromFilteringRound(
  roundId: string,
  actorId: string,
  prisma: PrismaClient
): Promise<AdvancementResult>

type FilteringJobResult = {
  jobId: string
  total: number
  passed: number
  rejected: number
  flagged: number
  tokensUsed: number
  processingTime: number
}

type ManualQueueItem = {
  filteringResultId: string
  projectId: string
  projectTitle: string
  outcome: string
  ruleResults: RuleResult[]
  aiScreeningJson: Record<string, unknown> | null
  createdAt: Date
}

type AdvancementResult = {
  advancedCount: number
  targetRoundId: string
  targetRoundName: string
  notificationsSent: number
}
```

---

## Edge Cases

| Edge Case | Handling |
|-----------|----------|
| **No projects to filter** | FilteringJob completes immediately with 0 processed |
| **AI API failure** | Flag all projects for manual review, log error, continue |
| **Duplicate with different outcomes** | Always flag duplicates (never auto-reject) |
| **Admin overrides auto-rejected project** | Allowed — `finalOutcome` overrides `outcome` |
| **Project withdrawn during filtering** | Skip in filtering, mark WITHDRAWN in ProjectRoundState |
| **Rule misconfiguration** | Validate config on save, throw error if invalid |
| **All projects flagged** | Valid scenario — requires manual review for all |
| **All projects auto-rejected** | Valid scenario — no advancement |
| **Advancement before manual review** | Blocked if `manualReviewRequired: true` |
| **Re-run filtering** | Deletes previous FilteringResult records, runs fresh |
| **AI response parse error** | Flag affected projects, log error, continue |
| **Duplicate groups > 10 projects** | Flag all, recommend batch review in UI |
| **Missing submittedByEmail** | Skip duplicate detection for this project |
| **Empty rule set** | All projects auto-pass (useful for testing) |

---

## Integration Points

### Connects To: INTAKE Round (Input)

- **Input:** Projects in PENDING/IN_PROGRESS state from the INTAKE round
- **Data:** Project metadata, submitted files, team member data
- **Trigger:** Admin manually runs filtering after the INTAKE window closes

### Connects To: EVALUATION Round (Output)

- **Output:** Passing projects advance to the EVALUATION round
- **Data:** FilteringResult metadata attached to projects (AI scores, flags)
- **Trigger:** Auto-advance if `autoAdvancePassingProjects: true`, else manual

### Connects To: AI Services

- **Service:** `src/server/services/ai-filtering.ts`
- **Purpose:** GPT-powered rubric evaluation
- **Data Flow:** Anonymized project data → OpenAI → parsed results → confidence banding

### Connects To: Audit System

- **Tables:** `DecisionAuditLog`, `OverrideAction`, `AuditLog`, `AIUsageLog`
- **Events:** `filtering.completed`, `filtering.manual_decision`, `filtering.auto_advanced`

---

## Summary

The redesigned FILTERING round provides:

1. **Flexible Rule Engine** — Field checks, document checks, and AI screening rules, plus built-in duplicate detection
2. **AI-Powered Screening** — GPT rubric evaluation with confidence banding
3. **Built-in Duplicate Detection** — Email-based (future: text similarity)
4. **Manual Review Queue** — Admin override system with full audit trail
5. **Batch Processing** — Configurable batch sizes for performance
6. **Progress Tracking** — FilteringJob model for long-running operations
7. **Auto-Advancement** — Passing projects can auto-advance to the next round
8. **Full Auditability** — All decisions logged in DecisionAuditLog + OverrideAction

This replaces the current `FILTER` stage with a fully featured, production-ready filtering system that balances automation with human oversight.
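The confidence banding described above (auto-pass / manual-review / auto-reject) amounts to comparing the AI's confidence against the two configured thresholds. A minimal sketch, using the default thresholds from the AI Settings panel; the names `ConfidenceBand` and `bandForConfidence` are illustrative, not the actual implementation:

```typescript
// Sketch of confidence banding. Threshold defaults (0.85 / 0.60) mirror
// the AI Settings panel; type and function names are illustrative.
type ConfidenceBand = 'AUTO_PASS' | 'MANUAL_REVIEW' | 'AUTO_REJECT'

interface ConfidenceThresholds {
  high: number   // confidence >= high   → auto-pass
  medium: number // confidence >= medium → manual review; below → auto-reject
}

function bandForConfidence(
  confidence: number,
  t: ConfidenceThresholds = { high: 0.85, medium: 0.6 }
): ConfidenceBand {
  if (confidence >= t.high) return 'AUTO_PASS'
  if (confidence >= t.medium) return 'MANUAL_REVIEW'
  return 'AUTO_REJECT'
}
```

Note that the 0.65 confidence in the override example above falls in the medium band, which is why that project landed in the manual review queue.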
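A FIELD_BASED rule fires when its conditions hold under the configured AND/OR logic, and then yields its action. The sketch below follows the `configJson` shape documented earlier; it covers only the two operators shown there, and the `older_than_years` semantics (date more than N years in the past) is an assumption, as are the function names:

```typescript
// Sketch of FIELD_BASED rule evaluation against the documented configJson
// shape. Only the two documented operators are handled; names are illustrative.
type Condition = {
  field: string
  operator: 'equals' | 'older_than_years'
  value: unknown
}

type FieldRuleConfig = {
  conditions: Condition[]
  logic: 'AND' | 'OR'
  action: 'PASS' | 'REJECT' | 'FLAG'
}

function conditionHolds(c: Condition, project: Record<string, unknown>): boolean {
  const actual = project[c.field]
  if (c.operator === 'equals') return actual === c.value
  // 'older_than_years' — assumed semantics: date more than N years in the past.
  const date = new Date(String(actual))
  const years = (Date.now() - date.getTime()) / (365.25 * 24 * 3600 * 1000)
  return years > Number(c.value)
}

// Returns the rule's action if it fires for this project, otherwise null.
function evaluateFieldRule(
  config: FieldRuleConfig,
  project: Record<string, unknown>
): FieldRuleConfig['action'] | null {
  const results = config.conditions.map(c => conditionHolds(c, project))
  const fired = config.logic === 'AND' ? results.every(Boolean) : results.some(Boolean)
  return fired ? config.action : null
}
```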
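The built-in email-based duplicate check reduces to grouping submissions by applicant email and flagging every group with more than one member, skipping projects with no email (per the edge-case table). A minimal sketch; `SubmissionRef` and `findDuplicateGroups` are illustrative names:

```typescript
// Sketch of email-based duplicate grouping. Projects missing an email are
// skipped, matching the "Missing submittedByEmail" edge case. Names are
// illustrative, not the actual implementation.
type SubmissionRef = { projectId: string; submittedByEmail?: string }

function findDuplicateGroups(projects: SubmissionRef[]): Map<string, string[]> {
  const byEmail = new Map<string, string[]>()
  for (const p of projects) {
    if (!p.submittedByEmail) continue // no email → skip duplicate detection
    const key = p.submittedByEmail.trim().toLowerCase() // normalize for comparison
    const group = byEmail.get(key) ?? []
    group.push(p.projectId)
    byEmail.set(key, group)
  }
  // Keep only actual duplicate groups (2+ projects from the same email).
  return new Map(Array.from(byEmail).filter(([, ids]) => ids.length > 1))
}
```

Every project in a returned group would then be FLAGGED for manual review — duplicates are never auto-rejected.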
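For batch processing, one plausible reading of the `batchSize` / `parallelBatches` settings is: split projects into batches of `batchSize`, then run up to `parallelBatches` batches concurrently per wave. This is a sketch under that assumption — `screenInBatches` and `screenBatch` are hypothetical names, not the real AI-screening API:

```typescript
// Sketch of batched AI screening with bounded concurrency. The semantics of
// parallelBatches (concurrent batches per wave) is an assumption; names are
// illustrative.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = []
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size))
  return out
}

async function screenInBatches<T, R>(
  items: T[],
  batchSize: number,
  parallelBatches: number,
  screenBatch: (batch: T[]) => Promise<R[]>
): Promise<R[]> {
  const batches = chunk(items, batchSize)
  const results: R[] = []
  // Run `parallelBatches` batches at a time, preserving input order.
  for (let i = 0; i < batches.length; i += parallelBatches) {
    const wave = batches.slice(i, i + parallelBatches)
    const settled = await Promise.all(wave.map(screenBatch))
    for (const r of settled) results.push(...r)
  }
  return results
}
```

With the panel defaults (batch size 20, 2 parallel batches), 45 projects would be screened as three batches in two waves.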