MOPC-App/docs/claude-architecture-redesign/05-round-filtering.md

# Round Type: FILTERING — AI Screening & Eligibility
## Overview
The **FILTERING** round type (Round 2 in the typical flow) performs automated screening of applications to identify eligible projects, detect duplicates, and flag edge cases for admin review. It replaces the current `FILTER` stage and adds rule-based filtering, AI-powered screening, duplicate detection, and a manual override system.
### Purpose
1. **Automated Eligibility Checks** — Field-based rules (age, category, country, etc.) and document checks (required files)
2. **AI Screening** — GPT-powered rubric evaluation with confidence banding
3. **Duplicate Detection** — Cross-application similarity checking to catch multiple submissions from the same applicant
4. **Manual Review Queue** — Flagged projects go to admin dashboard for final decision
5. **Admin Override** — Any automated decision can be manually reversed with audit trail
### Key Features
| Feature | Description |
|---------|-------------|
| **Multi-Rule Engine** | Field-based, document-check, and AI rules run in sequence |
| **Confidence Banding** | AI results split into auto-pass (high), manual-review (medium), auto-reject (low) |
| **Duplicate Detection** | Built-in email-based duplicate check (always flags for review) |
| **Manual Override** | Admin can approve/reject any flagged project with reason |
| **Batch Processing** | AI screening runs in configurable batches for performance |
| **Progress Tracking** | FilteringJob model tracks long-running jobs |
| **Audit Trail** | All decisions logged in DecisionAuditLog |
---
## Current System
### Stage Model
```prisma
model Stage {
  id               String      @id
  trackId          String
  stageType        StageType   // FILTER
  name             String
  slug             String
  status           StageStatus
  sortOrder        Int
  configJson       Json?       // Generic blob — hard to know what's configurable
  windowOpenAt     DateTime?
  windowCloseAt    DateTime?
  filteringRules   FilteringRule[]
  filteringResults FilteringResult[]
  filteringJobs    FilteringJob[]
}

enum StageType {
  INTAKE
  FILTER      // <-- Current filtering stage
  EVALUATION
  SELECTION
  LIVE_FINAL
  RESULTS
}
```
### FilteringRule Model
```prisma
model FilteringRule {
  id         String            @id
  stageId    String
  name       String
  ruleType   FilteringRuleType
  configJson Json              @db.JsonB // Type-specific config
  priority   Int               @default(0)
  isActive   Boolean           @default(true)
  stage      Stage             @relation(fields: [stageId], references: [id], onDelete: Cascade)
}

enum FilteringRuleType {
  FIELD_BASED    // Field checks (category, country, age, etc.)
  DOCUMENT_CHECK // File existence/type checks
  AI_SCREENING   // GPT rubric evaluation
}
```
**Rule configJson shapes:**
```typescript
// FIELD_BASED
{
  conditions: [
    { field: "competitionCategory", operator: "equals", value: "STARTUP" },
    { field: "foundedAt", operator: "older_than_years", value: 5 }
  ],
  logic: "AND" | "OR",
  action: "PASS" | "REJECT" | "FLAG"
}

// DOCUMENT_CHECK
{
  requiredFileTypes: ["pdf", "docx"],
  minFileCount: 2,
  action: "FLAG"
}

// AI_SCREENING
{
  criteriaText: "Project must demonstrate clear ocean conservation impact",
  action: "FLAG",
  batchSize: 20,
  parallelBatches: 1
}
```
### FilteringResult Model
```prisma
model FilteringResult {
  id               String            @id
  stageId          String
  projectId        String
  outcome          FilteringOutcome  // PASSED | FILTERED_OUT | FLAGGED
  ruleResultsJson  Json?             @db.JsonB // Per-rule results
  aiScreeningJson  Json?             @db.JsonB // AI screening details

  // Admin override
  overriddenBy     String?
  overriddenAt     DateTime?
  overrideReason   String?           @db.Text
  finalOutcome     FilteringOutcome?

  stage            Stage             @relation(fields: [stageId], references: [id])
  project          Project           @relation(fields: [projectId], references: [id])
  overriddenByUser User?             @relation("FilteringOverriddenBy", fields: [overriddenBy], references: [id])

  @@unique([stageId, projectId])
}

enum FilteringOutcome {
  PASSED       // Auto-advance to next round
  FILTERED_OUT // Auto-reject
  FLAGGED      // Manual review required
}
```
### FilteringJob Model
```prisma
model FilteringJob {
  id             String             @id
  stageId        String
  status         FilteringJobStatus @default(PENDING)
  totalProjects  Int                @default(0)
  totalBatches   Int                @default(0)
  currentBatch   Int                @default(0)
  processedCount Int                @default(0)
  passedCount    Int                @default(0)
  filteredCount  Int                @default(0)
  flaggedCount   Int                @default(0)
  errorMessage   String?            @db.Text
  startedAt      DateTime?
  completedAt    DateTime?
  stage          Stage              @relation(fields: [stageId], references: [id])
}

enum FilteringJobStatus {
  PENDING
  RUNNING
  COMPLETED
  FAILED
}
```
### AI Screening Flow
```typescript
// src/server/services/ai-filtering.ts
export async function executeAIScreening(
  config: AIScreeningConfig,
  projects: ProjectForFiltering[],
  userId?: string,
  entityId?: string,
  onProgress?: ProgressCallback
): Promise<Map<string, AIScreeningResult>>
```
**AI Screening Steps:**
1. **Anonymization** — Strip PII before sending to OpenAI (see `anonymization.ts`)
2. **Batch Processing** — Group projects into configurable batch sizes (default 20)
3. **GPT Evaluation** — Send to OpenAI with rubric criteria
4. **Result Parsing** — Parse JSON response with confidence scores
5. **Confidence Banding** — Split into auto-pass/manual-review/auto-reject buckets
6. **Logging** — Track token usage in AIUsageLog
**Confidence Thresholds:**
```typescript
const AI_CONFIDENCE_THRESHOLD_PASS = 0.75 // Auto-pass if ≥ 0.75 and meetsAllCriteria
const AI_CONFIDENCE_THRESHOLD_REJECT = 0.25 // Auto-reject if ≤ 0.25 and !meetsAllCriteria
// Between 0.25-0.75 → FLAGGED for manual review
```
### Duplicate Detection
```typescript
// Current implementation in stage-filtering.ts (lines 264-289)
// Groups projects by submittedByEmail to detect duplicates
// Duplicates are ALWAYS flagged (never auto-rejected)
const duplicateProjectIds = new Set<string>()
const emailToProjects = new Map<string, Array<{ id: string; title: string }>>()

for (const project of projects) {
  const email = (project.submittedByEmail ?? '').toLowerCase().trim()
  if (!email) continue
  if (!emailToProjects.has(email)) emailToProjects.set(email, [])
  emailToProjects.get(email)!.push({ id: project.id, title: project.title })
}

// If any email has > 1 project, all siblings are flagged
emailToProjects.forEach((group, _email) => {
  if (group.length <= 1) return
  for (const p of group) {
    duplicateProjectIds.add(p.id)
  }
})
```
**Duplicate Metadata Stored:**
```json
{
  "isDuplicate": true,
  "siblingProjectIds": ["proj-2", "proj-3"],
  "duplicateNote": "This project shares a submitter email with 2 other project(s)."
}
```
### Filtering Execution Flow
```typescript
// src/server/services/stage-filtering.ts
export async function runStageFiltering(
  stageId: string,
  actorId: string,
  prisma: PrismaClient
): Promise<StageFilteringResult>
```
**Execution Steps:**
1. Load all projects in PENDING/IN_PROGRESS state for this stage
2. Create FilteringJob for progress tracking
3. Load active FilteringRule records (ordered by priority)
4. **Run duplicate detection** (built-in, always runs first)
5. **Run deterministic rules** (FIELD_BASED, DOCUMENT_CHECK)
- If any REJECT rule fails → outcome = FILTERED_OUT
- If any FLAG rule fails → outcome = FLAGGED
6. **Run AI screening** (if enabled and deterministic passed OR if duplicate)
- Batch process with configurable size
- Band by confidence
- Attach duplicate metadata
7. **Save FilteringResult** for each project
8. Update FilteringJob counts (passed/rejected/flagged)
9. Log decision audit
---
## Redesigned Filtering Round
### Round Model Changes
```prisma
model Round {
  id            String      @id @default(cuid())
  competitionId String
  name          String      // "AI Screening & Eligibility Check"
  slug          String      // "filtering"
  roundType     RoundType   // FILTERING (renamed from FILTER)
  status        RoundStatus @default(ROUND_DRAFT)
  sortOrder     Int         @default(0)

  // Time windows
  windowOpenAt  DateTime?
  windowCloseAt DateTime?

  // Round-type-specific configuration (validated by Zod)
  configJson    Json?       @db.JsonB

  // Relations
  competition        Competition         @relation(fields: [competitionId], references: [id])
  projectRoundStates ProjectRoundState[]
  filteringRules     FilteringRule[]
  filteringResults   FilteringResult[]
  filteringJobs      FilteringJob[]
  advancementRules   AdvancementRule[]
}

enum RoundType {
  INTAKE
  FILTERING    // Renamed from FILTER for clarity
  EVALUATION
  SUBMISSION   // New: multi-round submissions
  MENTORING    // New: mentor workspace
  LIVE_FINAL
  CONFIRMATION // New: winner agreement
}
```
### FilteringConfig Type (Zod-Validated)
```typescript
// src/types/round-configs.ts
export type FilteringConfig = {
  // Rule engine
  rules: FilterRuleDef[] // Configured rules (can be empty to skip deterministic filtering)

  // AI screening
  aiScreeningEnabled: boolean
  aiRubricPrompt: string // Custom rubric for AI (plain-language criteria)
  aiConfidenceThresholds: {
    high: number   // Above this = auto-pass (default: 0.85)
    medium: number // Above this = flag for review (default: 0.6)
    low: number    // Below this = auto-reject (default: 0.4)
  }
  aiBatchSize: number       // Projects per AI batch (default: 20, max: 50)
  aiParallelBatches: number // Concurrent batches (default: 1, max: 10)

  // Duplicate detection
  duplicateDetectionEnabled: boolean
  duplicateThreshold: number // Email similarity threshold (0-1, default: 1.0 = exact match)
  duplicateAction: 'FLAG' | 'AUTO_REJECT' // Default: FLAG (always recommend FLAG)

  // Advancement behavior
  autoAdvancePassingProjects: boolean // Auto-advance PASSED projects to next round
  manualReviewRequired: boolean       // All results require admin approval before advance

  // Eligibility criteria (structured)
  eligibilityCriteria: EligibilityCriteria[]

  // Category-specific rules
  categorySpecificRules: {
    STARTUP?: CategoryRuleSet
    BUSINESS_CONCEPT?: CategoryRuleSet
  }
}

export type FilterRuleDef = {
  id?: string // Optional — for editing existing rules
  name: string
  ruleType: 'FIELD_CHECK' | 'DOCUMENT_CHECK' | 'AI_SCORE' | 'DUPLICATE' | 'CUSTOM'
  config: FilterRuleConfig
  priority: number // Lower = run first
  isActive: boolean
  action: 'PASS' | 'REJECT' | 'FLAG'
}

export type FilterRuleConfig =
  | FieldCheckConfig
  | DocumentCheckConfig
  | AIScoreConfig
  | CustomConfig

export type FieldCheckConfig = {
  conditions: FieldCondition[]
  logic: 'AND' | 'OR'
}

export type FieldCondition = {
  field: 'competitionCategory' | 'foundedAt' | 'country' | 'geographicZone' | 'tags' | 'oceanIssue' | 'wantsMentorship' | 'institution'
  operator: 'equals' | 'not_equals' | 'contains' | 'in' | 'not_in' | 'is_empty' | 'greater_than' | 'less_than' | 'older_than_years' | 'newer_than_years'
  value: string | number | string[] | boolean
}

export type DocumentCheckConfig = {
  requiredFileTypes?: string[] // e.g., ['pdf', 'docx']
  minFileCount?: number
  maxFileCount?: number
  minTotalSizeMB?: number
  maxTotalSizeMB?: number
}

export type AIScoreConfig = {
  criteriaText: string    // Plain-language rubric
  minScore: number        // Minimum AI score to pass (0-10)
  weightInOverall: number // Weight if combining multiple AI rules (0-1)
}

export type CustomConfig = {
  // For future extension — custom JS/Python evaluation
  scriptUrl?: string
  functionName?: string
  parameters?: Record<string, unknown>
}

export type EligibilityCriteria = {
  name: string
  description: string
  required: boolean
  checkType: 'field' | 'document' | 'ai' | 'custom'
  checkConfig: FilterRuleConfig
}

export type CategoryRuleSet = {
  minAge?: number // Years since founded
  maxAge?: number
  requiredTags?: string[]
  excludedCountries?: string[]
  requiredDocuments?: string[]
}
```
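For illustration, applying a `CategoryRuleSet`'s age bounds might look like the following sketch (the helper names are hypothetical; only the field names come from the types above):

```typescript
type CategoryRuleSet = {
  minAge?: number // Years since founded
  maxAge?: number
}

// Approximate age in years, using the average Gregorian year length
function projectAgeYears(foundedAt: Date, now = new Date()): number {
  return (now.getTime() - foundedAt.getTime()) / (365.25 * 24 * 60 * 60 * 1000)
}

// Returns false if the project's age falls outside the configured bounds
function passesAgeBounds(foundedAt: Date, rules: CategoryRuleSet, now = new Date()): boolean {
  const age = projectAgeYears(foundedAt, now)
  if (rules.minAge !== undefined && age < rules.minAge) return false
  if (rules.maxAge !== undefined && age > rules.maxAge) return false
  return true
}
```

A rule set with only `maxAge: 5`, for example, admits a three-year-old startup but not an eight-year-old one.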
### Zod Schema for FilteringConfig
```typescript
// src/lib/round-config-schemas.ts
import { z } from 'zod'

export const FieldConditionSchema = z.object({
  field: z.enum([
    'competitionCategory',
    'foundedAt',
    'country',
    'geographicZone',
    'tags',
    'oceanIssue',
    'wantsMentorship',
    'institution'
  ]),
  operator: z.enum([
    'equals',
    'not_equals',
    'contains',
    'in',
    'not_in',
    'is_empty',
    'greater_than',
    'less_than',
    'older_than_years',
    'newer_than_years'
  ]),
  value: z.union([
    z.string(),
    z.number(),
    z.array(z.string()),
    z.boolean()
  ])
})

export const FieldCheckConfigSchema = z.object({
  conditions: z.array(FieldConditionSchema),
  logic: z.enum(['AND', 'OR'])
})

export const DocumentCheckConfigSchema = z.object({
  requiredFileTypes: z.array(z.string()).optional(),
  minFileCount: z.number().int().min(0).optional(),
  maxFileCount: z.number().int().min(0).optional(),
  minTotalSizeMB: z.number().min(0).optional(),
  maxTotalSizeMB: z.number().min(0).optional()
})

export const AIScoreConfigSchema = z.object({
  criteriaText: z.string().min(10).max(5000),
  minScore: z.number().min(0).max(10),
  weightInOverall: z.number().min(0).max(1).default(1.0)
})

export const CustomConfigSchema = z.object({
  scriptUrl: z.string().url().optional(),
  functionName: z.string().optional(),
  parameters: z.record(z.unknown()).optional()
})

export const FilterRuleDefSchema = z.object({
  id: z.string().optional(),
  name: z.string().min(1).max(255),
  ruleType: z.enum(['FIELD_CHECK', 'DOCUMENT_CHECK', 'AI_SCORE', 'DUPLICATE', 'CUSTOM']),
  config: z.union([
    FieldCheckConfigSchema,
    DocumentCheckConfigSchema,
    AIScoreConfigSchema,
    CustomConfigSchema
  ]),
  priority: z.number().int().min(0).default(0),
  isActive: z.boolean().default(true),
  action: z.enum(['PASS', 'REJECT', 'FLAG'])
})

export const CategoryRuleSetSchema = z.object({
  minAge: z.number().int().min(0).optional(),
  maxAge: z.number().int().min(0).optional(),
  requiredTags: z.array(z.string()).optional(),
  excludedCountries: z.array(z.string()).optional(),
  requiredDocuments: z.array(z.string()).optional()
})

export const FilteringConfigSchema = z.object({
  rules: z.array(FilterRuleDefSchema).default([]),
  aiScreeningEnabled: z.boolean().default(false),
  aiRubricPrompt: z.string().min(0).max(10000).default(''),
  aiConfidenceThresholds: z.object({
    high: z.number().min(0).max(1).default(0.85),
    medium: z.number().min(0).max(1).default(0.6),
    low: z.number().min(0).max(1).default(0.4)
  }).default({ high: 0.85, medium: 0.6, low: 0.4 }),
  aiBatchSize: z.number().int().min(1).max(50).default(20),
  aiParallelBatches: z.number().int().min(1).max(10).default(1),
  duplicateDetectionEnabled: z.boolean().default(true),
  duplicateThreshold: z.number().min(0).max(1).default(1.0),
  duplicateAction: z.enum(['FLAG', 'AUTO_REJECT']).default('FLAG'),
  autoAdvancePassingProjects: z.boolean().default(false),
  manualReviewRequired: z.boolean().default(true),
  eligibilityCriteria: z.array(z.object({
    name: z.string(),
    description: z.string(),
    required: z.boolean(),
    checkType: z.enum(['field', 'document', 'ai', 'custom']),
    checkConfig: z.union([
      FieldCheckConfigSchema,
      DocumentCheckConfigSchema,
      AIScoreConfigSchema,
      CustomConfigSchema
    ])
  })).default([]),
  categorySpecificRules: z.object({
    STARTUP: CategoryRuleSetSchema.optional(),
    BUSINESS_CONCEPT: CategoryRuleSetSchema.optional()
  }).default({})
})

export type FilteringConfig = z.infer<typeof FilteringConfigSchema>
```
---
## Filtering Rule Engine
### Rule Evaluation Order
```
1. Built-in Duplicate Detection (if enabled)
2. FIELD_CHECK rules (sorted by priority ascending)
3. DOCUMENT_CHECK rules (sorted by priority ascending)
4. AI_SCORE rules (if aiScreeningEnabled) — batch processed
5. CUSTOM rules (future extension)
6. Determine final outcome: PASSED | FILTERED_OUT | FLAGGED
```
### Rule Types in Detail
#### 1. FIELD_CHECK
**Purpose:** Validate project metadata fields against conditions.
**Operators:**
| Operator | Description | Example |
|----------|-------------|---------|
| `equals` | Field equals value | `competitionCategory equals "STARTUP"` |
| `not_equals` | Field does not equal value | `country not_equals "France"` |
| `contains` | Field contains substring (case-insensitive) | `tags contains "conservation"` |
| `in` | Field value is in array | `country in ["Monaco", "France", "Italy"]` |
| `not_in` | Field value not in array | `oceanIssue not_in ["OTHER"]` |
| `is_empty` | Field is null, empty string, or empty array | `institution is_empty` |
| `greater_than` | Numeric comparison | `teamMemberCount greater_than 2` |
| `less_than` | Numeric comparison | `fundingGoal less_than 100000` |
| `older_than_years` | Date comparison (foundedAt) | `foundedAt older_than_years 5` |
| `newer_than_years` | Date comparison (foundedAt) | `foundedAt newer_than_years 2` |
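As a sketch, the operator semantics above could be implemented with a single dispatch function (the helper names are hypothetical, not the rule engine's real API):

```typescript
type FieldCondition = { field: string; operator: string; value: unknown }

// Approximate age in years of a date, for the *_than_years operators
function yearsSince(date: Date, now = new Date()): number {
  return (now.getTime() - date.getTime()) / (365.25 * 24 * 60 * 60 * 1000)
}

// Evaluate one condition against a project record (sketch, not exhaustive)
function evaluateCondition(project: Record<string, unknown>, c: FieldCondition): boolean {
  const v = project[c.field]
  switch (c.operator) {
    case 'equals':           return v === c.value
    case 'not_equals':       return v !== c.value
    case 'contains':         return String(v ?? '').toLowerCase().includes(String(c.value).toLowerCase())
    case 'in':               return Array.isArray(c.value) && (c.value as unknown[]).includes(v)
    case 'not_in':           return Array.isArray(c.value) && !(c.value as unknown[]).includes(v)
    case 'is_empty':         return v == null || v === '' || (Array.isArray(v) && v.length === 0)
    case 'greater_than':     return typeof v === 'number' && v > Number(c.value)
    case 'less_than':        return typeof v === 'number' && v < Number(c.value)
    case 'older_than_years': return v instanceof Date && yearsSince(v) > Number(c.value)
    case 'newer_than_years': return v instanceof Date && yearsSince(v) < Number(c.value)
    default:                 return false
  }
}
```

A `FieldCheckConfig` with `logic: 'AND'` would then reduce its conditions with `every`, and `'OR'` with `some`.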
**Example Rule:**
```json
{
  "name": "Startups Must Be < 5 Years Old",
  "ruleType": "FIELD_CHECK",
  "config": {
    "conditions": [
      { "field": "competitionCategory", "operator": "equals", "value": "STARTUP" },
      { "field": "foundedAt", "operator": "older_than_years", "value": 5 }
    ],
    "logic": "AND"
  },
  "priority": 10,
  "isActive": true,
  "action": "REJECT"
}
```
**Logic:**
- `AND`: All conditions must be true
- `OR`: At least one condition must be true
**Action:**
- `PASS`: If conditions met, mark as passed (continue to next rule)
- `REJECT`: If conditions met, auto-reject (short-circuit)
- `FLAG`: If conditions met, flag for manual review
#### 2. DOCUMENT_CHECK
**Purpose:** Verify file uploads meet requirements.
**Checks:**
```typescript
type DocumentCheckConfig = {
  requiredFileTypes?: string[] // e.g., ['pdf', 'docx'] — must have at least one of each
  minFileCount?: number        // Minimum number of files
  maxFileCount?: number        // Maximum number of files
  minTotalSizeMB?: number      // Minimum total upload size
  maxTotalSizeMB?: number      // Maximum total upload size
}
```
**Example Rule:**
```json
{
  "name": "Must Upload Executive Summary + Business Plan",
  "ruleType": "DOCUMENT_CHECK",
  "config": {
    "requiredFileTypes": ["pdf"],
    "minFileCount": 2
  },
  "priority": 20,
  "isActive": true,
  "action": "FLAG"
}
```
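A minimal sketch of how such a rule could be evaluated (the `UploadedFile` shape is assumed; the real file model has more fields):

```typescript
type DocumentCheckConfig = {
  requiredFileTypes?: string[]
  minFileCount?: number
  maxFileCount?: number
  minTotalSizeMB?: number
  maxTotalSizeMB?: number
}

type UploadedFile = { name: string; sizeMB: number }

// Lowercased file extension, e.g. 'pdf' for 'summary.PDF'
function extOf(name: string): string {
  return name.split('.').pop()?.toLowerCase() ?? ''
}

// True only if every configured constraint is satisfied
function checkDocuments(files: UploadedFile[], cfg: DocumentCheckConfig): boolean {
  const total = files.reduce((sum, f) => sum + f.sizeMB, 0)
  if (cfg.minFileCount !== undefined && files.length < cfg.minFileCount) return false
  if (cfg.maxFileCount !== undefined && files.length > cfg.maxFileCount) return false
  if (cfg.minTotalSizeMB !== undefined && total < cfg.minTotalSizeMB) return false
  if (cfg.maxTotalSizeMB !== undefined && total > cfg.maxTotalSizeMB) return false
  for (const type of cfg.requiredFileTypes ?? []) {
    if (!files.some(f => extOf(f.name) === type)) return false
  }
  return true
}
```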
#### 3. AI_SCORE
**Purpose:** GPT-powered rubric evaluation.
**Config:**
```typescript
type AIScoreConfig = {
  criteriaText: string    // Plain-language rubric
  minScore: number        // Minimum score to pass (0-10)
  weightInOverall: number // Weight if combining multiple AI rules
}
```
**Example Rule:**
```json
{
  "name": "AI: Ocean Impact Assessment",
  "ruleType": "AI_SCORE",
  "config": {
    "criteriaText": "Project must demonstrate measurable ocean conservation impact with clear metrics and realistic timeline. Reject spam or unrelated projects.",
    "minScore": 6.0,
    "weightInOverall": 1.0
  },
  "priority": 30,
  "isActive": true,
  "action": "FLAG"
}
```
**AI Evaluation Flow:**
1. Anonymize project data (strip PII)
2. Batch projects (configurable batch size)
3. Send to OpenAI with rubric
4. Parse response:
```json
{
  "projects": [
    {
      "project_id": "anon-123",
      "meets_criteria": true,
      "confidence": 0.82,
      "reasoning": "Clear ocean conservation focus, realistic metrics",
      "quality_score": 7,
      "spam_risk": false
    }
  ]
}
```
5. Band by confidence thresholds
6. Store in `aiScreeningJson` on FilteringResult
#### 4. DUPLICATE
**Purpose:** Detect multiple submissions from same applicant.
**Built-in Rule:**
- Always runs first if `duplicateDetectionEnabled: true`
- Groups projects by `submittedByEmail`
- Flags all projects in duplicate groups
- Never auto-rejects duplicates (admin must decide which to keep)
**Duplicate Metadata:**
```json
{
  "isDuplicate": true,
  "siblingProjectIds": ["proj-2", "proj-3"],
  "duplicateNote": "This project shares a submitter email with 2 other project(s). Admin must review and decide which to keep.",
  "similarityScore": 1.0
}
```
**Future Enhancement: Semantic Similarity**
```typescript
duplicateThreshold: number // 0-1 (e.g., 0.8 = 80% similar text triggers duplicate flag)
```
Use text embeddings to detect duplicates beyond exact email match (compare titles, descriptions).
#### 5. CUSTOM (Future Extension)
**Purpose:** Run custom evaluation scripts (JS/Python).
**Config:**
```typescript
type CustomConfig = {
  scriptUrl?: string    // URL to hosted script
  functionName?: string // Function to call
  parameters?: Record<string, unknown>
}
```
**Example Use Case:**
- External API call to verify company registration
- Custom formula combining multiple fields
- Integration with third-party data sources
---
## Rule Combination Logic
### How Rules Are Combined
```typescript
// Pseudocode for rule evaluation
let finalOutcome: 'PASSED' | 'FILTERED_OUT' | 'FLAGGED' = 'PASSED'
let hasFailed = false
let hasFlagged = false

// Run rules in priority order
for (const rule of rules.sort((a, b) => a.priority - b.priority)) {
  const result = evaluateRule(rule, project)
  if (!result.passed) {
    if (rule.action === 'REJECT') {
      hasFailed = true
      break // Short-circuit — no need to run remaining rules
    } else if (rule.action === 'FLAG') {
      hasFlagged = true
      // Continue to next rule
    }
  }
}

// Determine final outcome
if (hasFailed) {
  finalOutcome = 'FILTERED_OUT'
} else if (hasFlagged) {
  finalOutcome = 'FLAGGED'
} else {
  finalOutcome = 'PASSED'
}

// Override: Duplicates always flagged (never auto-rejected)
if (isDuplicate && finalOutcome === 'FILTERED_OUT') {
  finalOutcome = 'FLAGGED'
}
```
### Weighted Scoring (Advanced)
For multiple AI rules or field checks, admins can configure weighted scoring:
```typescript
type WeightedScoringConfig = {
  enabled: boolean
  rules: Array<{
    ruleId: string
    weight: number // 0-1
  }>
  passingThreshold: number // Combined weighted score needed to pass (0-10)
}
```
**Example:**
```json
{
  "enabled": true,
  "rules": [
    { "ruleId": "ai-ocean-impact", "weight": 0.6 },
    { "ruleId": "ai-innovation-score", "weight": 0.4 }
  ],
  "passingThreshold": 7.0
}
```
With rule scores of 7.5 and 8.0 respectively: combined score = (7.5 × 0.6) + (8.0 × 0.4) = 4.5 + 3.2 = 7.7 ≥ 7.0 → PASSED
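The combination step is a plain weighted sum; a sketch:

```typescript
type WeightedRuleScore = { score: number; weight: number }

// Combined weighted score across AI rules (weights are assumed to sum to 1)
function combinedScore(rules: WeightedRuleScore[]): number {
  return rules.reduce((sum, r) => sum + r.score * r.weight, 0)
}

const total = combinedScore([
  { score: 7.5, weight: 0.6 }, // ai-ocean-impact
  { score: 8.0, weight: 0.4 }  // ai-innovation-score
])
const passed = total >= 7.0 // passingThreshold from the config above
```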
---
## AI Screening Pipeline
### Step-by-Step Flow
```
1. Load Projects
2. Anonymize Data (strip PII)
3. Batch Projects (configurable size: 1-50, default 20)
4. Parallel Processing (configurable: 1-10 concurrent batches)
5. OpenAI API Call (GPT-4o or configured model)
6. Parse JSON Response
7. Map Anonymous IDs → Real Project IDs
8. Band by Confidence Threshold
9. Store Results in FilteringResult
10. Log Token Usage (AIUsageLog)
```
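Steps 3-4 amount to chunking plus bounded concurrency; a rough sketch (`screenBatch` is a hypothetical stand-in for the OpenAI call, not the real function name):

```typescript
// Step 3: split items into batches of at most `size`
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = []
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size))
  return out
}

// Step 4: run at most `parallel` batches at a time, wave by wave
async function runInWaves<T, R>(
  batches: T[][],
  parallel: number,
  screenBatch: (batch: T[]) => Promise<R>
): Promise<R[]> {
  const results: R[] = []
  for (const wave of chunk(batches, parallel)) {
    results.push(...(await Promise.all(wave.map(screenBatch))))
  }
  return results
}
```

With the defaults (batch size 20, 1 parallel batch) this degrades to sequential processing; raising `parallel` trades OpenAI rate-limit headroom for wall-clock time.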
### Anonymization
```typescript
// src/server/services/anonymization.ts
export function anonymizeProjectsForAI(
  projects: ProjectWithRelations[],
  purpose: 'FILTERING' | 'ASSIGNMENT' | 'SUMMARY'
): { anonymized: AnonymizedProjectForAI[]; mappings: ProjectAIMapping[] }
```
**What's Stripped:**
- Team member names
- Submitter email
- Submitter name
- Personal identifiers in metadata
- File paths (only file types retained)
**What's Kept:**
- Project title (if generic)
- Description
- Category (STARTUP/BUSINESS_CONCEPT)
- Country
- Tags
- Ocean issue
- Founded date (year only)
**Validation:**
```typescript
export function validateAnonymizedProjects(
  anonymized: AnonymizedProjectForAI[]
): boolean
```
Checks for PII patterns:
- Email addresses (`/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/i`)
- Phone numbers
- Full names (heuristic)
- URLs with query params
**GDPR Compliance:**
- All AI calls must pass `validateAnonymizedProjects()` check
- Fails if PII detected → throws error, logs, flags all projects for manual review
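A simplified version of the PII scan (the email pattern is the one quoted above; the phone pattern is an assumed heuristic, not the production regex):

```typescript
const EMAIL_RE = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/i
const PHONE_RE = /\+?\d[\d\s().-]{7,}\d/ // Rough heuristic: 9+ phone-like characters

// Returns true if the text appears to contain an email address or phone number
function containsPII(text: string): boolean {
  return EMAIL_RE.test(text) || PHONE_RE.test(text)
}
```

The real check runs over every string field of every anonymized project and, per the GDPR rule above, aborts the AI call if any field matches.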
### OpenAI Prompt Structure
**System Prompt:**
```
Project screening assistant. Evaluate against criteria, return JSON.
Format: {"projects": [{project_id, meets_criteria: bool, confidence: 0-1, reasoning: str, quality_score: 1-10, spam_risk: bool}]}
Be objective. Base evaluation only on provided data. No personal identifiers in reasoning.
```
**User Prompt:**
```
CRITERIA: {aiRubricPrompt}
PROJECTS: [{project_id, title, description, category, tags, ...}]
Evaluate and return JSON.
```
**Response Format:**
```json
{
  "projects": [
    {
      "project_id": "anon-001",
      "meets_criteria": true,
      "confidence": 0.82,
      "reasoning": "Clear ocean conservation focus with measurable impact metrics. Realistic timeline. Strong innovation.",
      "quality_score": 8,
      "spam_risk": false
    },
    {
      "project_id": "anon-002",
      "meets_criteria": false,
      "confidence": 0.91,
      "reasoning": "Generic description, no specific ocean impact. Appears to be spam or off-topic.",
      "quality_score": 2,
      "spam_risk": true
    }
  ]
}
```
### Confidence Banding
```typescript
function bandByConfidence(
  aiScreeningData: { confidence: number; meetsAllCriteria: boolean }
): { outcome: 'PASSED' | 'FILTERED_OUT' | 'FLAGGED'; confidence: number }
```
**Default Thresholds:**
| Confidence | Meets Criteria | Outcome | Action |
|------------|----------------|---------|--------|
| ≥ 0.85 | true | PASSED | Auto-advance |
| 0.60-0.84 | true | FLAGGED | Manual review |
| 0.40-0.59 | any | FLAGGED | Manual review |
| ≤ 0.39 | false | FILTERED_OUT | Auto-reject |
**Admin Override:**
Admins can customize thresholds in `FilteringConfig.aiConfidenceThresholds`.
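A sketch implementing the default table above (not the actual `bandByConfidence` implementation):

```typescript
type Outcome = 'PASSED' | 'FILTERED_OUT' | 'FLAGGED'
type Thresholds = { high: number; medium: number; low: number }

const DEFAULT_THRESHOLDS: Thresholds = { high: 0.85, medium: 0.6, low: 0.4 }

// Auto-pass only a confident positive verdict, auto-reject only a negative
// verdict whose confidence falls below the low threshold, flag everything else.
function band(
  confidence: number,
  meetsCriteria: boolean,
  t: Thresholds = DEFAULT_THRESHOLDS
): Outcome {
  if (meetsCriteria && confidence >= t.high) return 'PASSED'
  if (!meetsCriteria && confidence < t.low) return 'FILTERED_OUT'
  return 'FLAGGED'
}
```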
---
## Duplicate Detection
### Current Implementation
```typescript
// Built-in email-based duplicate detection
const duplicateProjectIds = new Set<string>()
const emailToProjects = new Map<string, Array<{ id: string; title: string }>>()

for (const project of projects) {
  const email = (project.submittedByEmail ?? '').toLowerCase().trim()
  if (!email) continue
  if (!emailToProjects.has(email)) emailToProjects.set(email, [])
  emailToProjects.get(email)!.push({ id: project.id, title: project.title })
}

// Flag all projects in groups of size > 1
emailToProjects.forEach((group) => {
  if (group.length <= 1) return
  for (const p of group) {
    duplicateProjectIds.add(p.id)
  }
})
```
### Enhanced Detection (Future)
**Text Similarity:**
```typescript
import { cosineSimilarity } from '@/lib/text-similarity'
function detectDuplicatesByText(
  projects: Project[],
  threshold: number = 0.8
): Set<string>
```
**Algorithm:**
1. Generate text embeddings for title + description
2. Compute pairwise cosine similarity
3. Flag projects with similarity ≥ threshold
4. Group into duplicate clusters
**Example:**
Project A: "Ocean cleanup robot using AI"
Project B: "AI-powered ocean cleaning robot"
Similarity: 0.92 → Flagged as duplicates
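The pairwise-similarity step of the algorithm above can be sketched as follows (embeddings are assumed to be precomputed; the real `cosineSimilarity` would come from `@/lib/text-similarity`):

```typescript
// Cosine similarity between two equal-length embedding vectors
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Flag every project whose embedding is >= threshold similar to another's
function detectDuplicatesByEmbedding(
  embeddings: Map<string, number[]>,
  threshold = 0.8
): Set<string> {
  const ids = Array.from(embeddings.keys())
  const dupes = new Set<string>()
  for (let i = 0; i < ids.length; i++) {
    for (let j = i + 1; j < ids.length; j++) {
      if (cosineSimilarity(embeddings.get(ids[i])!, embeddings.get(ids[j])!) >= threshold) {
        dupes.add(ids[i])
        dupes.add(ids[j])
      }
    }
  }
  return dupes
}
```

The O(n²) pairwise comparison is fine at this competition's scale; a larger corpus would call for an approximate-nearest-neighbor index instead.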
### Duplicate Metadata
```json
{
  "isDuplicate": true,
  "siblingProjectIds": ["proj-2", "proj-3"],
  "duplicateNote": "This project shares a submitter email with 2 other project(s). Admin must review and decide which to keep.",
  "similarityScore": 1.0,
  "detectionMethod": "email" | "text_similarity"
}
```
### Admin Duplicate Review UI
```
┌─────────────────────────────────────────────────────────────┐
│ Duplicate Group: applicant@example.com │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Project 1: "Ocean Cleanup Robot" │ │
│ │ Submitted: 2026-02-01 10:30 AM │ │
│ │ Category: STARTUP │ │
│ │ AI Score: 7.5/10 │ │
│ │ │ │
│ │ [✓ Keep This] [✗ Reject] [View Details] │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Project 2: "AI-Powered Ocean Cleaner" │ │
│ │ Submitted: 2026-02-05 2:15 PM │ │
│ │ Category: STARTUP │ │
│ │ AI Score: 6.8/10 │ │
│ │ │ │
│ │ [✓ Keep This] [✗ Reject] [View Details] │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
│ Recommendation: Keep Project 1 (higher AI score, earlier │
│ submission) │
│ │
│ [Approve Recommendation] [Manual Decision] │
└─────────────────────────────────────────────────────────────┘
```
---
## Admin Experience
### Filtering Dashboard
```
┌───────────────────────────────────────────────────────────────────┐
│ Round 2: AI Screening & Eligibility Check │
│ │
│ Status: Completed ● Last Run: 2026-02-10 3:45 PM │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Results Summary │ │
│ │ │ │
│ │ ✓ Passed: 142 projects (auto-advance enabled) │ │
│ │ ✗ Filtered Out: 28 projects │ │
│ │ ⚠ Flagged: 15 projects (manual review required) │ │
│ │ ──────────────────────────────────────────────────────── │ │
│ │ Total: 185 projects processed │ │
│ │ │ │
│ │ AI Usage: 12,450 tokens ($0.15) │ │
│ │ Processing Time: 2m 34s │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Manual Review Queue (15) [Sort ▼] │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────┐ │ │
│ │ │ ⚠ Ocean Cleanup Initiative │ │ │
│ │ │ Category: STARTUP │ │ │
│ │ │ Reason: Duplicate submission (2 projects) │ │ │
│ │ │ AI Score: 7.2/10 (confidence: 0.65) │ │ │
│ │ │ │ │ │
│ │ │ Failed Rules: │ │ │
│ │ │ • Duplicate Detection: EMAIL_MATCH │ │ │
│ │ │ │ │ │
│ │ │ [View Details] [✓ Approve] [✗ Reject] │ │ │
│ │ └──────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────┐ │ │
│ │ │ ⚠ Blue Carbon Project │ │ │
│ │ │ Category: BUSINESS_CONCEPT │ │ │
│ │ │ Reason: AI confidence medium (0.58) │ │ │
│ │ │ AI Score: 5.5/10 │ │ │
│ │ │ │ │ │
│ │ │ AI Reasoning: "Project description is vague and │ │ │
│ │ │ lacks specific impact metrics. Needs clarification." │ │ │
│ │ │ │ │ │
│ │ │ [View Details] [✓ Approve] [✗ Reject] │ │ │
│ │ └──────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ... 13 more flagged projects │ │
│ │ │ │
│ │ [Batch Approve All] [Export Queue] │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ [Re-run Filtering] [Configure Rules] [View Logs] │
└───────────────────────────────────────────────────────────────────┘
```
### Rule Configuration UI
```
┌───────────────────────────────────────────────────────────────────┐
│ Filtering Rules Configuration │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Active Rules (5) [+ Add Rule] │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────┐ │ │
│ │ │ ≡ Rule 1: Startups Must Be < 5 Years Old │ │ │
│ │ │ Type: FIELD_CHECK │ │ │
│ │ │ Action: REJECT │ │ │
│ │ │ Priority: 10 [Edit] [✗] │ │ │
│ │ └──────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────┐ │ │
│ │ │ ≡ Rule 2: Must Upload Executive Summary │ │ │
│ │ │ Type: DOCUMENT_CHECK │ │ │
│ │ │ Action: FLAG │ │ │
│ │ │ Priority: 20 [Edit] [✗] │ │ │
│ │ └──────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────┐ │ │
│ │ │ ≡ Rule 3: AI Ocean Impact Assessment │ │ │
│ │ │ Type: AI_SCORE │ │ │
│ │ │ Action: FLAG │ │ │
│ │ │ Priority: 30 [Edit] [✗] │ │ │
│ │ │ Rubric: "Project must demonstrate measurable..." │ │ │
│ │ │ Min Score: 6.0/10 │ │ │
│ │ └──────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ... 2 more rules │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ AI Settings │ │
│ │ │ │
│ │ AI Screening: [✓ Enabled] │ │
│ │ Batch Size: [20] projects (1-50) │ │
│ │ Parallel Batches: [2] (1-10) │ │
│ │ │ │
│ │ Confidence Thresholds: │ │
│ │ High (auto-pass): [0.85] │ │
│ │ Medium (review): [0.60] │ │
│ │ Low (auto-reject): [0.40] │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Duplicate Detection │ │
│ │ │ │
│ │ Email-based: [✓ Enabled] │ │
│ │ Text similarity: [ ] Disabled (future) │ │
│ │ Similarity threshold: [0.80] (0-1) │ │
│ │ Action on duplicates: [FLAG] (recommended) │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ [Save Configuration] [Test Rules] [Cancel] │
└───────────────────────────────────────────────────────────────────┘
```
### Manual Override Controls
```
┌───────────────────────────────────────────────────────────────────┐
│ Manual Override: Ocean Cleanup Initiative │
│ │
│ Current Outcome: ⚠ FLAGGED │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Project Details │ │
│ │ │ │
│ │ Title: Ocean Cleanup Initiative │ │
│ │ Category: STARTUP │ │
│ │ Submitted: 2026-02-01 10:30 AM │ │
│ │ Applicant: applicant@example.com │ │
│ │ │ │
│ │ Description: [View Full Description] │ │
│ │ Files: executive-summary.pdf, business-plan.docx │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Filtering Results │ │
│ │ │ │
│ │ ✓ Rule 1: Startups < 5 Years Old PASSED │ │
│ │ ✓ Rule 2: Upload Executive Summary PASSED │ │
│ │ ✗ Rule 3: Duplicate Detection FLAGGED │ │
│ │ → Reason: 2 projects from applicant@example.com │ │
│ │ → Sibling: "AI-Powered Ocean Cleaner" (proj-2) │ │
│ │ ⚠ Rule 4: AI Ocean Impact FLAGGED │ │
│ │ → AI Score: 7.2/10 │ │
│ │ → Confidence: 0.65 (medium) │ │
│ │ → Reasoning: "Clear ocean focus but needs more specific │ │
│ │ impact metrics. Potential duplicate." │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Override Decision │ │
│ │ │ │
│ │ New Outcome: ○ Approve (PASSED) ○ Reject (FILTERED_OUT) │ │
│ │ │ │
│ │ Reason (required): │ │
│ │ ┌──────────────────────────────────────────────────────┐ │ │
│ │ │ Reviewed duplicate group — this is the stronger │ │ │
│ │ │ submission. AI score above threshold. Approved to │ │ │
│ │ │ advance. │ │ │
│ │ └──────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ [Submit Override] [Cancel] │ │
│ └─────────────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────────┘
```
---
## API Changes
### New tRPC Procedures
```typescript
// src/server/routers/filtering.ts
export const filteringRouter = router({
// Run filtering for a round
runFiltering: adminProcedure
.input(z.object({ roundId: z.string() }))
.mutation(async ({ ctx, input }) => {
      return runRoundFiltering(input.roundId, ctx.user.id, ctx.prisma)
}),
// Get filtering job status
getJob: adminProcedure
.input(z.object({ jobId: z.string() }))
.query(async ({ ctx, input }) => {
return ctx.prisma.filteringJob.findUnique({
where: { id: input.jobId },
include: { round: { select: { name: true } } }
})
}),
// Get manual review queue
getManualQueue: adminProcedure
.input(z.object({ roundId: z.string() }))
.query(async ({ ctx, input }) => {
return getManualQueue(input.roundId, ctx.prisma)
}),
// Resolve manual decision
resolveDecision: adminProcedure
.input(z.object({
filteringResultId: z.string(),
outcome: z.enum(['PASSED', 'FILTERED_OUT']),
reason: z.string().min(10).max(1000)
}))
.mutation(async ({ ctx, input }) => {
return resolveManualDecision(
input.filteringResultId,
input.outcome,
input.reason,
ctx.user.id,
ctx.prisma
)
}),
// Batch override
batchResolve: adminProcedure
.input(z.object({
filteringResultIds: z.array(z.string()),
outcome: z.enum(['PASSED', 'FILTERED_OUT']),
reason: z.string().min(10).max(1000)
}))
.mutation(async ({ ctx, input }) => {
      // Sequential by design: each resolution writes its own audit-log entry
      for (const id of input.filteringResultIds) {
        await resolveManualDecision(id, input.outcome, input.reason, ctx.user.id, ctx.prisma)
      }
}),
// Export results
exportResults: adminProcedure
.input(z.object({ roundId: z.string() }))
.query(async ({ ctx, input }) => {
// Return CSV-ready data
}),
// Configure filtering rules
configureRules: adminProcedure
.input(z.object({
roundId: z.string(),
rules: z.array(FilterRuleDefSchema)
}))
.mutation(async ({ ctx, input }) => {
// Delete existing rules, create new ones
}),
// Update round config
updateConfig: adminProcedure
.input(z.object({
roundId: z.string(),
config: FilteringConfigSchema
}))
.mutation(async ({ ctx, input }) => {
await ctx.prisma.round.update({
where: { id: input.roundId },
        data: { configJson: input.config as Prisma.InputJsonValue } // Prisma namespace from '@prisma/client'
})
})
})
```
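Because `runFiltering` kicks off a long-running job and `getJob` reports its status, the admin dashboard is expected to poll. A hedged sketch of such a poller — `waitForJob`, `JobStatus`, and the injected `fetchStatus` are illustrative names, with `fetchStatus` standing in for a `filtering.getJob` call through whatever tRPC client the dashboard uses:

```typescript
// Illustrative statuses — mirror whatever enum FilteringJob actually uses.
type JobStatus = 'PENDING' | 'RUNNING' | 'COMPLETED' | 'FAILED'

// Poll an injected status fetcher until the job reaches a terminal state.
async function waitForJob(
  fetchStatus: () => Promise<JobStatus>,
  { intervalMs = 1000, maxAttempts = 60 } = {}
): Promise<JobStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus()
    if (status === 'COMPLETED' || status === 'FAILED') return status
    await new Promise((resolve) => setTimeout(resolve, intervalMs))
  }
  throw new Error('Filtering job did not finish within the polling window')
}
```

Injecting the fetcher keeps the helper testable and independent of the tRPC client setup.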
---
## Service Functions
### Core Service Signatures
```typescript
// src/server/services/round-filtering.ts
export async function runRoundFiltering(
roundId: string,
actorId: string,
prisma: PrismaClient
): Promise<FilteringJobResult>
export async function getManualQueue(
roundId: string,
prisma: PrismaClient
): Promise<ManualQueueItem[]>
export async function resolveManualDecision(
filteringResultId: string,
outcome: 'PASSED' | 'FILTERED_OUT',
reason: string,
actorId: string,
prisma: PrismaClient
): Promise<void>
export async function advanceFromFilteringRound(
roundId: string,
actorId: string,
prisma: PrismaClient
): Promise<AdvancementResult>
type FilteringJobResult = {
jobId: string
total: number
passed: number
rejected: number
flagged: number
tokensUsed: number
processingTime: number
}
type ManualQueueItem = {
filteringResultId: string
projectId: string
projectTitle: string
outcome: string
ruleResults: RuleResult[]
aiScreeningJson: Record<string, unknown> | null
createdAt: Date
}
type AdvancementResult = {
advancedCount: number
targetRoundId: string
targetRoundName: string
notificationsSent: number
}
```
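`runRoundFiltering` is expected to send projects to the AI screener in batches of the configured size (see Batch Size in the config panel). The batching itself reduces to a chunking helper; this sketch — `chunk` is an assumed name, not an existing export — pins down the intended semantics:

```typescript
// Illustrative helper: split items into consecutive batches of at most `size`.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error('Batch size must be a positive integer')
  }
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}
```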
---
## Edge Cases
| Edge Case | Handling |
|-----------|----------|
| **No projects to filter** | FilteringJob completes immediately with 0 processed |
| **AI API failure** | Flag all projects for manual review, log error, continue |
| **Duplicate with different outcomes** | Always flag duplicates (never auto-reject) |
| **Admin overrides auto-rejected project** | Allowed — finalOutcome overrides outcome |
| **Project withdrawn during filtering** | Skip in filtering, mark WITHDRAWN in ProjectRoundState |
| **Rule misconfiguration** | Validate config on save, throw error if invalid |
| **All projects flagged** | Valid scenario — requires manual review for all |
| **All projects auto-rejected** | Valid scenario — no advancement |
| **Advancement before manual review** | Blocked if `manualReviewRequired: true` |
| **Re-run filtering** | Deletes previous FilteringResult records, runs fresh |
| **AI response parse error** | Flag affected projects, log error, continue |
| **Duplicate groups > 10 projects** | Flag all, recommend batch review in UI |
| **Missing submittedByEmail** | Skip duplicate detection for this project |
| **Empty rule set** | All projects auto-pass (useful for testing) |
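Several of the edge cases above concern email-based duplicate detection. A minimal sketch of the grouping step, consistent with the table (projects without `submittedByEmail` are skipped; only groups of two or more count as duplicates, and duplicates are only ever flagged, never auto-rejected); `findDuplicateGroups` and `ProjectLite` are illustrative names:

```typescript
// Illustrative only — not an existing service function.
interface ProjectLite {
  id: string
  submittedByEmail?: string | null
}

// Group project IDs by normalized submitter email; drop singleton groups.
function findDuplicateGroups(projects: ProjectLite[]): Map<string, string[]> {
  const byEmail = new Map<string, string[]>()
  for (const p of projects) {
    // Edge case from the table: missing submittedByEmail → skip detection
    if (!p.submittedByEmail) continue
    const key = p.submittedByEmail.trim().toLowerCase()
    byEmail.set(key, [...(byEmail.get(key) ?? []), p.id])
  }
  for (const [email, ids] of byEmail) {
    if (ids.length < 2) byEmail.delete(email)
  }
  return byEmail
}
```

Normalizing case and whitespace before grouping catches trivially disguised resubmissions without resorting to text similarity.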
---
## Integration Points
### Connects To: INTAKE Round (Input)
- **Input:** Projects in PENDING/IN_PROGRESS state from INTAKE round
- **Data:** Project metadata, submitted files, team member data
- **Trigger:** Admin manually runs filtering after INTAKE window closes
### Connects To: EVALUATION Round (Output)
- **Output:** Passing projects advance to EVALUATION round
- **Data:** FilteringResult metadata attached to projects (AI scores, flags)
- **Trigger:** Auto-advance if `autoAdvancePassingProjects: true`, else manual
### Connects To: AI Services
- **Service:** `src/server/services/ai-filtering.ts`
- **Purpose:** GPT-powered rubric evaluation
- **Data Flow:** Anonymized project data → OpenAI → parsed results → confidence banding
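The anonymization step in the data flow above can be as simple as projecting out identifying fields before the payload goes to OpenAI. A sketch under that assumption — `anonymizeForScreening` and `ProjectInput` are illustrative names, and the real field list would follow the Project model:

```typescript
// Illustrative input shape — the real Project model has more fields.
interface ProjectInput {
  title: string
  description: string
  submittedByEmail?: string | null
  teamMembers?: { name: string; email: string }[]
}

// Keep only content fields; applicant identity never reaches the AI screener.
function anonymizeForScreening(p: ProjectInput): { title: string; description: string } {
  return { title: p.title, description: p.description }
}
```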
### Connects To: Audit System
- **Tables:** `DecisionAuditLog`, `OverrideAction`, `AuditLog`, `AIUsageLog`
- **Events:** `filtering.completed`, `filtering.manual_decision`, `filtering.auto_advanced`
---
## Summary
The redesigned FILTERING round provides:
1. **Flexible Rule Engine** — Field checks, document checks, AI scoring, duplicates, custom scripts
2. **AI-Powered Screening** — GPT rubric evaluation with confidence banding
3. **Built-in Duplicate Detection** — Email-based (future: text similarity)
4. **Manual Review Queue** — Admin override system with full audit trail
5. **Batch Processing** — Configurable batch sizes for performance
6. **Progress Tracking** — FilteringJob model for long-running operations
7. **Auto-Advancement** — Passing projects can auto-advance to next round
8. **Full Auditability** — All decisions logged in DecisionAuditLog + OverrideAction
This replaces the current `FILTER` stage with a fully-featured, production-ready filtering system that balances automation with human oversight.