MOPC-App/docs/claude-architecture-redesign/05-round-filtering.md

# Round Type: FILTERING — AI Screening & Eligibility
## Overview
The **FILTERING** round type (Round 2 in the typical flow) performs automated screening of applications to identify eligible projects, detect duplicates, and flag edge cases for admin review. It replaces the current `FILTER` stage and adds rule-based filtering, AI-powered screening, duplicate detection, and a manual override system.
### Purpose
1. **Automated Eligibility Checks** — Field-based rules (age, category, country, etc.) and document checks (required files)
2. **AI Screening** — GPT-powered rubric evaluation with confidence banding
3. **Duplicate Detection** — Cross-application similarity checking to catch multiple submissions from the same applicant
4. **Manual Review Queue** — Flagged projects go to admin dashboard for final decision
5. **Admin Override** — Any automated decision can be manually reversed with audit trail
### Key Features
| Feature | Description |
|---------|-------------|
| **Multi-Rule Engine** | Field-based, document-check, and AI rules run in sequence |
| **Confidence Banding** | AI results split into auto-pass (high), manual-review (medium), auto-reject (low) |
| **Duplicate Detection** | Built-in email-based duplicate check (always flags for review) |
| **Manual Override** | Admin can approve/reject any flagged project with reason |
| **Batch Processing** | AI screening runs in configurable batches for performance |
| **Progress Tracking** | FilteringJob model tracks long-running jobs |
| **Audit Trail** | All decisions logged in DecisionAuditLog |
---
## Current System
### Stage Model
```prisma
model Stage {
  id               String      @id
  trackId          String
  stageType        StageType   // FILTER
  name             String
  slug             String
  status           StageStatus
  sortOrder        Int
  configJson       Json?       // Generic blob — hard to know what's configurable
  windowOpenAt     DateTime?
  windowCloseAt    DateTime?
  filteringRules   FilteringRule[]
  filteringResults FilteringResult[]
  filteringJobs    FilteringJob[]
}

enum StageType {
  INTAKE
  FILTER      // <-- Current filtering stage
  EVALUATION
  SELECTION
  LIVE_FINAL
  RESULTS
}
```
### FilteringRule Model
```prisma
model FilteringRule {
  id         String            @id
  stageId    String
  name       String
  ruleType   FilteringRuleType
  configJson Json              @db.JsonB // Type-specific config
  priority   Int               @default(0)
  isActive   Boolean           @default(true)
  stage      Stage             @relation(fields: [stageId], references: [id], onDelete: Cascade)
}

enum FilteringRuleType {
  FIELD_BASED    // Field checks (category, country, age, etc.)
  DOCUMENT_CHECK // File existence/type checks
  AI_SCREENING   // GPT rubric evaluation
}
```
**Rule configJson shapes:**
```typescript
// FIELD_BASED
{
  conditions: [
    { field: "competitionCategory", operator: "equals", value: "STARTUP" },
    { field: "foundedAt", operator: "older_than_years", value: 5 }
  ],
  logic: "AND" | "OR",
  action: "PASS" | "REJECT" | "FLAG"
}

// DOCUMENT_CHECK
{
  requiredFileTypes: ["pdf", "docx"],
  minFileCount: 2,
  action: "FLAG"
}

// AI_SCREENING
{
  criteriaText: "Project must demonstrate clear ocean conservation impact",
  action: "FLAG",
  batchSize: 20,
  parallelBatches: 1
}
```
### FilteringResult Model
```prisma
model FilteringResult {
  id               String            @id
  stageId          String
  projectId        String
  outcome          FilteringOutcome  // PASSED | FILTERED_OUT | FLAGGED
  ruleResultsJson  Json?             @db.JsonB // Per-rule results
  aiScreeningJson  Json?             @db.JsonB // AI screening details

  // Admin override
  overriddenBy     String?
  overriddenAt     DateTime?
  overrideReason   String?           @db.Text
  finalOutcome     FilteringOutcome?

  stage            Stage             @relation(fields: [stageId], references: [id])
  project          Project           @relation(fields: [projectId], references: [id])
  overriddenByUser User?             @relation("FilteringOverriddenBy", fields: [overriddenBy], references: [id])

  @@unique([stageId, projectId])
}

enum FilteringOutcome {
  PASSED       // Auto-advance to next round
  FILTERED_OUT // Auto-reject
  FLAGGED      // Manual review required
}
```
### FilteringJob Model
```prisma
model FilteringJob {
  id             String             @id
  stageId        String
  status         FilteringJobStatus @default(PENDING)
  totalProjects  Int                @default(0)
  totalBatches   Int                @default(0)
  currentBatch   Int                @default(0)
  processedCount Int                @default(0)
  passedCount    Int                @default(0)
  filteredCount  Int                @default(0)
  flaggedCount   Int                @default(0)
  errorMessage   String?            @db.Text
  startedAt      DateTime?
  completedAt    DateTime?
  stage          Stage              @relation(fields: [stageId], references: [id])
}

enum FilteringJobStatus {
  PENDING
  RUNNING
  COMPLETED
  FAILED
}
```
### AI Screening Flow
```typescript
// src/server/services/ai-filtering.ts
export async function executeAIScreening(
  config: AIScreeningConfig,
  projects: ProjectForFiltering[],
  userId?: string,
  entityId?: string,
  onProgress?: ProgressCallback
): Promise<Map<string, AIScreeningResult>>
```
**AI Screening Steps:**
1. **Anonymization** — Strip PII before sending to OpenAI (see `anonymization.ts`)
2. **Batch Processing** — Group projects into configurable batch sizes (default 20)
3. **GPT Evaluation** — Send to OpenAI with rubric criteria
4. **Result Parsing** — Parse JSON response with confidence scores
5. **Confidence Banding** — Split into auto-pass/manual-review/auto-reject buckets
6. **Logging** — Track token usage in AIUsageLog
**Confidence Thresholds:**
```typescript
const AI_CONFIDENCE_THRESHOLD_PASS = 0.75 // Auto-pass if ≥ 0.75 and meetsAllCriteria
const AI_CONFIDENCE_THRESHOLD_REJECT = 0.25 // Auto-reject if ≤ 0.25 and !meetsAllCriteria
// Between 0.25-0.75 → FLAGGED for manual review
```
### Duplicate Detection
```typescript
// Current implementation in stage-filtering.ts (lines 264-289)
// Groups projects by submittedByEmail to detect duplicates
// Duplicates are ALWAYS flagged (never auto-rejected)
const duplicateProjectIds = new Set<string>()
const emailToProjects = new Map<string, Array<{ id: string; title: string }>>()

for (const project of projects) {
  const email = (project.submittedByEmail ?? '').toLowerCase().trim()
  if (!email) continue
  if (!emailToProjects.has(email)) emailToProjects.set(email, [])
  emailToProjects.get(email)!.push({ id: project.id, title: project.title })
}

// If any email has > 1 project, all siblings are flagged
emailToProjects.forEach((group, _email) => {
  if (group.length <= 1) return
  for (const p of group) {
    duplicateProjectIds.add(p.id)
  }
})
```
**Duplicate Metadata Stored:**
```json
{
  "isDuplicate": true,
  "siblingProjectIds": ["proj-2", "proj-3"],
  "duplicateNote": "This project shares a submitter email with 2 other project(s)."
}
```
### Filtering Execution Flow
```typescript
// src/server/services/stage-filtering.ts
export async function runStageFiltering(
  stageId: string,
  actorId: string,
  prisma: PrismaClient
): Promise<StageFilteringResult>
```
**Execution Steps:**
1. Load all projects in PENDING/IN_PROGRESS state for this stage
2. Create FilteringJob for progress tracking
3. Load active FilteringRule records (ordered by priority)
4. **Run duplicate detection** (built-in, always runs first)
5. **Run deterministic rules** (FIELD_BASED, DOCUMENT_CHECK)
- If any REJECT rule fails → outcome = FILTERED_OUT
- If any FLAG rule fails → outcome = FLAGGED
6. **Run AI screening** (if enabled and deterministic passed OR if duplicate)
- Batch process with configurable size
- Band by confidence
- Attach duplicate metadata
7. **Save FilteringResult** for each project
8. Update FilteringJob counts (passed/rejected/flagged)
9. Log decision audit
---
## Redesigned Filtering Round
### Round Model Changes
```prisma
model Round {
  id            String      @id @default(cuid())
  competitionId String
  name          String      // "AI Screening & Eligibility Check"
  slug          String      // "filtering"
  roundType     RoundType   // FILTERING (renamed from FILTER)
  status        RoundStatus @default(ROUND_DRAFT)
  sortOrder     Int         @default(0)

  // Time windows
  windowOpenAt  DateTime?
  windowCloseAt DateTime?

  // Round-type-specific configuration (validated by Zod)
  configJson    Json?       @db.JsonB

  // Relations
  competition        Competition         @relation(fields: [competitionId], references: [id])
  projectRoundStates ProjectRoundState[]
  filteringRules     FilteringRule[]
  filteringResults   FilteringResult[]
  filteringJobs      FilteringJob[]
  advancementRules   AdvancementRule[]
}

enum RoundType {
  INTAKE
  FILTERING    // Renamed from FILTER for clarity
  EVALUATION
  SUBMISSION   // New: multi-round submissions
  MENTORING    // New: mentor workspace
  LIVE_FINAL
  CONFIRMATION // New: winner agreement
}
```
### FilteringConfig Type (Zod-Validated)
```typescript
// src/types/round-configs.ts
export type FilteringConfig = {
  // Rule engine
  rules: FilterRuleDef[] // Configured rules (can be empty to skip deterministic filtering)

  // AI screening
  aiScreeningEnabled: boolean
  aiRubricPrompt: string // Custom rubric for AI (plain-language criteria)
  aiConfidenceThresholds: {
    high: number   // Above this = auto-pass (default: 0.85)
    medium: number // Above this = flag for review (default: 0.6)
    low: number    // Below this = auto-reject (default: 0.4)
  }
  aiBatchSize: number       // Projects per AI batch (default: 20, max: 50)
  aiParallelBatches: number // Concurrent batches (default: 1, max: 10)

  // Duplicate detection
  duplicateDetectionEnabled: boolean
  duplicateThreshold: number // Email similarity threshold (0-1, default: 1.0 = exact match)
  duplicateAction: 'FLAG' | 'AUTO_REJECT' // Default: FLAG (always recommend FLAG)

  // Advancement behavior
  autoAdvancePassingProjects: boolean // Auto-advance PASSED projects to next round
  manualReviewRequired: boolean       // All results require admin approval before advance

  // Eligibility criteria (structured)
  eligibilityCriteria: EligibilityCriteria[]

  // Category-specific rules
  categorySpecificRules: {
    STARTUP?: CategoryRuleSet
    BUSINESS_CONCEPT?: CategoryRuleSet
  }
}

export type FilterRuleDef = {
  id?: string // Optional — for editing existing rules
  name: string
  ruleType: 'FIELD_CHECK' | 'DOCUMENT_CHECK' | 'AI_SCORE' | 'DUPLICATE' | 'CUSTOM'
  config: FilterRuleConfig
  priority: number // Lower = run first
  isActive: boolean
  action: 'PASS' | 'REJECT' | 'FLAG'
}

export type FilterRuleConfig =
  | FieldCheckConfig
  | DocumentCheckConfig
  | AIScoreConfig
  | CustomConfig

export type FieldCheckConfig = {
  conditions: FieldCondition[]
  logic: 'AND' | 'OR'
}

export type FieldCondition = {
  field: 'competitionCategory' | 'foundedAt' | 'country' | 'geographicZone' | 'tags' | 'oceanIssue' | 'wantsMentorship' | 'institution'
  operator: 'equals' | 'not_equals' | 'contains' | 'in' | 'not_in' | 'is_empty' | 'greater_than' | 'less_than' | 'older_than_years' | 'newer_than_years'
  value: string | number | string[] | boolean
}

export type DocumentCheckConfig = {
  requiredFileTypes?: string[] // e.g., ['pdf', 'docx']
  minFileCount?: number
  maxFileCount?: number
  minTotalSizeMB?: number
  maxTotalSizeMB?: number
}

export type AIScoreConfig = {
  criteriaText: string    // Plain-language rubric
  minScore: number        // Minimum AI score to pass (0-10)
  weightInOverall: number // Weight if combining multiple AI rules (0-1)
}

export type CustomConfig = {
  // For future extension — custom JS/Python evaluation
  scriptUrl?: string
  functionName?: string
  parameters?: Record<string, unknown>
}

export type EligibilityCriteria = {
  name: string
  description: string
  required: boolean
  checkType: 'field' | 'document' | 'ai' | 'custom'
  checkConfig: FilterRuleConfig
}

export type CategoryRuleSet = {
  minAge?: number // Years since founded
  maxAge?: number
  requiredTags?: string[]
  excludedCountries?: string[]
  requiredDocuments?: string[]
}
```
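For illustration, applying a `CategoryRuleSet`'s age bounds might look like the following sketch (the helper names are hypothetical; only the field names come from the types above):

```typescript
type CategoryRuleSet = {
  minAge?: number // Years since founded
  maxAge?: number
}

// Approximate age in years, using the average Gregorian year length
function projectAgeYears(foundedAt: Date, now = new Date()): number {
  return (now.getTime() - foundedAt.getTime()) / (365.25 * 24 * 60 * 60 * 1000)
}

// Returns false if the project's age falls outside the configured bounds
function passesAgeBounds(foundedAt: Date, rules: CategoryRuleSet, now = new Date()): boolean {
  const age = projectAgeYears(foundedAt, now)
  if (rules.minAge !== undefined && age < rules.minAge) return false
  if (rules.maxAge !== undefined && age > rules.maxAge) return false
  return true
}
```

A rule set with only `maxAge: 5`, for example, admits a three-year-old startup but not an eight-year-old one.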
### Zod Schema for FilteringConfig
```typescript
// src/lib/round-config-schemas.ts
import { z } from 'zod'

export const FieldConditionSchema = z.object({
  field: z.enum([
    'competitionCategory',
    'foundedAt',
    'country',
    'geographicZone',
    'tags',
    'oceanIssue',
    'wantsMentorship',
    'institution'
  ]),
  operator: z.enum([
    'equals',
    'not_equals',
    'contains',
    'in',
    'not_in',
    'is_empty',
    'greater_than',
    'less_than',
    'older_than_years',
    'newer_than_years'
  ]),
  value: z.union([
    z.string(),
    z.number(),
    z.array(z.string()),
    z.boolean()
  ])
})

export const FieldCheckConfigSchema = z.object({
  conditions: z.array(FieldConditionSchema),
  logic: z.enum(['AND', 'OR'])
})

export const DocumentCheckConfigSchema = z.object({
  requiredFileTypes: z.array(z.string()).optional(),
  minFileCount: z.number().int().min(0).optional(),
  maxFileCount: z.number().int().min(0).optional(),
  minTotalSizeMB: z.number().min(0).optional(),
  maxTotalSizeMB: z.number().min(0).optional()
})

export const AIScoreConfigSchema = z.object({
  criteriaText: z.string().min(10).max(5000),
  minScore: z.number().min(0).max(10),
  weightInOverall: z.number().min(0).max(1).default(1.0)
})

export const CustomConfigSchema = z.object({
  scriptUrl: z.string().url().optional(),
  functionName: z.string().optional(),
  parameters: z.record(z.unknown()).optional()
})

export const FilterRuleDefSchema = z.object({
  id: z.string().optional(),
  name: z.string().min(1).max(255),
  ruleType: z.enum(['FIELD_CHECK', 'DOCUMENT_CHECK', 'AI_SCORE', 'DUPLICATE', 'CUSTOM']),
  config: z.union([
    FieldCheckConfigSchema,
    DocumentCheckConfigSchema,
    AIScoreConfigSchema,
    CustomConfigSchema
  ]),
  priority: z.number().int().min(0).default(0),
  isActive: z.boolean().default(true),
  action: z.enum(['PASS', 'REJECT', 'FLAG'])
})

export const CategoryRuleSetSchema = z.object({
  minAge: z.number().int().min(0).optional(),
  maxAge: z.number().int().min(0).optional(),
  requiredTags: z.array(z.string()).optional(),
  excludedCountries: z.array(z.string()).optional(),
  requiredDocuments: z.array(z.string()).optional()
})

export const FilteringConfigSchema = z.object({
  rules: z.array(FilterRuleDefSchema).default([]),
  aiScreeningEnabled: z.boolean().default(false),
  aiRubricPrompt: z.string().min(0).max(10000).default(''),
  aiConfidenceThresholds: z.object({
    high: z.number().min(0).max(1).default(0.85),
    medium: z.number().min(0).max(1).default(0.6),
    low: z.number().min(0).max(1).default(0.4)
  }).default({ high: 0.85, medium: 0.6, low: 0.4 }),
  aiBatchSize: z.number().int().min(1).max(50).default(20),
  aiParallelBatches: z.number().int().min(1).max(10).default(1),
  duplicateDetectionEnabled: z.boolean().default(true),
  duplicateThreshold: z.number().min(0).max(1).default(1.0),
  duplicateAction: z.enum(['FLAG', 'AUTO_REJECT']).default('FLAG'),
  autoAdvancePassingProjects: z.boolean().default(false),
  manualReviewRequired: z.boolean().default(true),
  eligibilityCriteria: z.array(z.object({
    name: z.string(),
    description: z.string(),
    required: z.boolean(),
    checkType: z.enum(['field', 'document', 'ai', 'custom']),
    checkConfig: z.union([
      FieldCheckConfigSchema,
      DocumentCheckConfigSchema,
      AIScoreConfigSchema,
      CustomConfigSchema
    ])
  })).default([]),
  categorySpecificRules: z.object({
    STARTUP: CategoryRuleSetSchema.optional(),
    BUSINESS_CONCEPT: CategoryRuleSetSchema.optional()
  }).default({})
})

export type FilteringConfig = z.infer<typeof FilteringConfigSchema>
```
---
## Filtering Rule Engine
### Rule Evaluation Order
```
1. Built-in Duplicate Detection (if enabled)
2. FIELD_CHECK rules (sorted by priority ascending)
3. DOCUMENT_CHECK rules (sorted by priority ascending)
4. AI_SCORE rules (if aiScreeningEnabled) — batch processed
5. CUSTOM rules (future extension)
6. Determine final outcome: PASSED | FILTERED_OUT | FLAGGED
```
### Rule Types in Detail
#### 1. FIELD_CHECK
**Purpose:** Validate project metadata fields against conditions.
**Operators:**
| Operator | Description | Example |
|----------|-------------|---------|
| `equals` | Field equals value | `competitionCategory equals "STARTUP"` |
| `not_equals` | Field does not equal value | `country not_equals "France"` |
| `contains` | Field contains substring (case-insensitive) | `tags contains "conservation"` |
| `in` | Field value is in array | `country in ["Monaco", "France", "Italy"]` |
| `not_in` | Field value not in array | `oceanIssue not_in ["OTHER"]` |
| `is_empty` | Field is null, empty string, or empty array | `institution is_empty` |
| `greater_than` | Numeric comparison | `teamMemberCount greater_than 2` |
| `less_than` | Numeric comparison | `fundingGoal less_than 100000` |
| `older_than_years` | Date comparison (foundedAt) | `foundedAt older_than_years 5` |
| `newer_than_years` | Date comparison (foundedAt) | `foundedAt newer_than_years 2` |
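As a sketch, the operator semantics above could be implemented with a single dispatch function (the helper names are hypothetical, not the rule engine's real API):

```typescript
type FieldCondition = { field: string; operator: string; value: unknown }

// Approximate age in years of a date, for the *_than_years operators
function yearsSince(date: Date, now = new Date()): number {
  return (now.getTime() - date.getTime()) / (365.25 * 24 * 60 * 60 * 1000)
}

// Evaluate one condition against a project record (sketch, not exhaustive)
function evaluateCondition(project: Record<string, unknown>, c: FieldCondition): boolean {
  const v = project[c.field]
  switch (c.operator) {
    case 'equals':           return v === c.value
    case 'not_equals':       return v !== c.value
    case 'contains':         return String(v ?? '').toLowerCase().includes(String(c.value).toLowerCase())
    case 'in':               return Array.isArray(c.value) && (c.value as unknown[]).includes(v)
    case 'not_in':           return Array.isArray(c.value) && !(c.value as unknown[]).includes(v)
    case 'is_empty':         return v == null || v === '' || (Array.isArray(v) && v.length === 0)
    case 'greater_than':     return typeof v === 'number' && v > Number(c.value)
    case 'less_than':        return typeof v === 'number' && v < Number(c.value)
    case 'older_than_years': return v instanceof Date && yearsSince(v) > Number(c.value)
    case 'newer_than_years': return v instanceof Date && yearsSince(v) < Number(c.value)
    default:                 return false
  }
}
```

A `FieldCheckConfig` with `logic: 'AND'` would then reduce its conditions with `every`, and `'OR'` with `some`.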
**Example Rule:**
```json
{
  "name": "Startups Must Be < 5 Years Old",
  "ruleType": "FIELD_CHECK",
  "config": {
    "conditions": [
      { "field": "competitionCategory", "operator": "equals", "value": "STARTUP" },
      { "field": "foundedAt", "operator": "older_than_years", "value": 5 }
    ],
    "logic": "AND"
  },
  "priority": 10,
  "isActive": true,
  "action": "REJECT"
}
```
**Logic:**
- `AND`: All conditions must be true
- `OR`: At least one condition must be true
**Action:**
- `PASS`: If conditions met, mark as passed (continue to next rule)
- `REJECT`: If conditions met, auto-reject (short-circuit)
- `FLAG`: If conditions met, flag for manual review
#### 2. DOCUMENT_CHECK
**Purpose:** Verify file uploads meet requirements.
**Checks:**
```typescript
type DocumentCheckConfig = {
  requiredFileTypes?: string[] // e.g., ['pdf', 'docx'] — must have at least one of each
  minFileCount?: number        // Minimum number of files
  maxFileCount?: number        // Maximum number of files
  minTotalSizeMB?: number      // Minimum total upload size
  maxTotalSizeMB?: number      // Maximum total upload size
}
```
**Example Rule:**
```json
{
  "name": "Must Upload Executive Summary + Business Plan",
  "ruleType": "DOCUMENT_CHECK",
  "config": {
    "requiredFileTypes": ["pdf"],
    "minFileCount": 2
  },
  "priority": 20,
  "isActive": true,
  "action": "FLAG"
}
```
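A minimal sketch of how such a rule could be evaluated (the `UploadedFile` shape is assumed; the real file model has more fields):

```typescript
type DocumentCheckConfig = {
  requiredFileTypes?: string[]
  minFileCount?: number
  maxFileCount?: number
  minTotalSizeMB?: number
  maxTotalSizeMB?: number
}

type UploadedFile = { name: string; sizeMB: number }

// Lowercased file extension, e.g. 'pdf' for 'summary.PDF'
function extOf(name: string): string {
  return name.split('.').pop()?.toLowerCase() ?? ''
}

// True only if every configured constraint is satisfied
function checkDocuments(files: UploadedFile[], cfg: DocumentCheckConfig): boolean {
  const total = files.reduce((sum, f) => sum + f.sizeMB, 0)
  if (cfg.minFileCount !== undefined && files.length < cfg.minFileCount) return false
  if (cfg.maxFileCount !== undefined && files.length > cfg.maxFileCount) return false
  if (cfg.minTotalSizeMB !== undefined && total < cfg.minTotalSizeMB) return false
  if (cfg.maxTotalSizeMB !== undefined && total > cfg.maxTotalSizeMB) return false
  for (const type of cfg.requiredFileTypes ?? []) {
    if (!files.some(f => extOf(f.name) === type)) return false
  }
  return true
}
```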
#### 3. AI_SCORE
**Purpose:** GPT-powered rubric evaluation.
**Config:**
```typescript
type AIScoreConfig = {
  criteriaText: string    // Plain-language rubric
  minScore: number        // Minimum score to pass (0-10)
  weightInOverall: number // Weight if combining multiple AI rules
}
```
**Example Rule:**
```json
{
  "name": "AI: Ocean Impact Assessment",
  "ruleType": "AI_SCORE",
  "config": {
    "criteriaText": "Project must demonstrate measurable ocean conservation impact with clear metrics and realistic timeline. Reject spam or unrelated projects.",
    "minScore": 6.0,
    "weightInOverall": 1.0
  },
  "priority": 30,
  "isActive": true,
  "action": "FLAG"
}
```
**AI Evaluation Flow:**
1. Anonymize project data (strip PII)
2. Batch projects (configurable batch size)
3. Send to OpenAI with rubric
4. Parse response:
```json
{
  "projects": [
    {
      "project_id": "anon-123",
      "meets_criteria": true,
      "confidence": 0.82,
      "reasoning": "Clear ocean conservation focus, realistic metrics",
      "quality_score": 7,
      "spam_risk": false
    }
  ]
}
```
5. Band by confidence thresholds
6. Store in `aiScreeningJson` on FilteringResult
#### 4. DUPLICATE
**Purpose:** Detect multiple submissions from same applicant.
**Built-in Rule:**
- Always runs first if `duplicateDetectionEnabled: true`
- Groups projects by `submittedByEmail`
- Flags all projects in duplicate groups
- Never auto-rejects duplicates (admin must decide which to keep)
**Duplicate Metadata:**
```json
{
  "isDuplicate": true,
  "siblingProjectIds": ["proj-2", "proj-3"],
  "duplicateNote": "This project shares a submitter email with 2 other project(s). Admin must review and decide which to keep.",
  "similarityScore": 1.0
}
```
**Future Enhancement: Semantic Similarity**
```typescript
duplicateThreshold: number // 0-1 (e.g., 0.8 = 80% similar text triggers duplicate flag)
```
Use text embeddings to detect duplicates beyond exact email match (compare titles, descriptions).
#### 5. CUSTOM (Future Extension)
**Purpose:** Run custom evaluation scripts (JS/Python).
**Config:**
```typescript
type CustomConfig = {
  scriptUrl?: string    // URL to hosted script
  functionName?: string // Function to call
  parameters?: Record<string, unknown>
}
```
**Example Use Case:**
- External API call to verify company registration
- Custom formula combining multiple fields
- Integration with third-party data sources
---
## Rule Combination Logic
### How Rules Are Combined
```typescript
// Pseudocode for rule evaluation
let finalOutcome: 'PASSED' | 'FILTERED_OUT' | 'FLAGGED' = 'PASSED'
let hasFailed = false
let hasFlagged = false

// Run rules in priority order
for (const rule of rules.sort((a, b) => a.priority - b.priority)) {
  const result = evaluateRule(rule, project)
  if (!result.passed) {
    if (rule.action === 'REJECT') {
      hasFailed = true
      break // Short-circuit — no need to run remaining rules
    } else if (rule.action === 'FLAG') {
      hasFlagged = true
      // Continue to next rule
    }
  }
}

// Determine final outcome
if (hasFailed) {
  finalOutcome = 'FILTERED_OUT'
} else if (hasFlagged) {
  finalOutcome = 'FLAGGED'
} else {
  finalOutcome = 'PASSED'
}

// Override: Duplicates always flagged (never auto-rejected)
if (isDuplicate && finalOutcome === 'FILTERED_OUT') {
  finalOutcome = 'FLAGGED'
}
```
### Weighted Scoring (Advanced)
For multiple AI rules or field checks, admins can configure weighted scoring:
```typescript
type WeightedScoringConfig = {
  enabled: boolean
  rules: Array<{
    ruleId: string
    weight: number // 0-1
  }>
  passingThreshold: number // Combined weighted score needed to pass (0-10)
}
```
**Example:**
```json
{
  "enabled": true,
  "rules": [
    { "ruleId": "ai-ocean-impact", "weight": 0.6 },
    { "ruleId": "ai-innovation-score", "weight": 0.4 }
  ],
  "passingThreshold": 7.0
}
```
With rule scores of 7.5 and 8.0 respectively: combined score = (7.5 × 0.6) + (8.0 × 0.4) = 4.5 + 3.2 = 7.7 ≥ 7.0 → PASSED
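The combination step is a plain weighted sum; a sketch:

```typescript
type WeightedRuleScore = { score: number; weight: number }

// Combined weighted score across AI rules (weights are assumed to sum to 1)
function combinedScore(rules: WeightedRuleScore[]): number {
  return rules.reduce((sum, r) => sum + r.score * r.weight, 0)
}

const total = combinedScore([
  { score: 7.5, weight: 0.6 }, // ai-ocean-impact
  { score: 8.0, weight: 0.4 }  // ai-innovation-score
])
const passed = total >= 7.0 // passingThreshold from the config above
```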
---
## AI Screening Pipeline
### Step-by-Step Flow
```
1. Load Projects
2. Anonymize Data (strip PII)
3. Batch Projects (configurable size: 1-50, default 20)
4. Parallel Processing (configurable: 1-10 concurrent batches)
5. OpenAI API Call (GPT-4o or configured model)
6. Parse JSON Response
7. Map Anonymous IDs → Real Project IDs
8. Band by Confidence Threshold
9. Store Results in FilteringResult
10. Log Token Usage (AIUsageLog)
```
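Steps 3-4 amount to chunking plus bounded concurrency; a rough sketch (`screenBatch` is a hypothetical stand-in for the OpenAI call, not the real function name):

```typescript
// Step 3: split items into batches of at most `size`
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = []
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size))
  return out
}

// Step 4: run at most `parallel` batches at a time, wave by wave
async function runInWaves<T, R>(
  batches: T[][],
  parallel: number,
  screenBatch: (batch: T[]) => Promise<R>
): Promise<R[]> {
  const results: R[] = []
  for (const wave of chunk(batches, parallel)) {
    results.push(...(await Promise.all(wave.map(screenBatch))))
  }
  return results
}
```

With the defaults (batch size 20, 1 parallel batch) this degrades to sequential processing; raising `parallel` trades OpenAI rate-limit headroom for wall-clock time.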
### Anonymization
```typescript
// src/server/services/anonymization.ts
export function anonymizeProjectsForAI(
  projects: ProjectWithRelations[],
  purpose: 'FILTERING' | 'ASSIGNMENT' | 'SUMMARY'
): { anonymized: AnonymizedProjectForAI[]; mappings: ProjectAIMapping[] }
```
**What's Stripped:**
- Team member names
- Submitter email
- Submitter name
- Personal identifiers in metadata
- File paths (only file types retained)
**What's Kept:**
- Project title (if generic)
- Description
- Category (STARTUP/BUSINESS_CONCEPT)
- Country
- Tags
- Ocean issue
- Founded date (year only)
**Validation:**
```typescript
export function validateAnonymizedProjects(
  anonymized: AnonymizedProjectForAI[]
): boolean
```
Checks for PII patterns:
- Email addresses (`/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/i`)
- Phone numbers
- Full names (heuristic)
- URLs with query params
**GDPR Compliance:**
- All AI calls must pass `validateAnonymizedProjects()` check
- Fails if PII detected → throws error, logs, flags all projects for manual review
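A simplified version of the PII scan (the email pattern is the one quoted above; the phone pattern is an assumed heuristic, not the production regex):

```typescript
const EMAIL_RE = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/i
const PHONE_RE = /\+?\d[\d\s().-]{7,}\d/ // Rough heuristic: 9+ phone-like characters

// Returns true if the text appears to contain an email address or phone number
function containsPII(text: string): boolean {
  return EMAIL_RE.test(text) || PHONE_RE.test(text)
}
```

The real check runs over every string field of every anonymized project and, per the GDPR rule above, aborts the AI call if any field matches.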
### OpenAI Prompt Structure
**System Prompt:**
```
Project screening assistant. Evaluate against criteria, return JSON.
Format: {"projects": [{project_id, meets_criteria: bool, confidence: 0-1, reasoning: str, quality_score: 1-10, spam_risk: bool}]}
Be objective. Base evaluation only on provided data. No personal identifiers in reasoning.
```
**User Prompt:**
```
CRITERIA: {aiRubricPrompt}
PROJECTS: [{project_id, title, description, category, tags, ...}]
Evaluate and return JSON.
```
**Response Format:**
```json
{
  "projects": [
    {
      "project_id": "anon-001",
      "meets_criteria": true,
      "confidence": 0.82,
      "reasoning": "Clear ocean conservation focus with measurable impact metrics. Realistic timeline. Strong innovation.",
      "quality_score": 8,
      "spam_risk": false
    },
    {
      "project_id": "anon-002",
      "meets_criteria": false,
      "confidence": 0.91,
      "reasoning": "Generic description, no specific ocean impact. Appears to be spam or off-topic.",
      "quality_score": 2,
      "spam_risk": true
    }
  ]
}
```
### Confidence Banding
```typescript
function bandByConfidence(
  aiScreeningData: { confidence: number; meetsAllCriteria: boolean }
): { outcome: 'PASSED' | 'FILTERED_OUT' | 'FLAGGED'; confidence: number }
```
**Default Thresholds:**
| Confidence | Meets Criteria | Outcome | Action |
|------------|----------------|---------|--------|
| ≥ 0.85 | true | PASSED | Auto-advance |
| 0.60-0.84 | true | FLAGGED | Manual review |
| 0.40-0.59 | any | FLAGGED | Manual review |
| ≤ 0.39 | false | FILTERED_OUT | Auto-reject |
**Admin Override:**
Admins can customize thresholds in `FilteringConfig.aiConfidenceThresholds`.
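A sketch implementing the default table above (not the actual `bandByConfidence` implementation):

```typescript
type Outcome = 'PASSED' | 'FILTERED_OUT' | 'FLAGGED'
type Thresholds = { high: number; medium: number; low: number }

const DEFAULT_THRESHOLDS: Thresholds = { high: 0.85, medium: 0.6, low: 0.4 }

// Auto-pass only a confident positive verdict, auto-reject only a negative
// verdict whose confidence falls below the low threshold, flag everything else.
function band(
  confidence: number,
  meetsCriteria: boolean,
  t: Thresholds = DEFAULT_THRESHOLDS
): Outcome {
  if (meetsCriteria && confidence >= t.high) return 'PASSED'
  if (!meetsCriteria && confidence < t.low) return 'FILTERED_OUT'
  return 'FLAGGED'
}
```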
---
## Duplicate Detection
### Current Implementation
```typescript
// Built-in email-based duplicate detection
const duplicateProjectIds = new Set<string>()
const emailToProjects = new Map<string, Array<{ id: string; title: string }>>()

for (const project of projects) {
  const email = (project.submittedByEmail ?? '').toLowerCase().trim()
  if (!email) continue
  if (!emailToProjects.has(email)) emailToProjects.set(email, [])
  emailToProjects.get(email)!.push({ id: project.id, title: project.title })
}

// Flag all projects in groups of size > 1
emailToProjects.forEach((group) => {
  if (group.length <= 1) return
  for (const p of group) {
    duplicateProjectIds.add(p.id)
  }
})
```
### Enhanced Detection (Future)
**Text Similarity:**
```typescript
import { cosineSimilarity } from '@/lib/text-similarity'
function detectDuplicatesByText(
  projects: Project[],
  threshold: number = 0.8
): Set<string>
```
**Algorithm:**
1. Generate text embeddings for title + description
2. Compute pairwise cosine similarity
3. Flag projects with similarity ≥ threshold
4. Group into duplicate clusters
**Example:**
Project A: "Ocean cleanup robot using AI"
Project B: "AI-powered ocean cleaning robot"
Similarity: 0.92 → Flagged as duplicates
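The pairwise-similarity step of the algorithm above can be sketched as follows (embeddings are assumed to be precomputed; the real `cosineSimilarity` would come from `@/lib/text-similarity`):

```typescript
// Cosine similarity between two equal-length embedding vectors
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Flag every project whose embedding is >= threshold similar to another's
function detectDuplicatesByEmbedding(
  embeddings: Map<string, number[]>,
  threshold = 0.8
): Set<string> {
  const ids = Array.from(embeddings.keys())
  const dupes = new Set<string>()
  for (let i = 0; i < ids.length; i++) {
    for (let j = i + 1; j < ids.length; j++) {
      if (cosineSimilarity(embeddings.get(ids[i])!, embeddings.get(ids[j])!) >= threshold) {
        dupes.add(ids[i])
        dupes.add(ids[j])
      }
    }
  }
  return dupes
}
```

The O(n²) pairwise comparison is fine at this competition's scale; a larger corpus would call for an approximate-nearest-neighbor index instead.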
### Duplicate Metadata
```json
{
  "isDuplicate": true,
  "siblingProjectIds": ["proj-2", "proj-3"],
  "duplicateNote": "This project shares a submitter email with 2 other project(s). Admin must review and decide which to keep.",
  "similarityScore": 1.0,
  "detectionMethod": "email" | "text_similarity"
}
```
### Admin Duplicate Review UI
```
┌─────────────────────────────────────────────────────────────┐
│ Duplicate Group: applicant@example.com │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Project 1: "Ocean Cleanup Robot" │ │
│ │ Submitted: 2026-02-01 10:30 AM │ │
│ │ Category: STARTUP │ │
│ │ AI Score: 7.5/10 │ │
│ │ │ │
│ │ [✓ Keep This] [✗ Reject] [View Details] │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Project 2: "AI-Powered Ocean Cleaner" │ │
│ │ Submitted: 2026-02-05 2:15 PM │ │
│ │ Category: STARTUP │ │
│ │ AI Score: 6.8/10 │ │
│ │ │ │
│ │ [✓ Keep This] [✗ Reject] [View Details] │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
│ Recommendation: Keep Project 1 (higher AI score, earlier │
│ submission) │
│ │
│ [Approve Recommendation] [Manual Decision] │
└─────────────────────────────────────────────────────────────┘
```
---
## Admin Experience
### Filtering Dashboard
```
┌───────────────────────────────────────────────────────────────────┐
│ Round 2: AI Screening & Eligibility Check │
│ │
│ Status: Completed ● Last Run: 2026-02-10 3:45 PM │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Results Summary │ │
│ │ │ │
│ │ ✓ Passed: 142 projects (auto-advance enabled) │ │
│ │ ✗ Filtered Out: 28 projects │ │
│ │ ⚠ Flagged: 15 projects (manual review required) │ │
│ │ ──────────────────────────────────────────────────────── │ │
│ │ Total: 185 projects processed │ │
│ │ │ │
│ │ AI Usage: 12,450 tokens ($0.15) │ │
│ │ Processing Time: 2m 34s │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Manual Review Queue (15) [Sort ▼] │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────┐ │ │
│ │ │ ⚠ Ocean Cleanup Initiative │ │ │
│ │ │ Category: STARTUP │ │ │
│ │ │ Reason: Duplicate submission (2 projects) │ │ │
│ │ │ AI Score: 7.2/10 (confidence: 0.65) │ │ │
│ │ │ │ │ │
│ │ │ Failed Rules: │ │ │
│ │ │ • Duplicate Detection: EMAIL_MATCH │ │ │
│ │ │ │ │ │
│ │ │ [View Details] [✓ Approve] [✗ Reject] │ │ │
│ │ └──────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────┐ │ │
│ │ │ ⚠ Blue Carbon Project │ │ │
│ │ │ Category: BUSINESS_CONCEPT │ │ │
│ │ │ Reason: AI confidence medium (0.58) │ │ │
│ │ │ AI Score: 5.5/10 │ │ │
│ │ │ │ │ │
│ │ │ AI Reasoning: "Project description is vague and │ │ │
│ │ │ lacks specific impact metrics. Needs clarification." │ │ │
│ │ │ │ │ │
│ │ │ [View Details] [✓ Approve] [✗ Reject] │ │ │
│ │ └──────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ... 13 more flagged projects │ │
│ │ │ │
│ │ [Batch Approve All] [Export Queue] │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ [Re-run Filtering] [Configure Rules] [View Logs] │
└───────────────────────────────────────────────────────────────────┘
```
### Rule Configuration UI
```
┌───────────────────────────────────────────────────────────────────┐
│ Filtering Rules Configuration │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Active Rules (5) [+ Add Rule] │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────┐ │ │
│ │ │ ≡ Rule 1: Startups Must Be < 5 Years Old │ │ │
│ │ │ Type: FIELD_CHECK │ │ │
│ │ │ Action: REJECT │ │ │
│ │ │ Priority: 10 [Edit] [✗] │ │ │
│ │ └──────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────┐ │ │
│ │ │ ≡ Rule 2: Must Upload Executive Summary │ │ │
│ │ │ Type: DOCUMENT_CHECK │ │ │
│ │ │ Action: FLAG │ │ │
│ │ │ Priority: 20 [Edit] [✗] │ │ │
│ │ └──────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────┐ │ │
│ │ │ ≡ Rule 3: AI Ocean Impact Assessment │ │ │
│ │ │ Type: AI_SCORE │ │ │
│ │ │ Action: FLAG │ │ │
│ │ │ Priority: 30 [Edit] [✗] │ │ │
│ │ │ Rubric: "Project must demonstrate measurable..." │ │ │
│ │ │ Min Score: 6.0/10 │ │ │
│ │ └──────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ... 2 more rules │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ AI Settings │ │
│ │ │ │
│ │ AI Screening: [✓ Enabled] │ │
│ │ Batch Size: [20] projects (1-50) │ │
│ │ Parallel Batches: [2] (1-10) │ │
│ │ │ │
│ │ Confidence Thresholds: │ │
│ │ High (auto-pass): [0.85] │ │
│ │ Medium (review): [0.60] │ │
│ │ Low (auto-reject): [0.40] │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Duplicate Detection │ │
│ │ │ │
│ │ Email-based: [✓ Enabled] │ │
│ │ Text similarity: [ ] Disabled (future) │ │
│ │ Similarity threshold: [0.80] (0-1) │ │
│ │ Action on duplicates: [FLAG] (recommended) │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ [Save Configuration] [Test Rules] [Cancel] │
└───────────────────────────────────────────────────────────────────┘
```
### Manual Override Controls
```
┌───────────────────────────────────────────────────────────────────┐
│ Manual Override: Ocean Cleanup Initiative │
│ │
│ Current Outcome: ⚠ FLAGGED │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Project Details │ │
│ │ │ │
│ │ Title: Ocean Cleanup Initiative │ │
│ │ Category: STARTUP │ │
│ │ Submitted: 2026-02-01 10:30 AM │ │
│ │ Applicant: applicant@example.com │ │
│ │ │ │
│ │ Description: [View Full Description] │ │
│ │ Files: executive-summary.pdf, business-plan.docx │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Filtering Results │ │
│ │ │ │
│ │ ✓ Rule 1: Startups < 5 Years Old PASSED │ │
│ │ ✓ Rule 2: Upload Executive Summary PASSED │ │
│ │ ✗ Rule 3: Duplicate Detection FLAGGED │ │
│ │ → Reason: 2 projects from applicant@example.com │ │
│ │ → Sibling: "AI-Powered Ocean Cleaner" (proj-2) │ │
│ │ ⚠ Rule 4: AI Ocean Impact FLAGGED │ │
│ │ → AI Score: 7.2/10 │ │
│ │ → Confidence: 0.65 (medium) │ │
│ │ → Reasoning: "Clear ocean focus but needs more specific │ │
│ │ impact metrics. Potential duplicate." │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Override Decision │ │
│ │ │ │
│ │ New Outcome: ○ Approve (PASSED) ○ Reject (FILTERED_OUT) │ │
│ │ │ │
│ │ Reason (required): │ │
│ │ ┌──────────────────────────────────────────────────────┐ │ │
│ │ │ Reviewed duplicate group — this is the stronger │ │ │
│ │ │ submission. AI score above threshold. Approved to │ │ │
│ │ │ advance. │ │ │
│ │ └──────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ [Submit Override] [Cancel] │ │
│ └─────────────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────────┘
```
---
## API Changes
### New tRPC Procedures
```typescript
// src/server/routers/filtering.ts
export const filteringRouter = router({
// Run filtering for a round
runFiltering: adminProcedure
.input(z.object({ roundId: z.string() }))
.mutation(async ({ ctx, input }) => {
      return runRoundFiltering(input.roundId, ctx.user.id, ctx.prisma)
}),
// Get filtering job status
getJob: adminProcedure
.input(z.object({ jobId: z.string() }))
.query(async ({ ctx, input }) => {
return ctx.prisma.filteringJob.findUnique({
where: { id: input.jobId },
include: { round: { select: { name: true } } }
})
}),
// Get manual review queue
getManualQueue: adminProcedure
.input(z.object({ roundId: z.string() }))
.query(async ({ ctx, input }) => {
return getManualQueue(input.roundId, ctx.prisma)
}),
// Resolve manual decision
resolveDecision: adminProcedure
.input(z.object({
filteringResultId: z.string(),
outcome: z.enum(['PASSED', 'FILTERED_OUT']),
reason: z.string().min(10).max(1000)
}))
.mutation(async ({ ctx, input }) => {
return resolveManualDecision(
input.filteringResultId,
input.outcome,
input.reason,
ctx.user.id,
ctx.prisma
)
}),
// Batch override
batchResolve: adminProcedure
.input(z.object({
filteringResultIds: z.array(z.string()),
outcome: z.enum(['PASSED', 'FILTERED_OUT']),
reason: z.string().min(10).max(1000)
}))
.mutation(async ({ ctx, input }) => {
      // Sequential by design: each resolution writes its own audit-log entry
      for (const id of input.filteringResultIds) {
        await resolveManualDecision(id, input.outcome, input.reason, ctx.user.id, ctx.prisma)
      }
}),
// Export results
exportResults: adminProcedure
.input(z.object({ roundId: z.string() }))
.query(async ({ ctx, input }) => {
// Return CSV-ready data
}),
// Configure filtering rules
configureRules: adminProcedure
.input(z.object({
roundId: z.string(),
rules: z.array(FilterRuleDefSchema)
}))
.mutation(async ({ ctx, input }) => {
// Delete existing rules, create new ones
}),
// Update round config
updateConfig: adminProcedure
.input(z.object({
roundId: z.string(),
config: FilteringConfigSchema
}))
.mutation(async ({ ctx, input }) => {
await ctx.prisma.round.update({
where: { id: input.roundId },
        data: { configJson: input.config as Prisma.InputJsonValue } // Prisma namespace from '@prisma/client'
})
})
})
```
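Because `runFiltering` kicks off a long-running job and `getJob` reports its status, the admin dashboard is expected to poll. A hedged sketch of such a poller — `waitForJob`, `JobStatus`, and the injected `fetchStatus` are illustrative names, with `fetchStatus` standing in for a `filtering.getJob` call through whatever tRPC client the dashboard uses:

```typescript
// Illustrative statuses — mirror whatever enum FilteringJob actually uses.
type JobStatus = 'PENDING' | 'RUNNING' | 'COMPLETED' | 'FAILED'

// Poll an injected status fetcher until the job reaches a terminal state.
async function waitForJob(
  fetchStatus: () => Promise<JobStatus>,
  { intervalMs = 1000, maxAttempts = 60 } = {}
): Promise<JobStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus()
    if (status === 'COMPLETED' || status === 'FAILED') return status
    await new Promise((resolve) => setTimeout(resolve, intervalMs))
  }
  throw new Error('Filtering job did not finish within the polling window')
}
```

Injecting the fetcher keeps the helper testable and independent of the tRPC client setup.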
---
## Service Functions
### Core Service Signatures
```typescript
// src/server/services/round-filtering.ts
export async function runRoundFiltering(
roundId: string,
actorId: string,
prisma: PrismaClient
): Promise<FilteringJobResult>
export async function getManualQueue(
roundId: string,
prisma: PrismaClient
): Promise<ManualQueueItem[]>
export async function resolveManualDecision(
filteringResultId: string,
outcome: 'PASSED' | 'FILTERED_OUT',
reason: string,
actorId: string,
prisma: PrismaClient
): Promise<void>
export async function advanceFromFilteringRound(
roundId: string,
actorId: string,
prisma: PrismaClient
): Promise<AdvancementResult>
type FilteringJobResult = {
jobId: string
total: number
passed: number
rejected: number
flagged: number
tokensUsed: number
processingTime: number
}
type ManualQueueItem = {
filteringResultId: string
projectId: string
projectTitle: string
outcome: string
ruleResults: RuleResult[]
aiScreeningJson: Record<string, unknown> | null
createdAt: Date
}
type AdvancementResult = {
advancedCount: number
targetRoundId: string
targetRoundName: string
notificationsSent: number
}
```
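`runRoundFiltering` is expected to send projects to the AI screener in batches of the configured size (see Batch Size in the config panel). The batching itself reduces to a chunking helper; this sketch — `chunk` is an assumed name, not an existing export — pins down the intended semantics:

```typescript
// Illustrative helper: split items into consecutive batches of at most `size`.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error('Batch size must be a positive integer')
  }
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}
```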
---
## Edge Cases
| Edge Case | Handling |
|-----------|----------|
| **No projects to filter** | FilteringJob completes immediately with 0 processed |
| **AI API failure** | Flag all projects for manual review, log error, continue |
| **Duplicate with different outcomes** | Always flag duplicates (never auto-reject) |
| **Admin overrides auto-rejected project** | Allowed — finalOutcome overrides outcome |
| **Project withdrawn during filtering** | Skip in filtering, mark WITHDRAWN in ProjectRoundState |
| **Rule misconfiguration** | Validate config on save, throw error if invalid |
| **All projects flagged** | Valid scenario — requires manual review for all |
| **All projects auto-rejected** | Valid scenario — no advancement |
| **Advancement before manual review** | Blocked if `manualReviewRequired: true` |
| **Re-run filtering** | Deletes previous FilteringResult records, runs fresh |
| **AI response parse error** | Flag affected projects, log error, continue |
| **Duplicate groups > 10 projects** | Flag all, recommend batch review in UI |
| **Missing submittedByEmail** | Skip duplicate detection for this project |
| **Empty rule set** | All projects auto-pass (useful for testing) |
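Several of the edge cases above concern email-based duplicate detection. A minimal sketch of the grouping step, consistent with the table (projects without `submittedByEmail` are skipped; only groups of two or more count as duplicates, and duplicates are only ever flagged, never auto-rejected); `findDuplicateGroups` and `ProjectLite` are illustrative names:

```typescript
// Illustrative only — not an existing service function.
interface ProjectLite {
  id: string
  submittedByEmail?: string | null
}

// Group project IDs by normalized submitter email; drop singleton groups.
function findDuplicateGroups(projects: ProjectLite[]): Map<string, string[]> {
  const byEmail = new Map<string, string[]>()
  for (const p of projects) {
    // Edge case from the table: missing submittedByEmail → skip detection
    if (!p.submittedByEmail) continue
    const key = p.submittedByEmail.trim().toLowerCase()
    byEmail.set(key, [...(byEmail.get(key) ?? []), p.id])
  }
  for (const [email, ids] of byEmail) {
    if (ids.length < 2) byEmail.delete(email)
  }
  return byEmail
}
```

Normalizing case and whitespace before grouping catches trivially disguised resubmissions without resorting to text similarity.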
---
## Integration Points
### Connects To: INTAKE Round (Input)
- **Input:** Projects in PENDING/IN_PROGRESS state from INTAKE round
- **Data:** Project metadata, submitted files, team member data
- **Trigger:** Admin manually runs filtering after INTAKE window closes
### Connects To: EVALUATION Round (Output)
- **Output:** Passing projects advance to EVALUATION round
- **Data:** FilteringResult metadata attached to projects (AI scores, flags)
- **Trigger:** Auto-advance if `autoAdvancePassingProjects: true`, else manual
### Connects To: AI Services
- **Service:** `src/server/services/ai-filtering.ts`
- **Purpose:** GPT-powered rubric evaluation
- **Data Flow:** Anonymized project data → OpenAI → parsed results → confidence banding
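The anonymization step in the data flow above can be as simple as projecting out identifying fields before the payload goes to OpenAI. A sketch under that assumption — `anonymizeForScreening` and `ProjectInput` are illustrative names, and the real field list would follow the Project model:

```typescript
// Illustrative input shape — the real Project model has more fields.
interface ProjectInput {
  title: string
  description: string
  submittedByEmail?: string | null
  teamMembers?: { name: string; email: string }[]
}

// Keep only content fields; applicant identity never reaches the AI screener.
function anonymizeForScreening(p: ProjectInput): { title: string; description: string } {
  return { title: p.title, description: p.description }
}
```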
### Connects To: Audit System
- **Tables:** `DecisionAuditLog`, `OverrideAction`, `AuditLog`, `AIUsageLog`
- **Events:** `filtering.completed`, `filtering.manual_decision`, `filtering.auto_advanced`
---
## Summary
The redesigned FILTERING round provides:
1. **Flexible Rule Engine** — Field checks, document checks, AI scoring, duplicates, custom scripts
2. **AI-Powered Screening** — GPT rubric evaluation with confidence banding
3. **Built-in Duplicate Detection** — Email-based (future: text similarity)
4. **Manual Review Queue** — Admin override system with full audit trail
5. **Batch Processing** — Configurable batch sizes for performance
6. **Progress Tracking** — FilteringJob model for long-running operations
7. **Auto-Advancement** — Passing projects can auto-advance to next round
8. **Full Auditability** — All decisions logged in DecisionAuditLog + OverrideAction
This replaces the current `FILTER` stage with a fully-featured, production-ready filtering system that balances automation with human oversight.