# AI Data Processing - GDPR Compliance Documentation

## Overview

This document describes how project data is processed by AI services in the MOPC Platform, ensuring compliance with GDPR Articles 5, 6, 13-14, 25, and 32.

## Legal Basis

| Processing Activity | Legal Basis | GDPR Article |
|---------------------|-------------|--------------|
| AI-powered project filtering | Legitimate interest | Art. 6(1)(f) |
| AI-powered jury assignment | Legitimate interest | Art. 6(1)(f) |
| AI-powered award eligibility | Legitimate interest | Art. 6(1)(f) |
| AI-powered mentor matching | Legitimate interest | Art. 6(1)(f) |

**Legitimate Interest Justification:** AI processing is used to efficiently evaluate ocean conservation projects and match appropriate reviewers, directly serving the platform's purpose of managing the Monaco Ocean Protection Challenge.

## Data Minimization (Article 5(1)(c))

The AI system applies strict data minimization:

- **Only necessary fields** sent to AI (no names, emails, phone numbers)
- **Descriptions truncated** to 300-500 characters maximum
- **Team size** sent as count only (no member details)
- **Dates** sent as year-only or ISO date (no timestamps)
- **IDs replaced** with sequential anonymous identifiers (P1, P2, etc.)

## Anonymization Measures

### Data NEVER Sent to AI

| Data Type | Reason |
|-----------|--------|
| Personal names | PII - identifying |
| Email addresses | PII - identifying |
| Phone numbers | PII - identifying |
| Physical addresses | PII - identifying |
| External URLs | Could identify individuals |
| Internal project/user IDs | Could be cross-referenced |
| Team member details | PII - identifying |
| Internal comments | May contain PII |
| File content | May contain PII |

### Data Sent to AI (Anonymized)

| Field | Type | Purpose | Anonymization |
|-------|------|---------|---------------|
| project_id | String | Reference | Replaced with P1, P2, etc. |
| title | String | Spam detection | PII patterns removed |
| description | String | Criteria matching | Truncated, PII stripped |
| category | Enum | Filtering | As-is (no PII) |
| ocean_issue | Enum | Topic filtering | As-is (no PII) |
| country | String | Geographic eligibility | As-is (country name only) |
| region | String | Regional eligibility | As-is (zone name only) |
| institution | String | Student identification | As-is (institution name only) |
| tags | Array | Keyword matching | As-is (no PII expected) |
| founded_year | Number | Age filtering | Year only, not full date |
| team_size | Number | Team requirements | Count only |
| file_count | Number | Document checks | Count only |
| file_types | Array | File requirements | Type names only |
| wants_mentorship | Boolean | Mentorship filtering | As-is |
| submission_source | Enum | Source filtering | As-is |
| submitted_date | String | Deadline checks | Date only, no time |

## Technical Safeguards

### PII Detection and Stripping

```typescript
// Patterns detected and removed before AI processing
const PII_PATTERNS = {
  email: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g,
  phone: /(\+?\d{1,3}[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}/g,
  url: /https?:\/\/[^\s]+/g,
  ssn: /\d{3}-\d{2}-\d{4}/g,
  ipv4: /\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/g,
}
```

### Validation Before Every AI Call

```typescript
// GDPR compliance enforced before EVERY API call
export function enforceGDPRCompliance(data: unknown[]): void {
  for (const item of data) {
    const { valid, violations } = validateNoPersonalData(item)
    if (!valid) {
      throw new Error(`GDPR compliance check failed: ${violations.join(', ')}`)
    }
  }
}
```

### ID Anonymization

Real IDs are never sent to AI.
Instead:

- Projects: `cm1abc123...` → `P1`, `P2`, `P3`
- Jurors: `cm2def456...` → `juror_001`, `juror_002`
- Results are mapped back to real IDs using secure mapping tables

## Data Retention

| Data Type | Retention | Deletion Method |
|-----------|-----------|-----------------|
| AI usage logs | 12 months | Automatic deletion |
| Anonymized prompts | Not stored | Sent directly to API |
| AI responses | Not stored | Parsed and discarded |

**Note:** OpenAI does not retain API data for training (per their API Terms). API data is retained for up to 30 days for abuse monitoring, configurable to 0 days.

## Subprocessor: OpenAI

| Aspect | Details |
|--------|---------|
| Subprocessor | OpenAI, Inc. |
| Location | United States |
| DPA Status | Data Processing Agreement in place |
| Safeguards | Standard Contractual Clauses (SCCs) |
| Compliance | SOC 2 Type II, GDPR-compliant |
| Data Use | API data NOT used for model training |

**OpenAI DPA:** https://openai.com/policies/data-processing-agreement

## Audit Trail

All AI processing is logged:

```typescript
await prisma.aIUsageLog.create({
  data: {
    userId: ctx.user.id,   // Who initiated
    action: 'FILTERING',   // What type
    entityType: 'Round',   // What entity
    entityId: roundId,     // Which entity
    model: 'gpt-4o',       // What model
    totalTokens: 1500,     // Resource usage
    status: 'SUCCESS',     // Outcome
  },
})
```

## Data Subject Rights

### Right of Access (Article 15)

Users can request:

- What data was processed by AI
- When AI processing occurred
- What decisions were made

**Implementation:** Export AI usage logs for user's projects.

### Right to Erasure (Article 17)

When a user requests deletion:

- AI usage logs for their projects can be deleted
- OpenAI does not use API data for training, and any data it holds for abuse monitoring expires within the retention window described under Data Retention (up to 30 days, configurable to 0 days)

**Note:** Since only anonymized data is sent to AI, there is no personal data at OpenAI to delete.
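
The access and erasure steps above could be sketched as follows. This is a minimal sketch under stated assumptions, not the platform's implementation: it works on an in-memory array rather than the Prisma table, and `AIUsageLogRecord`, `exportLogsForEntities`, `eraseLogsForEntities`, and the `createdAt` field are hypothetical names introduced for illustration. It also assumes the caller has already resolved which entity IDs (for example, round IDs) relate to the requesting user's projects.

```typescript
// Hypothetical record shape mirroring the fields in the audit-trail example.
interface AIUsageLogRecord {
  userId: string
  action: string
  entityType: string
  entityId: string
  model: string
  totalTokens: number
  status: string
  createdAt: string // assumed ISO timestamp; not shown in the audit-trail example
}

// Right of Access: return copies of the log entries tied to the given entities.
function exportLogsForEntities(
  logs: readonly AIUsageLogRecord[],
  entityIds: ReadonlySet<string>,
): AIUsageLogRecord[] {
  return logs
    .filter((log) => entityIds.has(log.entityId))
    .map((log) => ({ ...log }))
}

// Right to Erasure: return the entries to keep plus a count of erased entries.
function eraseLogsForEntities(
  logs: readonly AIUsageLogRecord[],
  entityIds: ReadonlySet<string>,
): { kept: AIUsageLogRecord[]; erasedCount: number } {
  const kept = logs.filter((log) => !entityIds.has(log.entityId))
  return { kept, erasedCount: logs.length - kept.length }
}

// Usage with made-up sample data.
const sampleLogs: AIUsageLogRecord[] = [
  {
    userId: 'u1', action: 'FILTERING', entityType: 'Round', entityId: 'round_a',
    model: 'gpt-4o', totalTokens: 1500, status: 'SUCCESS',
    createdAt: '2025-01-10T09:00:00Z',
  },
  {
    userId: 'u2', action: 'FILTERING', entityType: 'Round', entityId: 'round_b',
    model: 'gpt-4o', totalTokens: 900, status: 'SUCCESS',
    createdAt: '2025-02-03T14:30:00Z',
  },
]

const requested = new Set(['round_a'])
const exported = exportLogsForEntities(sampleLogs, requested)
const erasure = eraseLogsForEntities(sampleLogs, requested)
```

Returning copies from the export keeps the stored log untouched, and returning an `erasedCount` gives the operator a figure to record when confirming the erasure request.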
### Right to Object (Article 21)

Users can request to opt out of AI processing:

- Admin can disable AI features per round
- Manual review fallback available for all AI features

## Risk Assessment

### Risk: PII Leakage to AI Provider

| Factor | Assessment |
|--------|------------|
| Likelihood | Very Low |
| Impact | Medium |
| Mitigation | Automated PII detection, validation before every call |
| Residual Risk | Very Low |

### Risk: AI Decision Bias

| Factor | Assessment |
|--------|------------|
| Likelihood | Low |
| Impact | Low |
| Mitigation | Human review of all AI suggestions, algorithmic fallback |
| Residual Risk | Very Low |

### Risk: Data Breach at Subprocessor

| Factor | Assessment |
|--------|------------|
| Likelihood | Very Low |
| Impact | Low (only anonymized data) |
| Mitigation | OpenAI SOC 2 compliance, no PII sent |
| Residual Risk | Very Low |

## Compliance Checklist

- [x] Data minimization applied (only necessary fields)
- [x] PII stripped before AI processing
- [x] Anonymization validated before every API call
- [x] DPA in place with OpenAI
- [x] Audit logging of all AI operations
- [x] Fallback available when AI processing is declined
- [x] Usage logs retained for 12 months only
- [x] No personal data stored at subprocessor

## Contact

For questions about AI data processing:

- Data Protection Officer: [DPO email]
- Technical Contact: [Tech contact email]

## See Also

- [Platform GDPR Compliance](./platform-gdpr-compliance.md)
- [AI System Architecture](../architecture/ai-system.md)
- [AI Services Reference](../architecture/ai-services.md)