# AI Data Processing - GDPR Compliance Documentation

## Overview

This document describes how project data is processed by AI services in the MOPC Platform and how that processing complies with GDPR Articles 5, 6, 13-14, 25, and 32.

## Legal Basis

| Processing Activity | Legal Basis | GDPR Article |
|---------------------|-------------|--------------|
| AI-powered project filtering | Legitimate interest | Art. 6(1)(f) |
| AI-powered jury assignment | Legitimate interest | Art. 6(1)(f) |
| AI-powered award eligibility | Legitimate interest | Art. 6(1)(f) |
| AI-powered mentor matching | Legitimate interest | Art. 6(1)(f) |

**Legitimate Interest Justification:** AI processing is used to efficiently evaluate ocean conservation projects and match appropriate reviewers, directly serving the platform's purpose of managing the Monaco Ocean Protection Challenge.

## Data Minimization (Article 5(1)(c))

The AI system applies strict data minimization:

- **Only necessary fields** sent to AI (no names, emails, phone numbers)
- **Descriptions truncated** to 300-500 characters maximum
- **Team size** sent as count only (no member details)
- **Dates** sent as year-only or ISO date (no timestamps)
- **IDs replaced** with sequential anonymous identifiers (P1, P2, etc.)

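The rules above can be illustrated with a minimal sketch. The `ProjectRecord` shape, the `minimizeProject` helper, and the 500-character cap (the upper end of the stated range) are assumptions made for illustration, not the platform's actual schema or code.

```typescript
// Hypothetical input shape; field names are illustrative only.
interface ProjectRecord {
  id: string
  title: string
  description: string
  foundedAt: Date
  teamMembers: { name: string; email: string }[]
}

// Only the fields that are allowed to leave the platform.
interface MinimizedProject {
  project_id: string
  title: string
  description: string
  founded_year: number
  team_size: number
}

// Assumed cap: the upper end of the 300-500 character range.
const MAX_DESCRIPTION_LENGTH = 500

function minimizeProject(project: ProjectRecord, index: number): MinimizedProject {
  return {
    project_id: `P${index + 1}`, // sequential ID instead of the real one
    title: project.title,
    description: project.description.slice(0, MAX_DESCRIPTION_LENGTH),
    founded_year: project.foundedAt.getUTCFullYear(), // year only
    team_size: project.teamMembers.length, // count only, no member details
  }
}
```

Building a new object from an explicit allow-list of fields, rather than deleting fields from the original record, means a field added to the schema later is not sent by accident.
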
## Anonymization Measures

### Data NEVER Sent to AI

| Data Type | Reason |
|-----------|--------|
| Personal names | PII - identifying |
| Email addresses | PII - identifying |
| Phone numbers | PII - identifying |
| Physical addresses | PII - identifying |
| External URLs | Could identify individuals |
| Internal project/user IDs | Could be cross-referenced |
| Team member details | PII - identifying |
| Internal comments | May contain PII |
| File content | May contain PII |

### Data Sent to AI (Anonymized)

| Field | Type | Purpose | Anonymization |
|-------|------|---------|---------------|
| project_id | String | Reference | Replaced with P1, P2, etc. |
| title | String | Spam detection | PII patterns removed |
| description | String | Criteria matching | Truncated, PII stripped |
| category | Enum | Filtering | As-is (no PII) |
| ocean_issue | Enum | Topic filtering | As-is (no PII) |
| country | String | Geographic eligibility | As-is (country name only) |
| region | String | Regional eligibility | As-is (zone name only) |
| institution | String | Student identification | As-is (institution name only) |
| tags | Array | Keyword matching | As-is (no PII expected) |
| founded_year | Number | Age filtering | Year only, not full date |
| team_size | Number | Team requirements | Count only |
| file_count | Number | Document checks | Count only |
| file_types | Array | File requirements | Type names only |
| wants_mentorship | Boolean | Mentorship filtering | As-is |
| submission_source | Enum | Source filtering | As-is |
| submitted_date | String | Deadline checks | Date only, no time |

## Technical Safeguards

### PII Detection and Stripping

```typescript
// Patterns detected and removed before AI processing
const PII_PATTERNS = {
  email: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g,
  phone: /(\+?\d{1,3}[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}/g,
  url: /https?:\/\/[^\s]+/g,
  ssn: /\d{3}-\d{2}-\d{4}/g,
  ipv4: /\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/g,
}
```

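As a rough illustration of how such patterns might be applied, the sketch below runs each one over a string and substitutes a placeholder. The `stripPII` function and the `[kind_removed]` placeholder format are assumptions made for this example; the platform's real stripping routine may differ.

```typescript
// Same patterns as listed above, repeated so the sketch is self-contained.
const PII_PATTERNS: Record<string, RegExp> = {
  email: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g,
  phone: /(\+?\d{1,3}[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}/g,
  url: /https?:\/\/[^\s]+/g,
  ssn: /\d{3}-\d{2}-\d{4}/g,
  ipv4: /\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/g,
}

// Hypothetical helper: replaces every match of every pattern with a
// placeholder naming the kind of data that was removed.
function stripPII(text: string): string {
  let result = text
  for (const [kind, pattern] of Object.entries(PII_PATTERNS)) {
    result = result.replace(pattern, `[${kind}_removed]`)
  }
  return result
}
```

Patterns are applied in declaration order, so an email address is replaced before the URL and phone patterns see the text. Regex matching of this kind only catches well-formed values, so it complements rather than replaces the field-level exclusions described earlier.
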
### Validation Before Every AI Call

```typescript
// GDPR compliance enforced before EVERY API call
export function enforceGDPRCompliance(data: unknown[]): void {
  for (const item of data) {
    const { valid, violations } = validateNoPersonalData(item)
    if (!valid) {
      throw new Error(`GDPR compliance check failed: ${violations.join(', ')}`)
    }
  }
}
```

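The body of `validateNoPersonalData` is not shown in this document. The sketch below is one possible shape for it, written only to make the `{ valid, violations }` contract concrete: the `FORBIDDEN_KEYS` deny-list, the single email check, and the fail-closed handling of non-object input are all assumptions.

```typescript
interface ValidationResult {
  valid: boolean
  violations: string[]
}

// Assumed deny-list of field names that must never appear in a payload.
const FORBIDDEN_KEYS = ['name', 'email', 'phone', 'address']

// No `g` flag here, so repeated `.test()` calls carry no state.
const EMAIL_PATTERN = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/

// Hypothetical implementation of the validator used by enforceGDPRCompliance.
function validateNoPersonalData(item: unknown): ValidationResult {
  const violations: string[] = []

  // Fail closed: anything that is not a plain object is rejected.
  if (typeof item !== 'object' || item === null) {
    violations.push('item is not an object')
    return { valid: false, violations }
  }

  for (const [key, value] of Object.entries(item as Record<string, unknown>)) {
    if (FORBIDDEN_KEYS.includes(key.toLowerCase())) {
      violations.push(`forbidden field "${key}"`)
    }
    if (typeof value === 'string' && EMAIL_PATTERN.test(value)) {
      violations.push(`email address in "${key}"`)
    }
  }

  return { valid: violations.length === 0, violations }
}
```

Collecting every violation before returning, instead of stopping at the first, gives the thrown error message a complete list to report.
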
### ID Anonymization

Real IDs are never sent to AI. Instead:

- Projects: `cm1abc123...` → `P1`, `P2`, `P3`
- Jurors: `cm2def456...` → `juror_001`, `juror_002`
- Results mapped back using secure mapping tables

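A mapping table of this kind could look like the following sketch. The names `buildIdMapping`, `projectFormat`, and `jurorFormat` are hypothetical, and the sketch assumes the list of real IDs contains no duplicates.

```typescript
interface IdMapping {
  toAnonymous: Map<string, string> // real ID -> anonymous ID (outbound)
  toReal: Map<string, string> // anonymous ID -> real ID (inbound)
}

// Hypothetical helper: assigns sequential anonymous IDs and keeps both
// directions so AI results can be mapped back to real records.
function buildIdMapping(realIds: string[], format: (n: number) => string): IdMapping {
  const toAnonymous = new Map<string, string>()
  const toReal = new Map<string, string>()
  realIds.forEach((realId, index) => {
    const anonymousId = format(index + 1)
    toAnonymous.set(realId, anonymousId)
    toReal.set(anonymousId, realId)
  })
  return { toAnonymous, toReal }
}

// Formatters matching the two ID styles listed above.
const projectFormat = (n: number): string => `P${n}`
const jurorFormat = (n: number): string => `juror_${String(n).padStart(3, '0')}`
```

The mapping is meant to live only in memory for the duration of one request: the anonymous IDs go out with the prompt, and `toReal` resolves the IDs in the response before anything is persisted.
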
## Data Retention

| Data Type | Retention | Deletion Method |
|-----------|-----------|-----------------|
| AI usage logs | 12 months | Automatic deletion |
| Anonymized prompts | Not stored | Sent directly to API |
| AI responses | Not stored | Parsed and discarded |

**Note:** OpenAI does not use API data for model training (per their API Terms). API data is retained for up to 30 days for abuse monitoring, configurable to 0 days.

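The 12-month rule for usage logs comes down to a cutoff-date comparison. The sketch below shows that calculation only; `retentionCutoff` and `isExpired` are hypothetical names, and how the platform actually schedules and performs the deletion is not described here.

```typescript
const RETENTION_MONTHS = 12

// Returns the instant before which usage logs are past retention.
function retentionCutoff(now: Date): Date {
  const cutoff = new Date(now.getTime())
  // setUTCMonth accepts out-of-range values and rolls the year back.
  cutoff.setUTCMonth(cutoff.getUTCMonth() - RETENTION_MONTHS)
  return cutoff
}

// True when a log created at `createdAt` should be deleted as of `now`.
function isExpired(createdAt: Date, now: Date): boolean {
  return createdAt.getTime() < retentionCutoff(now).getTime()
}
```

A scheduled job could pass the cutoff to a bulk delete on the log table, removing every row whose creation time is earlier than it.
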
## Subprocessor: OpenAI

| Aspect | Details |
|--------|---------|
| Subprocessor | OpenAI, Inc. |
| Location | United States |
| DPA Status | Data Processing Agreement in place |
| Safeguards | Standard Contractual Clauses (SCCs) |
| Compliance | SOC 2 Type II, GDPR-compliant |
| Data Use | API data NOT used for model training |

**OpenAI DPA:** https://openai.com/policies/data-processing-agreement

## Audit Trail

All AI processing is logged:

```typescript
await prisma.aIUsageLog.create({
  data: {
    userId: ctx.user.id, // Who initiated
    action: 'FILTERING', // What type
    entityType: 'Round', // What entity
    entityId: roundId, // Which entity
    model: 'gpt-4o', // What model
    totalTokens: 1500, // Resource usage
    status: 'SUCCESS', // Outcome
  },
})
```

## Data Subject Rights

### Right of Access (Article 15)

Users can request:

- What data was processed by AI
- When AI processing occurred
- What decisions were made

**Implementation:** Export AI usage logs for user's projects.

### Right to Erasure (Article 17)

When a user requests deletion:

- AI usage logs for their projects can be deleted
- No long-term copy remains at OpenAI (API data is not used for training and is retained for at most 30 days for abuse monitoring)

**Note:** Since only anonymized data is sent to AI, there is no personal data at OpenAI to delete.

### Right to Object (Article 21)

Users can request to opt out of AI processing:

- Admin can disable AI features per round
- Manual review fallback available for all AI features

## Risk Assessment

### Risk: PII Leakage to AI Provider

| Factor | Assessment |
|--------|------------|
| Likelihood | Very Low |
| Impact | Medium |
| Mitigation | Automated PII detection, validation before every call |
| Residual Risk | Very Low |

### Risk: AI Decision Bias

| Factor | Assessment |
|--------|------------|
| Likelihood | Low |
| Impact | Low |
| Mitigation | Human review of all AI suggestions, algorithmic fallback |
| Residual Risk | Very Low |

### Risk: Data Breach at Subprocessor

| Factor | Assessment |
|--------|------------|
| Likelihood | Very Low |
| Impact | Low (only anonymized data) |
| Mitigation | OpenAI SOC 2 compliance, no PII sent |
| Residual Risk | Very Low |

## Compliance Checklist

- [x] Data minimization applied (only necessary fields)
- [x] PII stripped before AI processing
- [x] Anonymization validated before every API call
- [x] DPA in place with OpenAI
- [x] Audit logging of all AI operations
- [x] Fallback available when AI processing is declined
- [x] Usage logs retained for 12 months only
- [x] No personal data stored at subprocessor

## Contact

For questions about AI data processing:

- Data Protection Officer: [DPO email]
- Technical Contact: [Tech contact email]

## See Also

- [Platform GDPR Compliance](./platform-gdpr-compliance.md)
- [AI System Architecture](../architecture/ai-system.md)
- [AI Services Reference](../architecture/ai-services.md)