Optum Responsible AI (RAI) compliance

Responsible AI compliance requirements for Optum AI/ML development, covering AIRB submission, shadow mode pilots, RAI risk tiers, and governance processes.

Status: experimental
IDE: claude, codex, vscode
Version: 1.0.0
Owner: epic-platform-sre
Tags: rai, compliance, governance, optum

Optum Responsible AI (RAI) Compliance

This instruction file provides guidance for developing AI/ML solutions that comply with Optum's Responsible AI (RAI) framework and AIRB (AI Review Board) requirements.

Overview

All AI/ML applications at Optum must follow the RAI Development Guide v3.0 and obtain AIRB approval before production deployment. This requirement applies to:

  • LLM-based applications (chatbots, copilots, agents)
  • Traditional ML models (classification, regression, clustering)
  • Decision support systems with AI components
  • Automated data processing with AI/ML

RAI Risk Tiers

Tier 1: Low Risk

  • Definition: No direct impact on member care or business decisions
  • Examples: Internal productivity tools, code generation assistants, documentation generators
  • Review: Lightweight AIRB review, self-certification allowed
  • Timeline: 1-2 weeks

Tier 2: Medium Risk

  • Definition: Indirect impact on operations or member experience
  • Examples: Care coordination assistants, claims processing aids, provider search enhancements
  • Review: Standard AIRB review with testing requirements
  • Timeline: 4-6 weeks

Tier 3: High Risk

  • Definition: Direct impact on member health, coverage, or financial decisions
  • Examples: Clinical decision support, coverage determination, fraud detection
  • Review: Full AIRB review with ongoing monitoring
  • Timeline: 8-12 weeks
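
The tier definitions above can be read as a simple lookup from impact level to tier and review timeline. The sketch below is one hypothetical encoding: `ImpactLevel`, `TIERS`, and `classify_tier` are illustrative names, not part of any Optum library, and the result is only provisional because AIRB confirms the tier at intake.

```python
from enum import Enum


class ImpactLevel(Enum):
    """Hypothetical impact categories mirroring the tier definitions."""
    NONE = "none"          # no direct impact on member care or business decisions
    INDIRECT = "indirect"  # indirect impact on operations or member experience
    DIRECT = "direct"      # direct impact on member health, coverage, or finances


# Tier label and review timeline in weeks (min, max), copied from the tier descriptions.
TIERS = {
    ImpactLevel.NONE: ("tier-1", (1, 2)),
    ImpactLevel.INDIRECT: ("tier-2", (4, 6)),
    ImpactLevel.DIRECT: ("tier-3", (8, 12)),
}


def classify_tier(impact: ImpactLevel) -> dict:
    """Return the provisional tier and review timeline for an impact level."""
    tier, weeks = TIERS[impact]
    return {"risk_tier": tier, "review_weeks": weeks}


print(classify_tier(ImpactLevel.INDIRECT))
```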

AIRB Submission Process

1. Pre-submission checklist

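Before submitting to AIRB, confirm the project's risk tier, data types, and decision type, and record them as metadata in the project: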

# Include this metadata in your project
airb_submission:
  risk_tier: 'tier-2' # tier-1 | tier-2 | tier-3
  use_case: 'Terraform plan risk analysis for infrastructure changes'
  data_types:
    - configuration_files
    - infrastructure_state
  phi_handling: 'none' # none | de-identified | full
  pii_risk: 'low' # low | medium | high
  decision_type: 'advisory' # advisory | automated | human-in-loop
  shadow_mode_eligible: true
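
A small validator can catch typos in this metadata before submission. The following is a minimal sketch that assumes the metadata has already been loaded into a Python dict; `validate_submission` is a hypothetical helper, and the allowed values are taken from the inline comments in the YAML above rather than from an official schema.

```python
# Allowed values, as listed in the comments of the metadata block.
ALLOWED_VALUES = {
    "risk_tier": {"tier-1", "tier-2", "tier-3"},
    "phi_handling": {"none", "de-identified", "full"},
    "pii_risk": {"low", "medium", "high"},
    "decision_type": {"advisory", "automated", "human-in-loop"},
}
REQUIRED_FIELDS = set(ALLOWED_VALUES) | {"use_case", "data_types", "shadow_mode_eligible"}


def validate_submission(meta: dict) -> list[str]:
    """Return a list of problems; an empty list means the metadata looks well-formed."""
    problems = [f"missing field: {name}" for name in sorted(REQUIRED_FIELDS - set(meta))]
    for name, allowed in ALLOWED_VALUES.items():
        if name in meta and meta[name] not in allowed:
            problems.append(f"{name}: {meta[name]!r} is not one of {sorted(allowed)}")
    return problems


example = {
    "risk_tier": "tier-2",
    "use_case": "Terraform plan risk analysis for infrastructure changes",
    "data_types": ["configuration_files", "infrastructure_state"],
    "phi_handling": "none",
    "pii_risk": "low",
    "decision_type": "advisory",
    "shadow_mode_eligible": True,
}
print(validate_submission(example))
```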

2. Required documentation

Technical documentation:

  • System architecture diagram
  • Data flow diagram (inputs, processing, outputs)
  • Model card (for ML models) or LLM specification
  • Integration points and dependencies
  • Failure modes and fallback mechanisms

Governance documentation:

  • Privacy Impact Assessment (PIA)
  • Bias and fairness analysis
  • Transparency and explainability plan
  • Monitoring and alerting strategy
  • Incident response plan

3. Submit via UAIS

# Submit through United AI Studio portal
https://app.unitedaistudio.uhg.com/projects

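# Or via API (if available); a JSON body needs an explicit Content-Type,
# because curl -d otherwise sends application/x-www-form-urlencoded
curl -X POST https://api.unitedaistudio.uhg.com/airb/submit \
  -H "Authorization: Bearer $UAIS_TOKEN" \
  -H "Content-Type: application/json" \
  -d @airb-submission.json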

4. AIRB review timeline

| Phase            | Duration  | Activities                                 |
|------------------|-----------|--------------------------------------------|
| Intake           | 1-2 days  | Initial review, risk tier confirmation     |
| Technical review | 1-3 weeks | Architecture, security, privacy assessment |
| Bias & fairness  | 1-2 weeks | Fairness testing, bias mitigation review   |
| Final approval   | 3-5 days  | Executive review, decision                 |
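
For planning purposes, the phase durations can be summed into an overall range. The sketch below assumes the phases run strictly one after another and converts weeks at 7 calendar days each; both are assumptions, since the source does not say whether phases overlap or whether "days" means business days.

```python
# (phase, min_days, max_days); weeks converted at 7 calendar days each (an assumption).
PHASES = [
    ("Intake", 1, 2),
    ("Technical review", 7, 21),
    ("Bias & fairness", 7, 14),
    ("Final approval", 3, 5),
]


def total_review_days(phases):
    """Sum the per-phase bounds, assuming phases run sequentially without overlap."""
    return sum(p[1] for p in phases), sum(p[2] for p in phases)


low, high = total_review_days(PHASES)
print(f"{low}-{high} days")  # 18-42 days
```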

Shadow Mode Pilot

For Tier 2 and Tier 3 applications, run a shadow mode pilot before full deployment:

Shadow mode requirements

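# Shadow mode implementation pattern
from datetime import datetime, timezone


class AIAssistant:
    def __init__(self, model, shadow_mode: bool = False):
        self.model = model  # any object exposing predict(input_data)
        self.shadow_mode = shadow_mode
        # get_airb_logger() is assumed to be defined elsewhere; it is not part of this snippet
        self.logger = get_airb_logger()

    def make_recommendation(self, input_data):
        # Generate AI recommendation
        ai_result = self.model.predict(input_data)

        if self.shadow_mode:
            # Log recommendation but don't use it
            self.logger.log_shadow_prediction(
                input=input_data,
                prediction=ai_result,
                # Timezone-aware timestamp; datetime.utcnow() is deprecated since Python 3.12
                timestamp=datetime.now(timezone.utc),
            )
            # Return None so callers fall back to the existing non-AI path
            return None

        # Use AI recommendation in production
        return ai_result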

Shadow mode duration

  • Tier 2: 30 days minimum, 1000+ predictions
  • Tier 3: 90 days minimum, 5000+ predictions
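
The duration and volume minimums lend themselves to an automated gate. This is a minimal sketch; `SHADOW_MINIMUMS` and `pilot_meets_minimums` are hypothetical names, and "1000+" and "5000+" are read as "at least".

```python
from datetime import date

# Minimums from the list above: (days, predictions) per tier.
SHADOW_MINIMUMS = {"tier-2": (30, 1000), "tier-3": (90, 5000)}


def pilot_meets_minimums(tier: str, start: date, end: date, predictions: int) -> bool:
    """True when the pilot ran long enough and logged enough predictions."""
    min_days, min_predictions = SHADOW_MINIMUMS[tier]
    return (end - start).days >= min_days and predictions >= min_predictions


print(pilot_meets_minimums("tier-2", date(2025, 1, 1), date(2025, 1, 31), 1200))
```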

Success criteria

  • Accuracy within 5% of baseline
  • No bias detected in protected groups
  • Explainability score > 0.7
  • Incident count = 0
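
The four success criteria can be checked together at the end of a pilot. The sketch below is illustrative only: `evaluate_pilot` is a hypothetical function, and it interprets "within 5% of baseline" as "no more than 5 percentage points below baseline", which is an assumption the source does not settle.

```python
def evaluate_pilot(accuracy, baseline_accuracy, bias_detected,
                   explainability_score, incident_count):
    """Return (passed, failures) for the shadow mode success criteria."""
    failures = []
    # Assumption: accuracy may be at most 0.05 (absolute) below the baseline.
    if accuracy < baseline_accuracy - 0.05:
        failures.append("accuracy")
    if bias_detected:
        failures.append("bias")
    # The criterion is strict: the score must exceed 0.7.
    if not explainability_score > 0.7:
        failures.append("explainability")
    if incident_count != 0:
        failures.append("incidents")
    return (not failures, failures)


print(evaluate_pilot(accuracy=0.93, baseline_accuracy=0.95, bias_detected=False,
                     explainability_score=0.8, incident_count=0))
```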

Bias Detection and Mitigation

Protected attributes

Monitor fairness across these dimensions:

PROTECTED_ATTRIBUTES = [
    "age",
    "gender",
    "race",
    "ethnicity",
    "disability_status",
    "language",
    "geography",  # Rural vs urban
    "socioeconomic_status"
]

Fairness metrics

from optum.rai import FairnessAnalyzer

analyzer = FairnessAnalyzer(
    model=my_model,
    protected_attributes=PROTECTED_ATTRIBUTES
)

# Calculate fairness metrics
results = analyzer.analyze(test_data)

# Must meet these thresholds
assert results.demographic_parity < 0.1  # <10% disparity
assert results.equal_opportunity > 0.9   # >90% equal opportunity
assert results.disparate_impact > 0.8    # >80% DI ratio
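
To make the thresholds concrete, the sketch below computes two of these metrics from binary predictions using their standard textbook definitions. It is not the implementation behind `FairnessAnalyzer`, whose internals are not described here; the function names are hypothetical.

```python
from collections import defaultdict


def selection_rates(predictions, groups):
    """Fraction of positive (1) predictions within each group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(rates):
    """Gap between the highest and lowest group selection rate."""
    return max(rates.values()) - min(rates.values())


def disparate_impact_ratio(rates):
    """Lowest selection rate divided by the highest (1.0 when no group is selected)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0


preds = [1, 1, 0, 1, 1, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
rates = selection_rates(preds, groups)
print(rates, demographic_parity_difference(rates), disparate_impact_ratio(rates))
```

In this toy data the parity difference is 0.25 and the ratio is about 0.67, so both would fail the thresholds listed above.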

Mitigation strategies

  1. Pre-processing: Reweight training data to balance protected groups
  2. In-processing: Use fairness constraints during training
  3. Post-processing: Adjust decision thresholds per group
  4. Human-in-loop: Require human review for borderline cases
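
As an illustration of the pre-processing strategy, the sketch below assigns inverse-frequency sample weights so that every protected group carries the same total weight during training. This is one simple weighting scheme, not necessarily the one the RAI framework prescribes, and `balancing_weights` is a hypothetical helper.

```python
from collections import Counter


def balancing_weights(groups):
    """Per-sample weights so that every group contributes the same total weight."""
    counts = Counter(groups)
    n_samples, n_groups = len(groups), len(counts)
    return [n_samples / (n_groups * counts[g]) for g in groups]


groups = ["urban", "urban", "urban", "rural"]
weights = balancing_weights(groups)
print(weights)  # urban samples are down-weighted, the single rural sample is up-weighted
```

Many training APIs accept such values through a per-sample weight argument; check the library you use before relying on this.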

Privacy and Security

Data handling rules

# Do NOT log or store
PROHIBITED_DATA = [
    "member_name",
    "social_security_number",
    "date_of_birth",
    "address",
    "phone_number",
    "email",
    "medical_record_number"
]

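# De-identify before logging
# Assumes `logger` (e.g. logging.getLogger(__name__)) and a `deidentify` helper are defined elsewhere.
# If output_data can echo member fields, de-identify it the same way.
def log_inference(input_data, output_data):
    deidentified = deidentify(input_data, PROHIBITED_DATA)
    # Lazy %-style arguments skip message formatting when INFO is disabled
    logger.info("Inference: %s -> %s", deidentified, output_data)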
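
The `deidentify` helper is not defined in this file. One possible shape, shown as a sketch under the assumption that inputs are nested dicts and lists keyed by field name, is a recursive key-based redaction. It does not detect identifiers embedded in free text, which would need a dedicated PHI/PII scanner.

```python
REDACTED = "[REDACTED]"


def deidentify(record, prohibited_fields):
    """Return a copy of record with prohibited keys redacted, recursing into dicts and lists."""
    prohibited = set(prohibited_fields)
    if isinstance(record, dict):
        return {
            key: REDACTED if key in prohibited else deidentify(value, prohibited)
            for key, value in record.items()
        }
    if isinstance(record, list):
        return [deidentify(item, prohibited) for item in record]
    return record  # scalars pass through unchanged


sample = {
    "member_name": "Jane Doe",
    "plan": {"email": "jane@example.com", "tier": "gold"},
    "contacts": [{"address": "1 Main St", "kind": "home"}],
}
print(deidentify(sample, ["member_name", "email", "address"]))
```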

Encryption requirements

  • At rest: All model artifacts and training data must be encrypted (AES-256)
  • In transit: TLS 1.3 for all API calls
  • In memory: Use secure enclaves for sensitive inference
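
For Python clients, the in-transit requirement can be enforced with the standard library `ssl` module. The sketch below builds a client context that refuses anything older than TLS 1.3; `tls13_client_context` is a hypothetical helper name, and it assumes a Python build linked against an OpenSSL version with TLS 1.3 support.

```python
import ssl


def tls13_client_context() -> ssl.SSLContext:
    """Client-side context that verifies certificates and refuses anything below TLS 1.3."""
    context = ssl.create_default_context()  # certificate and hostname checks enabled
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    return context


context = tls13_client_context()
print(context.minimum_version)
```

The context can be passed to standard-library clients, for example `urllib.request.urlopen(url, context=context)`.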

Access control

# Role-based access control
rbac:
  model_developer:
    - read_training_data
    - write_model
    - deploy_to_dev

  data_scientist:
    - read_training_data
    - write_model

  production_deployer:
    - deploy_to_prod
    - manage_monitoring

  auditor:
    - read_logs
    - read_metrics
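
A deny-by-default check against this role map might look like the sketch below. The roles and permissions are copied from the YAML above; `is_allowed` is a hypothetical helper, and a real deployment would normally delegate this to the platform's identity provider.

```python
# Role-to-permission map mirroring the RBAC configuration above.
RBAC = {
    "model_developer": {"read_training_data", "write_model", "deploy_to_dev"},
    "data_scientist": {"read_training_data", "write_model"},
    "production_deployer": {"deploy_to_prod", "manage_monitoring"},
    "auditor": {"read_logs", "read_metrics"},
}


def is_allowed(role: str, permission: str) -> bool:
    """Deny by default: unknown roles and unlisted permissions are rejected."""
    return permission in RBAC.get(role, set())


print(is_allowed("data_scientist", "deploy_to_prod"))  # False
```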

Transparency and Explainability

Model cards

Every model must have a model card:

# Model Card: Terraform Risk Analyzer

## Model Details

- **Model type**: GPT-4-based risk analysis agent
- **Training data**: 10,000 anonymized Terraform plans
- **Version**: 1.2.0
- **Last updated**: 2025-12-11

## Intended Use

- **Primary use**: Identify high-risk changes in Terraform plans
- **Out of scope**: Automated approval/rejection of changes
- **Target users**: Platform engineers, SREs

## Performance

- **Accuracy**: 94% on validation set
- **Precision**: 92%
- **Recall**: 91%
- **F1 Score**: 0.915

## Fairness

- No protected attributes in scope
- Geographic analysis: No significant regional bias

## Limitations

- May miss novel attack patterns not in training data
- Requires human review for high-risk changes
- Sensitive to input formatting
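
Reported metrics in a model card should be internally consistent. As a quick cross-check, the F1 score in the example card can be recomputed from its precision and recall using the standard harmonic-mean formula; `f1_score` here is a local helper, not a library call.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.92, 0.91), 3))  # 0.915
```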

Explainability in code

# Provide explanations for all AI decisions
def explain_decision(model, input_data, model_output):
    """
    Generate human-readable explanation for AI decision.

    Required for AIRB compliance. `model` is passed in explicitly and is
    expected to expose an explain() method backed by SHAP, LIME, or
    attention weights.
    """
    explanation = model.explain(input_data)

    return {
        "decision": model_output,
        "confidence": explanation.confidence,
        "top_factors": explanation.top_factors[:5],
        "counterfactuals": explanation.counterfactuals,
        "human_readable": f"Decision based on {explanation.summary}"
    }

Monitoring and Alerting

Required metrics

# Monitor these metrics in production (target values)
REQUIRED_METRICS = {
    "inference_latency_p95": 500,  # ms, maximum
    "error_rate": 0.01,             # 1% maximum
    "bias_drift": 0.05,             # 5% max drift
    "accuracy_drift": 0.05,         # 5% max drift
    "explainability_score": 0.7     # minimum; score must stay above 0.7
}

# Alert thresholds
ALERT_THRESHOLDS = {
    "critical": {
        "error_rate": 0.05,         # 5%
        "bias_drift": 0.10,         # 10%
    },
    "warning": {
        "inference_latency_p95": 1000,  # ms
        "accuracy_drift": 0.08,         # 8%
    }
}
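
The alert thresholds can be applied to a snapshot of observed metrics as sketched below. `evaluate_alerts` is a hypothetical helper; it treats every threshold as an upper bound and ignores metrics missing from the snapshot, both of which are simplifying assumptions.

```python
ALERT_THRESHOLDS = {
    "critical": {"error_rate": 0.05, "bias_drift": 0.10},
    "warning": {"inference_latency_p95": 1000, "accuracy_drift": 0.08},
}


def evaluate_alerts(observed: dict, thresholds: dict = ALERT_THRESHOLDS) -> dict:
    """Map each severity to the sorted metric names whose observed value exceeds the limit."""
    return {
        severity: sorted(
            name for name, limit in limits.items()
            if name in observed and observed[name] > limit
        )
        for severity, limits in thresholds.items()
    }


snapshot = {
    "error_rate": 0.07,
    "bias_drift": 0.02,
    "inference_latency_p95": 1200,
    "accuracy_drift": 0.01,
}
print(evaluate_alerts(snapshot))
```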

Incident response

When RAI violations are detected:

  1. Immediate: Trigger kill switch via Agent Gateway
  2. Within 1 hour: Notify AIRB and product owner
  3. Within 4 hours: Root cause analysis
  4. Within 24 hours: Remediation plan
  5. Within 1 week: Post-mortem and prevention measures
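
The response windows above translate directly into deadlines measured from the moment a violation is detected. A minimal sketch, with `RESPONSE_STEPS` and `response_deadlines` as hypothetical names:

```python
from datetime import datetime, timedelta, timezone

# (step, time allowed after detection), taken from the list above.
RESPONSE_STEPS = [
    ("Trigger kill switch via Agent Gateway", timedelta(0)),
    ("Notify AIRB and product owner", timedelta(hours=1)),
    ("Root cause analysis", timedelta(hours=4)),
    ("Remediation plan", timedelta(hours=24)),
    ("Post-mortem and prevention measures", timedelta(weeks=1)),
]


def response_deadlines(detected_at: datetime):
    """Return (step, deadline) pairs measured from the detection time."""
    return [(step, detected_at + offset) for step, offset in RESPONSE_STEPS]


detected = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
for step, deadline in response_deadlines(detected):
    print(deadline.isoformat(), step)
```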

Code Examples

Compliant LLM application

from optum.rai import RAIFramework
from optum.monitoring import AIMonitor

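class CompliantLLMApp:
    def __init__(self, llm, explainer):
        # llm: client exposing generate(prompt)
        # explainer: callable (user_input, response) -> explanation with a .confidence attribute
        self.llm = llm
        self.explain = explainer
        self.rai = RAIFramework(
            risk_tier="tier-2",
            airb_ticket="AIRB-2025-1234"
        )
        self.monitor = AIMonitor(app_name="terraform-assistant")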

    def process_request(self, user_input):
        # 1. Validate input
        if not self.rai.validate_input(user_input):
            return {"error": "Invalid input"}

        # 2. Check for PII/PHI
        if self.rai.contains_sensitive_data(user_input):
            return {"error": "Sensitive data detected"}

        # 3. Generate response
        response = self.llm.generate(user_input)

        # 4. Explain decision
        explanation = self.explain(user_input, response)

        # 5. Log for monitoring
        self.monitor.log_inference(
            input=self.rai.deidentify(user_input),
            output=response,
            explanation=explanation
        )

        # 6. Return with explanation
        return {
            "response": response,
            "explanation": explanation,
            "confidence": explanation.confidence
        }

Testing for bias

import pytest
from optum.rai.testing import BiasTestSuite

# `my_model` and `test_data` are expected to be pytest fixtures (for example in conftest.py).
class TestModelFairness:
    def test_demographic_parity(self, my_model, test_data):
        """Ensure model treats all demographic groups fairly."""
        suite = BiasTestSuite(model=my_model)
        results = suite.test_demographic_parity(
            test_data,
            protected_attr="age"
        )
        assert results.disparity < 0.1

    def test_equal_opportunity(self, my_model, test_data):
        """Ensure equal true positive rates across groups."""
        suite = BiasTestSuite(model=my_model)
        results = suite.test_equal_opportunity(
            test_data,
            protected_attr="gender"
        )
        assert results.tpr_disparity < 0.05
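
For readers unfamiliar with the equal opportunity metric, the sketch below shows what a true positive rate disparity measures, using the standard definition on toy data. It does not reproduce `BiasTestSuite`; the function names are hypothetical, and every group is assumed to contain at least one actual positive.

```python
from collections import defaultdict


def true_positive_rates(y_true, y_pred, groups):
    """True positive rate per group: correctly predicted positives / actual positives."""
    actual, hits = defaultdict(int), defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            actual[group] += 1
            hits[group] += int(pred == 1)
    return {g: hits[g] / actual[g] for g in actual}


def tpr_disparity(y_true, y_pred, groups):
    """Gap between the highest and lowest group true positive rate."""
    rates = true_positive_rates(y_true, y_pred, groups)
    return max(rates.values()) - min(rates.values())


y_true = [1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]
groups = ["f", "f", "f", "f", "m", "m", "m", "m"]
print(true_positive_rates(y_true, y_pred, groups), tpr_disparity(y_true, y_pred, groups))
```

Here group `f` has a true positive rate of 2/3 and group `m` of 1.0, so the disparity of about 0.33 would fail the 0.05 threshold used in the test above.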

Compliance checklist

Before deploying to production:

  • AIRB ticket created and approved
  • Risk tier assessment completed
  • PIA (Privacy Impact Assessment) submitted
  • Bias and fairness testing completed
  • Model card published
  • Monitoring and alerting configured
  • Incident response plan documented
  • Shadow mode pilot completed (if required)
  • Security review passed
  • Kill switch integrated via Agent Gateway
  • Documentation published to UAIS
  • Training provided to end users
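
The checklist can also act as an automated deployment gate. In the sketch below, the item keys are hypothetical short names for the twelve checklist entries, and `outstanding_items` skips the shadow mode entry when the pilot is not required, which is how "(if required)" is interpreted here.

```python
# Hypothetical keys, one per checklist entry above, in the same order.
CHECKLIST = (
    "airb_approved",
    "risk_tier_assessed",
    "pia_submitted",
    "bias_testing_completed",
    "model_card_published",
    "monitoring_configured",
    "incident_response_documented",
    "shadow_mode_completed",
    "security_review_passed",
    "kill_switch_integrated",
    "docs_published_to_uais",
    "end_user_training_provided",
)


def outstanding_items(completed, shadow_mode_required=True):
    """Return checklist items still open, in checklist order."""
    return [
        item for item in CHECKLIST
        if item not in completed
        and (shadow_mode_required or item != "shadow_mode_completed")
    ]


done = set(CHECKLIST) - {"shadow_mode_completed", "model_card_published"}
print(outstanding_items(done))
```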

Version history

  • v3.0 (2025-12-11): Initial instruction file based on RAI Development Guide v3.0
