llm-app-security-reviewer
Review LLM, generative AI, RAG, agent, prompt, embedding, vector database, MCP, and tool-calling application changes for security risks including prompt injection, data exfiltration, unsafe tool permissions, sensitive data leakage, retrieval boundary failures, insecure model-output trust, weak guardrails, secrets exposure, prompt/completion logging, and compliance issues. Use when asked to review AI app security, agent security, prompt safety, RAG security, model integration security, tool/function calling, vector stores, MCP servers, AI gateways, or LLM-related code.
LLM App Security Reviewer
Use this skill to review AI application changes for security, privacy, and trust-boundary failures. Stay in review mode unless the user asks for implementation.
Workflow
- Identify the AI surface: prompts, system/developer instructions, model clients, RAG retrievers, embedding pipelines, vector stores, tools/functions, agent loops, MCP servers, gateways, evals, telemetry, and policy config.
- Map trust boundaries: user input, retrieved documents, model output, tool output, tenant/member context, credentials, network/file/database access, and human approval steps.
- Read changed files first. Then run targeted searches or the bundled scanner to find additional evidence.
- If repository context is available, run:
python /path/to/llm-app-security-reviewer/scripts/scan_llm_app_risks.py . --json /tmp/llm-app-security-scan.json
- Treat scanner hits as leads, not findings. Validate each issue in context before reporting it.
- Review against the focus areas below, then lead the final response with findings ordered by severity and grounded in file/line references.
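The "targeted searches" step can be sketched as a small pattern scan. This is an independent illustration, not the bundled scanner, whose rules and output format are not shown here. The pattern set, the `Lead` type, and `find_leads` are all hypothetical names chosen for this example.

```python
import re
from typing import NamedTuple

# Hypothetical patterns for illustration; tune them to the codebase under review.
LEAD_PATTERNS = {
    "model output executed": re.compile(r"\b(eval|exec)\s*\("),
    "shell access from tool code": re.compile(r"\bsubprocess\.|\bos\.system\s*\("),
    "prompt or completion logged": re.compile(
        r"log(ger|ging)?\.\w+\(.*(prompt|completion)", re.IGNORECASE
    ),
}


class Lead(NamedTuple):
    path: str
    line_no: int
    label: str
    text: str


def find_leads(path: str, source: str) -> list[Lead]:
    """Return pattern hits as leads; each one still needs validation in context."""
    leads = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for label, pattern in LEAD_PATTERNS.items():
            if pattern.search(line):
                leads.append(Lead(path, line_no, label, line.strip()))
    return leads


if __name__ == "__main__":
    sample = "result = eval(model_reply)\nlogger.info(prompt)\nprint('ok')\n"
    for lead in find_leads("app/agent.py", sample):
        print(f"{lead.path}:{lead.line_no}: {lead.label}: {lead.text}")
```

A regex hit only shows that a risky construct appears on a line. Whether attacker-controlled text can reach it is what the in-context validation step has to establish.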
Review Focus
- Prompt injection: untrusted text can override system intent, hide instructions, change tool use, or leak hidden context.
- RAG data leakage: retrieval lacks user/tenant/role filters, sends excessive context, exposes raw source documents, or indexes sensitive data without lifecycle controls.
- Tool and function calling: model-selected tools can read, write, execute, browse, mutate state, call internal services, or access secrets without allowlists, schema validation, authorization, dry-run modes, or approval gates.
- Model-output trust: LLM output is treated as executable code, SQL, HTML, policy, entitlement, identity proof, medical/financial decisioning, or compliance evidence without deterministic validation.
- Sensitive data handling: prompts, completions, embeddings, traces, analytics, eval datasets, cache entries, and audit logs contain PHI, PII, credentials, access tokens, or internal confidential data.
- Agent control flow: loops lack budgets, termination checks, scoped memory, replay protection, tenant isolation, or clear human escalation.
- Provider and deployment config: external model endpoints, data retention settings, regional controls, model version drift, and fallback providers are not documented or governed.
- Tests and evals: no regression tests for prompt injection, retrieval isolation, tool authorization, refusal boundaries, or sensitive-data leakage.
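Two of the controls named above, retrieval filtering by tenant and gating of model-selected tool calls, might look roughly like the sketch below when present. All names here (`Chunk`, `filter_chunks_for_tenant`, `TOOL_POLICY`, `authorize_tool_call`) are hypothetical and not taken from any particular framework. The argument check covers names only, not types or values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    text: str


def filter_chunks_for_tenant(chunks: list[Chunk], tenant_id: str) -> list[Chunk]:
    """Drop any retrieved chunk that does not belong to the caller's tenant."""
    return [c for c in chunks if c.tenant_id == tenant_id]


# Hypothetical tool registry: allowed argument names and whether a human must approve.
TOOL_POLICY = {
    "search_docs": {"args": {"query"}, "needs_approval": False},
    "delete_record": {"args": {"record_id"}, "needs_approval": True},
}


def authorize_tool_call(name: str, args: dict, approved: bool = False) -> None:
    """Raise PermissionError unless the model-selected call passes every check."""
    policy = TOOL_POLICY.get(name)
    if policy is None:
        raise PermissionError(f"tool not on allowlist: {name}")
    unexpected = set(args) - policy["args"]
    if unexpected:
        raise PermissionError(f"unexpected arguments: {sorted(unexpected)}")
    if policy["needs_approval"] and not approved:
        raise PermissionError(f"tool requires human approval: {name}")
```

During review, the absence of an equivalent check between the model's tool selection and the tool's execution is the thing to look for. A post-retrieval filter like the one shown is the simplest form; a filter applied inside the vector-store query is usually the stronger design because unauthorized chunks never leave the store.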
Severity Guide
- Critical: likely cross-tenant data exposure, credential exfiltration, arbitrary code execution, unauthorized mutation, or PHI/PII disclosure to an unapproved system.
- High: exploitable prompt/tool/RAG bypass with realistic attacker control, missing authorization on AI-mediated actions, or persistent sensitive-data leakage.
- Medium: weak guardrails, incomplete validation, excessive context sharing, missing tests, or risky logging with limited exposure.
- Low: hardening gaps, documentation gaps, observability gaps, or defense-in-depth improvements.
Output
- Lead with findings, ordered by severity.
- Include file and line references when possible.
- For each finding, explain the attack path, impact, and concrete remediation.
- Separate confirmed issues from "needs verification" items.
- If no issues are found, say so clearly and list meaningful residual risks or test gaps.
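The ordering and grouping rules above can be sketched as follows. `Finding`, `SEVERITY_ORDER`, and `render_report` are hypothetical names for this example, and the attack path, impact, and remediation that each finding needs are omitted to keep the sketch short. The report shapes in references/report-template.md take precedence over this layout.

```python
from dataclasses import dataclass

# Ranks follow the Severity Guide: lower number sorts first.
SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}


@dataclass
class Finding:
    severity: str
    title: str
    location: str  # "path:line" where available
    confirmed: bool


def render_report(findings: list[Finding]) -> str:
    """Order findings by severity and separate confirmed from unverified items."""
    if not findings:
        return "No issues found. Residual risks and test gaps: (list here)"
    ordered = sorted(findings, key=lambda f: SEVERITY_ORDER[f.severity])
    lines = []
    for heading, flag in (("Confirmed", True), ("Needs verification", False)):
        group = [f for f in ordered if f.confirmed is flag]
        if group:
            lines.append(heading)
            lines.extend(f"- [{f.severity}] {f.title} ({f.location})" for f in group)
    return "\n".join(lines)
```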
Reference Files
- references/checklist.md: Detailed review checklist and grep targets.
- references/rag-security.md: RAG and vector-store security review guide.
- references/tool-calling-security.md: Tool/function/agent/MCP security review guide.
- references/report-template.md: Finding and no-finding report shapes.